SubForge is a Rust CLI for transcribing, segmenting, translating, evaluating, and muxing or burning subtitles into videos. It is built for people who process videos repeatedly and do not want every project to become a pile of scripts, temporary files, model paths, ffmpeg flags, and manual rework.
```text
video / audio
  -> speech recognition
  -> subtitle segmentation
  -> translation
  -> quality estimation
  -> hard-burned video / soft subtitle track
```
Most subtitle workflows are not one tool. They are a chain:
- extract or transcribe audio
- split text into readable subtitle cues
- translate with enough context to keep terms stable
- check low-quality translations
- render subtitles into a video or mux them as a subtitle track
- keep intermediate outputs, caches, models, and project memory organized
SubForge makes that chain explicit, repeatable, and inspectable.
It is not a GUI editor. It is a CLI-first tool for local automation, batch processing, video localization, course translation, and creator workflows where reproducibility matters.
- Rust CLI with Linux, macOS, and Windows CI
- Local `faster-whisper` transcription with CPU or CUDA support
- SaT-based subtitle segmentation via an embedded Python sidecar
- Google, Bing, and OpenAI-compatible LLM translation backends
- Two LLM translation modes:
  - chained translation for best context continuity
  - wave-based concurrent translation for long videos
- MAPS-style terminology extraction and project-level translation memory
- GEMBA-MQM quality estimation with targeted low-score refinement
- Hard subtitle burning and soft subtitle muxing through ffmpeg
- GPU selection for faster-whisper and NVENC encoders
- Model download, cache management, environment diagnostics, and config tools
- Secret scanning in CI
SubForge is usable, but still early. The current release is 0.2.0.
The core CLI, ffmpeg integration, cache handling, model management, and configuration flow are already in place. The next layer of work is better release automation, broader real-world end-to-end testing, and more polished documentation for recommended translation settings.
SubForge currently builds from source. Prebuilt binaries are not published yet.
| Tool | Purpose |
|---|---|
| Rust 1.88+ | Build the subforge binary |
| Python 3.9+ | Run faster-whisper, SaT, SubER, and related sidecars |
| ffmpeg | Extract audio, burn subtitles, and mux subtitle tracks |
Install Rust on Linux / macOS:

```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --profile minimal
```

Windows PowerShell:

```powershell
winget install Rustlang.Rustup

# Optional: use the TUNA rustup mirror (helpful in mainland China)
$env:RUSTUP_DIST_SERVER="https://mirrors.tuna.tsinghua.edu.cn/rustup"
$env:RUSTUP_UPDATE_ROOT="https://mirrors.tuna.tsinghua.edu.cn/rustup/rustup"

rustup set profile minimal
rustup toolchain install stable
rustup default stable
cargo --version

# Optional: use the rsproxy mirror for crates.io
mkdir $env:USERPROFILE\.cargo -Force
@"
[source.crates-io]
replace-with = "rsproxy"

[source.rsproxy]
registry = "sparse+https://rsproxy.cn/index/"
"@ | Set-Content -Encoding UTF8 $env:USERPROFILE\.cargo\config.toml
```

Install ffmpeg:
```bash
sudo apt install ffmpeg     # Debian / Ubuntu
sudo dnf install ffmpeg     # Fedora
sudo pacman -S ffmpeg       # Arch
brew install ffmpeg         # macOS
winget install Gyan.FFmpeg  # Windows
```

Install SubForge:
```bash
git clone https://github.com/deusjin/subforge.git
cd subforge
cargo install --path .
subforge --version
```

Install Python-side dependencies:
```bash
subforge setup                          # CPU PyTorch
subforge setup --compute cu124          # CUDA 12.x
subforge setup --compute cu128-nightly  # RTX 50 / Blackwell (sm_120)
subforge setup --force                  # rebuild the local venv
```

Create a local config:
```bash
cp config.toml.example config.toml    # Linux / macOS
copy config.toml.example config.toml  # Windows cmd
```

Then verify the environment:
```bash
subforge doctor
```

`config.toml` is ignored by Git because it may contain API keys.
Process a video end to end:

```bash
subforge process video.mp4
```

Generate translated subtitles without burning them into the video:

```bash
subforge translate video.mp4
```

Use soft subtitle muxing instead of re-encoding the video:

```bash
subforge process video.mp4 --synth-mode soft
```

Generate both a hard-burned video and a soft-subtitle output:

```bash
subforge process video.mp4 --synth-mode both
```

Run each stage manually:
```bash
subforge transcribe video.mp4 --asr faster-whisper
subforge subtitle video.srt --translator google
subforge synthesize video.mp4 --subtitle video_translated.srt
```

| Command | Purpose |
|---|---|
| `transcribe` | Audio/video to SRT |
| `subtitle` | SRT to translated SRT |
| `translate` | Transcribe and translate, without video synthesis |
| `synthesize` | Video + SRT to hard-burned or muxed output |
| `process` | Full pipeline: transcribe, translate, synthesize |
| `eval` | Subtitle quality evaluation with SubER and text metrics |
| `setup` | Create Python venv and install sidecar dependencies |
| `doctor` | Check ffmpeg, Python packages, CUDA, config, and sidecar sync |
| `model` | List and download faster-whisper models |
| `gpu` | Detect and select the default CUDA GPU |
| `cache` | Show, prune, or clean cache entries |
| `config` | Show, get, set, and locate configuration |
| Backend | Mode | Notes |
|---|---|---|
| `google` | Web translation | Free, no key required, simple concurrent per-cue path |
| `bing` | Web translation | Free, requires `curl`, uses refreshable auth token handling |
| `llm` | OpenAI-compatible API | Best quality path; supports terminology, memory, QE, and refinement |
| empty string | No translation | Useful for reformatting or synthesis only |
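As an illustration of how a `translator` value could map onto the backends in this table, here is a minimal Rust sketch. The `Translator` enum and the `supports_quality_pipeline` helper are hypothetical names, not SubForge's internal types; the sketch only encodes the table, including the empty string meaning "no translation".

```rust
use std::str::FromStr;

/// Hypothetical mirror of the `translator` config values (not SubForge's real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Translator {
    Google,
    Bing,
    Llm,
    /// Empty string: skip translation (reformatting or synthesis only).
    Disabled,
}

impl FromStr for Translator {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim() {
            "google" => Ok(Translator::Google),
            "bing" => Ok(Translator::Bing),
            "llm" => Ok(Translator::Llm),
            "" => Ok(Translator::Disabled),
            other => Err(format!("unknown translator backend: {other:?}")),
        }
    }
}

impl Translator {
    /// Only the LLM backend runs terminology, memory, QE, and refinement.
    fn supports_quality_pipeline(self) -> bool {
        matches!(self, Translator::Llm)
    }
}

fn main() {
    for value in ["google", "bing", "llm", "", "deepl"] {
        match value.parse::<Translator>() {
            Ok(t) => println!("{value:?} -> {t:?}, QE pipeline: {}", t.supports_quality_pipeline()),
            Err(e) => println!("{e}"),
        }
    }
}
```

Treating an unknown name as an error, rather than silently falling back to `google`, is a design choice of this sketch.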
Example:

```bash
subforge subtitle input.srt --translator llm --target-language zh-Hans
```

For LLM translation, configure:
```toml
translator = "llm"
api_key = ""
base_url = "https://api.openai.com/v1"
model = "gpt-4o-mini"
target_language = "zh-Hans"
```

Environment variables take precedence over these config values:
```bash
export OPENAI_API_KEY="..."
export OPENAI_BASE_URL="https://api.openai.com/v1"
```

The LLM backend uses a staged pipeline:
```text
source SRT
  |
  +-- terminology extraction
  |     -> glossary.jsonl
  |
  +-- batched translation
  |     -> chained mode or concurrent wave mode
  |
  +-- GEMBA-MQM quality estimation
  |
  +-- targeted refinement for low-score cues
  |
  +-- translation memory
        -> memory.jsonl
```
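The quality-estimation and refinement stages can be pictured with a small Rust sketch. It assumes MQM-style penalty weights (minor 1, major 5, critical 25) and a hypothetical score threshold; SubForge's actual weights, prompt format, and threshold are not documented here, and `ScoredCue`, `mqm_score`, and `cues_to_refine` are illustrative names.

```rust
/// MQM error severities, as a GEMBA-MQM style prompt would report them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    Minor,
    Major,
    Critical,
}

impl Severity {
    /// Assumed MQM-style penalty weights; SubForge's actual weights may differ.
    fn penalty(self) -> f64 {
        match self {
            Severity::Minor => 1.0,
            Severity::Major => 5.0,
            Severity::Critical => 25.0,
        }
    }
}

/// A translated cue and the errors the QE stage reported for it.
struct ScoredCue {
    index: usize,
    errors: Vec<Severity>,
}

/// MQM score: the negated sum of penalties, so 0 means no errors were found.
fn mqm_score(errors: &[Severity]) -> f64 {
    -errors.iter().map(|e| e.penalty()).sum::<f64>()
}

/// Indices of cues scoring below `threshold`: the ones a targeted
/// refinement pass would send back to the LLM.
fn cues_to_refine(cues: &[ScoredCue], threshold: f64) -> Vec<usize> {
    cues.iter()
        .filter(|c| mqm_score(&c.errors) < threshold)
        .map(|c| c.index)
        .collect()
}

fn main() {
    let cues = vec![
        ScoredCue { index: 1, errors: vec![] },
        ScoredCue { index: 2, errors: vec![Severity::Minor, Severity::Minor] },
        ScoredCue { index: 3, errors: vec![Severity::Major] },
        ScoredCue { index: 4, errors: vec![Severity::Critical] },
    ];
    // Hypothetical threshold: refine anything worse than two minor errors.
    println!("cues to refine: {:?}", cues_to_refine(&cues, -2.0));
}
```

Only the flagged cues are re-translated, which keeps the refinement pass much cheaper than a second full translation.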
`chained_translation = true` gives the best continuity because every batch sees the previous translations before it runs. It is effectively sequential for the main LLM translation stage.

`chained_translation = false` warms up with three sequential batches and then runs later batches in concurrent waves. It is faster on long videos while still preserving cross-wave context.
For the best continuity:

```toml
chained_translation = true  # best continuity
thread_num = 3
batch_size = 7
```

For faster long-video runs:
```toml
chained_translation = false
thread_num = 5
batch_size = 10
```

`batch_size` controls how many subtitle cues go into one LLM request.
`thread_num` controls how many requests can run at the same time.
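To see how `batch_size`, `thread_num`, and `chained_translation` interact, the following Rust sketch computes an execution plan without calling any LLM. It assumes each wave after the three-batch warm-up holds at most `thread_num` batches; `schedule` is a hypothetical helper, not SubForge's scheduler.

```rust
/// Group subtitle batches into execution steps. Batches inside one step may
/// run concurrently; each step waits for the previous one to finish.
fn schedule(num_cues: usize, batch_size: usize, thread_num: usize, chained: bool) -> Vec<Vec<usize>> {
    assert!(batch_size > 0 && thread_num > 0, "batch_size and thread_num must be positive");
    let batches: Vec<usize> = (0..num_cues.div_ceil(batch_size)).collect();

    if chained {
        // Chained mode: every batch sees the previous translations, so one batch per step.
        return batches.iter().map(|&b| vec![b]).collect();
    }

    // Wave mode: three sequential warm-up batches, then waves of up to `thread_num` batches.
    const WARMUP: usize = 3;
    let warmup = WARMUP.min(batches.len());
    let mut steps: Vec<Vec<usize>> = batches[..warmup].iter().map(|&b| vec![b]).collect();
    steps.extend(batches[warmup..].chunks(thread_num).map(|wave| wave.to_vec()));
    steps
}

fn main() {
    let chained = schedule(100, 7, 3, true);
    let waves = schedule(100, 10, 5, false);
    println!("chained: {} sequential steps", chained.len());
    println!("waves: {waves:?}");
}
```

With 100 cues, `batch_size = 10`, and `thread_num = 5`, the wave plan has 5 steps (three warm-up batches, one wave of five, one wave of two), while chained mode with `batch_size = 7` needs 15 sequential requests.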
| Backend | Description | Configuration |
|---|---|---|
| `faster-whisper` | Local Whisper transcription (recommended) | `whisper_model`, `whisper_device` |
| `bijian` | Bilibili Bcut endpoint | `bijian_base_url` |
| `whisper-api` | OpenAI-compatible audio transcription | `whisper_api_model`, `asr_api_key`, `asr_base_url` |
| `whisper-cpp` | Local whisper.cpp binary | `whisper_cpp_model` |
First-time local model usage is interactive. SubForge lists the available faster-whisper models and downloads the one you select, showing Hugging Face download progress. In non-interactive environments, download the model in advance:

```bash
subforge model list
subforge model download turbo
```

Hard-burn subtitles:
```bash
subforge synthesize video.mp4 --subtitle sub.srt --mode hard
```

Soft-mux subtitles without re-encoding:

```bash
subforge synthesize video.mp4 --subtitle sub.srt --mode soft
```

Create both outputs:

```bash
subforge synthesize video.mp4 --subtitle sub.srt --mode both
```

Style and encoder options:
```bash
subforge synthesize video.mp4 --subtitle sub.srt \
  --font "Source Han Sans" \
  --font-size 24 \
  --font-color FFFFFF \
  --outline-color 000000 \
  --outline-width 3 \
  --position bottom-right \
  --encoder x265 \
  --crf 22 \
  --preset slow
```

Soft subtitle container behavior:

| Output container | Subtitle codec |
|---|---|
| `.mp4`, `.m4v`, `.mov` | `mov_text` |
| `.mkv` | SRT |
| `.webm` | WebVTT, only when the source video is already WebM-compatible |

If in doubt, use `.mkv` for soft subtitles or use hard burn.
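The container table can be expressed as a lookup, together with the kind of ffmpeg invocation that adds a subtitle track without re-encoding. This is a sketch: `soft_subtitle_codec` and `soft_mux_args` are hypothetical helpers, and SubForge's real ffmpeg arguments may differ. The flags used (`-map`, `-c copy`, `-c:s`) are standard ffmpeg options.

```rust
use std::path::Path;

/// Soft-subtitle codec for an output container, following the container table.
/// Returns None when soft muxing is not a safe choice for that container.
fn soft_subtitle_codec(output: &Path, source_is_webm_compatible: bool) -> Option<&'static str> {
    let ext = output.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "mp4" | "m4v" | "mov" => Some("mov_text"),
        "mkv" => Some("srt"),
        "webm" if source_is_webm_compatible => Some("webvtt"),
        _ => None,
    }
}

/// ffmpeg arguments that copy video/audio streams and add one subtitle track.
fn soft_mux_args(video: &str, subtitle: &str, output: &str, codec: &str) -> Vec<String> {
    [
        "-y", "-i", video, "-i", subtitle,
        // Video from input 0, audio if present, the subtitle stream from input 1.
        "-map", "0:v", "-map", "0:a?", "-map", "1:0",
        // Copy everything, but encode subtitles with the container-specific codec.
        "-c", "copy", "-c:s", codec,
        output,
    ]
    .iter()
    .map(|s| s.to_string())
    .collect()
}

fn main() {
    let output = "video_sub.mp4";
    match soft_subtitle_codec(Path::new(output), false) {
        Some(codec) => {
            let args = soft_mux_args("video.mp4", "sub.srt", output, codec);
            println!("ffmpeg {}", args.join(" "));
        }
        None => println!("use .mkv for soft subtitles or hard-burn instead"),
    }
}
```

Because `-c copy` only remuxes the video and audio streams, soft muxing is fast; the subtitle stream is the only one that gets encoded.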
SubForge can pin faster-whisper and NVENC to a selected CUDA GPU:

```bash
subforge gpu              # detect CUDA GPUs
subforge gpu --set 1      # select GPU 1 as the default
subforge config get cuda_gpu
subforge config set cuda_gpu ""
```

At runtime, SubForge reports the physical GPU selected through `CUDA_VISIBLE_DEVICES`.
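A minimal Rust sketch of GPU pinning for a child process, assuming the sidecar is launched with `std::process::Command` and that `cuda_gpu` holds a physical GPU index. `pin_gpu` and `visible_devices` are hypothetical names, and `python` is a placeholder program; nothing is spawned.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Pin a child process (e.g. a faster-whisper sidecar) to one physical GPU.
/// `cuda_gpu` mirrors the config key; an empty value leaves the device list alone.
fn pin_gpu(mut cmd: Command, cuda_gpu: &str) -> Command {
    let gpu = cuda_gpu.trim();
    if !gpu.is_empty() {
        // CUDA renumbers visible devices, so the chosen GPU appears as device 0 in the child.
        cmd.env("CUDA_VISIBLE_DEVICES", gpu);
    }
    cmd
}

/// The CUDA_VISIBLE_DEVICES value explicitly set on `cmd`, if any.
fn visible_devices(cmd: &Command) -> Option<String> {
    cmd.get_envs()
        .find(|(key, _)| *key == OsStr::new("CUDA_VISIBLE_DEVICES"))
        .and_then(|(_, value)| value)
        .map(|v| v.to_string_lossy().into_owned())
}

fn main() {
    // "python" is a placeholder; the command is only built, never run.
    let cmd = pin_gpu(Command::new("python"), "1");
    println!("CUDA_VISIBLE_DEVICES = {:?}", visible_devices(&cmd));
}
```

Setting the variable per child process, instead of on the parent, keeps the selection scoped to the tools that should use that GPU.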
Config lookup order:

1. `--config <path>`
2. `$SUBFORGE_CONFIG`
3. `./config.toml`
4. `$XDG_CONFIG_HOME/subforge/config.toml`
5. `$HOME/.config/subforge/config.toml`
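The lookup order can be sketched as a first-match search in Rust. `find_config` is a hypothetical helper; it assumes every candidate, including `--config`, is only used if the file exists, and that empty environment variables count as unset. The real CLI may instead report an error for a missing `--config` path.

```rust
use std::path::{Path, PathBuf};

/// First existing config file in the documented lookup order.
/// `env` and `exists` are injected so the order can be tested without real files.
fn find_config(
    cli: Option<&Path>,
    env: impl Fn(&str) -> Option<String>,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    // Treat empty environment variables as unset.
    let var = |key: &str| env(key).filter(|v| !v.is_empty());

    let mut candidates: Vec<PathBuf> = Vec::new();
    if let Some(p) = cli {
        candidates.push(p.to_path_buf()); // 1. --config <path>
    }
    if let Some(p) = var("SUBFORGE_CONFIG") {
        candidates.push(PathBuf::from(p)); // 2. $SUBFORGE_CONFIG
    }
    candidates.push(PathBuf::from("./config.toml")); // 3. current directory
    if let Some(xdg) = var("XDG_CONFIG_HOME") {
        candidates.push(Path::new(&xdg).join("subforge/config.toml")); // 4. XDG
    }
    if let Some(home) = var("HOME") {
        candidates.push(Path::new(&home).join(".config/subforge/config.toml")); // 5. home
    }
    candidates.into_iter().find(|p| exists(p.as_path()))
}

fn main() {
    let found = find_config(None, |k| std::env::var(k).ok(), |p| p.is_file());
    println!("config: {found:?}");
}
```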
Common options:
| Key | Default | Description |
|---|---|---|
| `asr` | `faster-whisper` | Speech recognition backend |
| `whisper_model` | `small.en` | faster-whisper model |
| `segmenter` | `sat` | Subtitle segmentation algorithm |
| `translator` | `google` | Translation backend |
| `target_language` | `zh-Hans` | Target language |
| `layout` | `target-above` | Subtitle layout |
| `thread_num` | `3` | Concurrent request count |
| `batch_size` | `7` | LLM subtitle cues per request |
| `quality_estimation` | `true` | Enable GEMBA-MQM scoring |
| `refine` | `true` | Retry low-score translations |
| `synth_mode` | empty | `hard`, `soft`, or `both`; empty means `hard` |
| `synth_encoder` | empty | `x264`, `x265`, `nvenc`, `nvenc-hevc`, `qsv`, `videotoolbox` |
Manage configuration with:

```bash
subforge config show
subforge config get api_key
subforge config set whisper_model medium
subforge config path
```

`subforge config get api_key` only prints a redacted prefix.
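A Rust sketch of how an effective API key could be resolved and displayed: a non-empty environment variable such as `OPENAI_API_KEY` wins over the `config.toml` value, and only a short prefix is printed. The helper names and the 6-character prefix length are assumptions, not SubForge's actual behavior.

```rust
use std::env;

/// Effective value: a non-empty environment variable wins over the config value.
fn resolve_with<F>(lookup: F, env_key: &str, config_value: &str) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    lookup(env_key)
        .filter(|v| !v.is_empty())
        .or_else(|| Some(config_value.to_string()).filter(|v| !v.is_empty()))
}

/// Show only the first few characters of a secret (prefix length is an assumption).
fn redact_secret(secret: &str) -> String {
    const VISIBLE: usize = 6;
    if secret.chars().count() <= VISIBLE {
        // Too short to reveal any prefix safely.
        return "***".to_string();
    }
    let prefix: String = secret.chars().take(VISIBLE).collect();
    format!("{prefix}***")
}

fn main() {
    let config_api_key = ""; // value read from config.toml (empty here)
    match resolve_with(|k| env::var(k).ok(), "OPENAI_API_KEY", config_api_key) {
        Some(key) => println!("api_key = {}", redact_secret(&key)),
        None => println!("api_key is not set"),
    }
}
```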
By default, project memory is stored beside the source video:

```text
.subforge-tm/
  glossary.jsonl
  memory.jsonl
  .lock
```
For shared memory across multiple videos, set:

```bash
subforge config set tm_dir /shared/path/.subforge-tm
```

The memory writer uses an advisory lock so concurrent processes do not corrupt the JSONL files.
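The locking idea can be sketched with the Rust standard library alone. Note the swapped-in technique: instead of an OS advisory lock held on `.lock` (for example `flock`), this sketch creates `.lock` atomically with `create_new` and deletes it when done. That is simpler, but it leaves a stale lock behind if a process crashes, which a real advisory lock avoids. `LockGuard`, `acquire`, and `append_jsonl` are hypothetical names.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::thread;
use std::time::Duration;

/// While this guard is alive, `<dir>/.lock` exists; dropping it releases the lock.
struct LockGuard(PathBuf);

impl Drop for LockGuard {
    fn drop(&mut self) {
        let _ = fs::remove_file(&self.0);
    }
}

/// Acquire the lock by atomically creating `.lock`, retrying for about five seconds.
fn acquire(dir: &Path) -> io::Result<LockGuard> {
    let path = dir.join(".lock");
    for _ in 0..50 {
        match OpenOptions::new().write(true).create_new(true).open(&path) {
            Ok(_) => return Ok(LockGuard(path)),
            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => {
                thread::sleep(Duration::from_millis(100));
            }
            Err(e) => return Err(e),
        }
    }
    Err(io::Error::new(io::ErrorKind::TimedOut, "translation memory lock is busy"))
}

/// Append one JSON record per line (JSONL) while holding the lock.
fn append_jsonl(dir: &Path, file: &str, json_line: &str) -> io::Result<()> {
    let _guard = acquire(dir)?;
    let mut f = OpenOptions::new().create(true).append(true).open(dir.join(file))?;
    writeln!(f, "{json_line}")
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("subforge-tm-demo");
    fs::create_dir_all(&dir)?;
    append_jsonl(&dir, "memory.jsonl", r#"{"source":"hello","target":"你好"}"#)?;
    println!("{}", fs::read_to_string(dir.join("memory.jsonl"))?);
    Ok(())
}
```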
Runtime data lives under `.subforge/` by default:

```text
.subforge/
  cache/
  models/faster-whisper/
  tools/faster-whisper-cli/venv/
```
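Pruning the cache with an age limit and a size budget (the `--days` and `--max-mb` options of `subforge cache prune`) could follow a policy like this Rust sketch: drop entries older than the age limit, then evict least recently used entries until the cache fits. `CacheEntry` and `plan_prune` are hypothetical, and treating a megabyte as 1024 * 1024 bytes is an assumption.

```rust
use std::time::{Duration, SystemTime};

/// One cache item (hypothetical shape).
#[derive(Debug, Clone)]
struct CacheEntry {
    key: String,
    size_bytes: u64,
    last_used: SystemTime,
}

/// Keys to delete: entries older than `max_age`, then least recently used
/// entries until the remaining total fits within `max_bytes`.
fn plan_prune(mut entries: Vec<CacheEntry>, now: SystemTime, max_age: Duration, max_bytes: u64) -> Vec<String> {
    entries.sort_by_key(|e| e.last_used); // oldest first
    let mut evicted = Vec::new();
    let mut kept = Vec::new();
    for e in entries {
        // Timestamps in the future count as age zero.
        let age = now.duration_since(e.last_used).unwrap_or(Duration::ZERO);
        if age > max_age {
            evicted.push(e.key);
        } else {
            kept.push(e);
        }
    }
    let mut total: u64 = kept.iter().map(|e| e.size_bytes).sum();
    for e in kept {
        if total <= max_bytes {
            break;
        }
        total -= e.size_bytes;
        evicted.push(e.key);
    }
    evicted
}

fn main() {
    const DAY: u64 = 86_400;
    const MB: u64 = 1024 * 1024;
    let now = SystemTime::now();
    let entry = |key: &str, mb: u64, days_ago: u64| CacheEntry {
        key: key.to_string(),
        size_bytes: mb * MB,
        last_used: now - Duration::from_secs(days_ago * DAY),
    };
    let entries = vec![entry("old-asr", 10, 40), entry("srt-a", 300, 10), entry("srt-b", 300, 1)];
    // Mirrors `--days 30 --max-mb 500`.
    let doomed = plan_prune(entries, now, Duration::from_secs(30 * DAY), 500 * MB);
    println!("would delete: {doomed:?}");
}
```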
Useful commands:

```bash
subforge cache stats
subforge cache prune --days 30 --max-mb 500
subforge cache clean
```

Evaluate subtitle quality against a reference SRT:
```bash
subforge eval output.srt -r reference.srt
subforge eval output.srt -r reference.srt -l zh
```

The eval path uses SubER plus text metrics such as WER, BLEU, chrF, and TER.
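Of these metrics, WER is simple enough to sketch directly: it is the word-level edit distance between hypothesis and reference, divided by the number of reference words. This Rust sketch is a plain illustration, not the implementation SubForge or SubER uses; for languages written without spaces (such as `-l zh`), a character-level variant is usually more meaningful.

```rust
/// Word error rate: word-level Levenshtein distance / number of reference words.
fn wer(reference: &str, hypothesis: &str) -> f64 {
    let r: Vec<&str> = reference.split_whitespace().collect();
    let h: Vec<&str> = hypothesis.split_whitespace().collect();
    if r.is_empty() {
        // Convention chosen for this sketch when there is nothing to compare against.
        return if h.is_empty() { 0.0 } else { 1.0 };
    }
    // prev[j] = edit distance between the first i-1 reference words and h[..j].
    let mut prev: Vec<usize> = (0..=h.len()).collect();
    for i in 1..=r.len() {
        let mut cur = vec![i; h.len() + 1]; // cur[0] = i deletions
        for j in 1..=h.len() {
            let substitution = prev[j - 1] + usize::from(r[i - 1] != h[j - 1]);
            let deletion = prev[j] + 1;
            let insertion = cur[j - 1] + 1;
            cur[j] = substitution.min(deletion).min(insertion);
        }
        prev = cur;
    }
    prev[h.len()] as f64 / r.len() as f64
}

fn main() {
    let reference = "the cat sat on the mat";
    let hypothesis = "the cat sit on mat";
    // One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
    println!("WER = {:.3}", wer(reference, hypothesis));
}
```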
Development checks:

```bash
cargo fmt --all -- --check
cargo clippy --all-targets -- -A clippy::field_reassign_with_default
cargo build --all-targets
cargo test --all-targets
cargo bench --bench client_reuse
```

The Python sidecar at `scripts/transcribe_segment.py` is embedded into the Rust binary with `include_str!`. Rebuild the binary after editing it. `subforge doctor` checks whether the embedded copy and the file on disk are in sync.
- `config.toml` is ignored by Git and may contain API keys.
- `.subforge/`, `.subforge-tm/`, `target/`, and test videos are ignored.
- CI runs gitleaks against common API keys, bearer tokens, and JWTs.
- Runtime retry errors redact secret-looking URL query parameters and bearer tokens before printing them.
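The URL and bearer-token redaction can be sketched as two string passes in Rust. The list of secret parameter names is an assumption, and `redact_url` and `redact_bearer` are hypothetical helpers rather than SubForge's actual code.

```rust
/// Query parameter names treated as secrets (an assumed list).
const SECRET_PARAMS: &[&str] = &["key", "api_key", "apikey", "token", "access_token", "sig", "signature"];

/// Replace secret-looking query values in a single URL with "REDACTED".
fn redact_url(url: &str) -> String {
    let Some((base, query)) = url.split_once('?') else {
        return url.to_string();
    };
    let pairs: Vec<String> = query
        .split('&')
        .map(|pair| match pair.split_once('=') {
            Some((name, _)) if SECRET_PARAMS.contains(&name.to_ascii_lowercase().as_str()) => {
                format!("{name}=REDACTED")
            }
            _ => pair.to_string(),
        })
        .collect();
    format!("{base}?{}", pairs.join("&"))
}

/// Mask every token that follows "Bearer " in an error message.
fn redact_bearer(text: &str) -> String {
    const MARKER: &str = "Bearer ";
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(pos) = rest.find(MARKER) {
        let start = pos + MARKER.len();
        out.push_str(&rest[..start]);
        out.push_str("REDACTED");
        // Skip the token itself: everything up to the next whitespace or quote.
        let token_len = rest[start..]
            .find(|c: char| c.is_whitespace() || c == '"' || c == '\'')
            .unwrap_or(rest.len() - start);
        rest = &rest[start + token_len..];
    }
    out.push_str(rest);
    out
}

fn main() {
    println!("{}", redact_url("https://gw.example.com/v1/chat?model=x&key=sk-123"));
    println!("{}", redact_bearer("401 from gateway: Authorization: Bearer sk-abc123 rejected"));
}
```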
Do not paste `config.toml`, the output of `subforge config show`, or API gateway URLs that contain tokens into public issues.
| Symptom | Fix |
|---|---|
| `cargo` not found | Install Rust with rustup, then restart the terminal |
| `ffmpeg` not found | Install ffmpeg with your system package manager |
| Python package missing | Run `subforge setup` |
| CUDA unavailable | Rebuild the venv with `subforge setup --compute cu124 --force` |
| RTX 50 / Blackwell unsupported | Use `subforge setup --compute cu128-nightly --force` |
| `subforge eval` cannot find SubER | Run `subforge setup` or install `subtitle-edit-rate` |
| Bing translation fails because `curl` is missing | Install `curl` |
| Edited Python sidecar has no effect | Rebuild with `cargo install --path .` |
For a full local check:

```bash
subforge doctor
```

SubForge is engineering-oriented, but several stages are inspired by published subtitle and translation quality work:
- Segment Any Text, EMNLP 2024 - neural segmentation
- Long-Form Speech Translation, Findings of EMNLP 2023 - context-aware subtitle work
- GEMBA-MQM, WMT 2023 - LLM-based quality estimation
- MAPS, WMT 2024 - terminology extraction and consistency
- SubER, IWSLT 2022 - subtitle evaluation
This project is promoted in the LINUX DO open-source community. Thanks to the community for discussion, feedback, and suggestions.
License: MIT

