Self Hosting Whisper
For fully on-device transcription with no network calls, ReWrite can spawn a whisper.cpp whisper-server binary that you supply. The plugin only reads the absolute paths you configure; it never downloads binaries, never looks them up on PATH, and never spawns anything you did not explicitly point it at. Desktop only.
Pair this with a local LLM for a setup where neither audio nor text leaves your machine.
When you click Start in settings, the plugin launches whisper-server as a child process and talks to it over loopback (http://127.0.0.1:<port>/inference). Its stdout/stderr are captured in a ring-buffered log you can view in settings. When you click Stop, or when the plugin unloads, the process is terminated. The plugin records a small PID sidecar file so that if Obsidian restarts while the server is still running, it can re-adopt that exact process instead of orphaning or double-spawning it. It will never kill a process it did not start.
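The PID sidecar idea can be sketched in shell. This is not ReWrite's implementation: the file location and the function names are hypothetical, and the check only confirms that the recorded PID is still alive. A real implementation would also have to confirm that the PID still belongs to the whisper-server it started, because the OS reuses PIDs.

```shell
# Hypothetical sidecar location; ReWrite's real file lives elsewhere.
PID_FILE="${TMPDIR:-/tmp}/whisper-server.pid"

# Record the PID of a server that was just started.
record_pid() {
    printf '%s\n' "$1" > "$PID_FILE"
}

# Succeed only if the sidecar names a process that is still alive.
can_readopt() {
    [ -f "$PID_FILE" ] || return 1
    pid=$(cat "$PID_FILE")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content: ignore the file
    esac
    kill -0 "$pid" 2>/dev/null     # signal 0 tests for existence, sends nothing
}
```

A typical flow would call record_pid "$!" right after launching the server in the background, then can_readopt on the next startup to decide between re-adopting and spawning.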
Loopback only. whisper-server has no authentication and no TLS, so anyone who can reach its port can submit audio and exercise its native audio-decoding code. ReWrite always passes --host 127.0.0.1 when you do not specify a host, and refuses to start if you put a non-loopback --host (such as 0.0.0.0 or a LAN IP) in Extra args. The exact refusal is:
Refusing to start: --host would bind whisper-server to a non-loopback interface, exposing an unauthenticated transcription server to your network. Remove it from Extra args; ReWrite always binds 127.0.0.1.
Loopback values it accepts: 127.0.0.1, localhost, ::1, [::1]. If you run whisper-server yourself from a terminal, bind it to 127.0.0.1 the same way; do not expose it to your network unless you have put your own authenticating proxy in front of it.
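If you launch whisper-server from your own scripts, a similar guard is easy to add. The sketch below is an illustration, not the plugin's code: the function names are made up, and it assumes the host is passed as a separate token after --host, as in the sanity-check command on this page.

```shell
# Succeed only for the loopback values listed above.
is_loopback_host() {
    case "$1" in
        127.0.0.1|localhost|::1|'[::1]') return 0 ;;
        *) return 1 ;;
    esac
}

# Scan a whitespace-separated argument string and fail if any
# "--host VALUE" pair names a non-loopback address.
extra_args_are_safe() {
    safe=0            # exit status to return: 0 means safe
    expect_host=0
    set -f            # disable globbing so "[::1]" is not treated as a pattern
    for arg in $1; do
        if [ "$expect_host" -eq 1 ]; then
            is_loopback_host "$arg" || safe=1
            expect_host=0
        elif [ "$arg" = "--host" ]; then
            expect_host=1
        fi
    done
    set +f
    return "$safe"
}
```

For example, extra_args_are_safe "-ac 768 -t 4" succeeds, while extra_args_are_safe "--host 0.0.0.0" fails.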
- Windows: download the latest whisper-bin-x64.zip (CPU) or whisper-cublas-*.zip (NVIDIA GPU) from the whisper.cpp releases page, unzip somewhere stable (for example C:\Tools\whisper.cpp\), and use the path to whisper-server.exe.
- macOS: brew install whisper-cpp installs a whisper-server binary; which whisper-server shows its absolute path. Or build from source as on Linux.
- Linux: there are no official Linux binaries, so build from source once (see below).
- Upstream GGML models from Hugging Face, for example ggml-base.en.bin, ggml-small.bin, ggml-large-v3.bin. Larger is more accurate and slower.
- FUTO whisper-acft models (see below): quantized, finetuned variants that support a dynamic audio context for lower latency. They load with the same -m flag.
In ReWrite settings, scroll to Local whisper.cpp server (desktop) and fill in:
- Binary path: absolute path to whisper-server (or whisper-server.exe). The Auto-detect button checks common install locations (~/.local/bin, ~/.local/share/whisper.cpp/build/bin, /usr/local/bin, /opt/homebrew/bin, /usr/bin) and fills the field if it finds one.
- Model path: absolute path to the .bin model file.
- Port: defaults to 8080.
- Extra args (optional): space-separated CLI args appended after -m and --port. Split on whitespace only (a single value containing spaces, such as a quoted path, is not supported). Do not add a non-loopback --host here.
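To see why a quoted path does not survive in Extra args, here is a small sketch of whitespace-only splitting. It imitates the behaviour described above rather than reproducing the plugin's code, and --example-path is a made-up placeholder flag, not a real whisper-server option.

```shell
# Print each token of an Extra args string on its own line,
# splitting on whitespace only; quotes get no special treatment.
split_extra_args() {
    set -f                      # no filename expansion while splitting
    for token in $1; do
        printf '%s\n' "$token"
    done
    set +f
}

split_extra_args '-ac 768 -t 4'
split_extra_args '--example-path "/tmp/my models/x.bin"'
```

The first call yields four tokens as intended. The second yields three, with the quoted path broken in two at the space and the quote characters left in place.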
Click Start. The status indicator moves from Stopped through Starting to Running. If startup fails, View log shows whisper-server's output. You can also start and stop the server from the command palette and the desktop status-bar item.
Set the profile's Transcription provider to "Local whisper.cpp (desktop only)". The Transcription model field has no effect for this provider; whisper-server uses whichever model file was loaded at startup. No API key is needed. The plugin transcodes recordings to 16 kHz mono WAV before sending them to /inference.
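To test a running server by hand, you can do roughly what the plugin does: convert a recording to 16 kHz mono WAV and post it to /inference. The sketch below assumes ffmpeg and curl are installed. The function names are illustrative, and the form fields follow the whisper.cpp server example rather than anything ReWrite documents.

```shell
# Build the loopback endpoint for a given port (default 8080).
inference_url() {
    printf 'http://127.0.0.1:%s/inference\n' "${1:-8080}"
}

# Convert any recording to 16 kHz mono 16-bit PCM WAV.
# Note: "-ac 1" here is ffmpeg's channel count, unrelated to whisper's -ac flag.
to_whisper_wav() {
    ffmpeg -y -loglevel error -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le "$2"
}

# POST a WAV file to the local server and print the JSON response.
# $1 = WAV path, $2 = port (optional).
transcribe_wav() {
    curl -sS "$(inference_url "$2")" \
        -F file="@$1" \
        -F response_format="json"
}
```

For example: to_whisper_wav memo.m4a memo.wav, then transcribe_wav memo.wav 8080.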
whisper.cpp does not publish prebuilt Linux binaries, so compile it once. There is a helper script in the repo at scripts/build-whisper-linux.sh, or do it by hand:
- Install the toolchain for your distro:
  - Debian / Ubuntu / Mint: sudo apt update && sudo apt install -y build-essential cmake git
  - Fedora / RHEL: sudo dnf install -y gcc-c++ make cmake git
  - Arch / Manjaro: sudo pacman -S --needed base-devel cmake git
  - openSUSE: sudo zypper install -y gcc-c++ make cmake git
- Clone and build:

      git clone https://github.com/ggerganov/whisper.cpp.git
      cd whisper.cpp
      cmake -B build -DCMAKE_BUILD_TYPE=Release
      cmake --build build -j --config Release

  The default build includes the server example. For CUDA, add -DGGML_CUDA=ON to the first cmake line (requires the CUDA toolkit; longer build).
- The binary lands at <clone>/build/bin/whisper-server. Copy or symlink it somewhere stable (for example ~/.local/bin/whisper-server) and ensure it is executable (chmod +x).
- Sanity-check it once from a terminal: ./build/bin/whisper-server -m /path/to/model.bin --host 127.0.0.1 --port 8080. You should see whisper server listening at http://127.0.0.1:8080. Ctrl-C to stop, then let the plugin manage it.
If cmake --build fails with 'std::filesystem' has not been declared or similar C++17 errors, your GCC is too old. Install a newer one (sudo apt install g++-12) and rerun the cmake -B build ... step with -DCMAKE_CXX_COMPILER=g++-12.
whisper-acft is a set of Whisper checkpoints finetuned by FUTO so whisper.cpp's encoder tolerates a dynamic audio_ctx (the number of audio frames it processes). Lowering the audio context on a stock model makes it unstable; the ACFT models were retrained to handle it, cutting latency on short utterances (often a 2x to 4x speedup on small models).
The checkpoints are quantized to q8_0 in the same GGML container the -m flag accepts, so no special build is needed, only a whisper.cpp recent enough to recognize the -ac / --audio-context flag.
- Download a .bin (English-only is smaller and faster for English; multilingual handles other languages):
  - English-only: tiny_en_acft_q8_0.bin, base_en_acft_q8_0.bin, small_en_acft_q8_0.bin
  - Multilingual: tiny_acft_q8_0.bin, base_acft_q8_0.bin, small_acft_q8_0.bin

  These are published under https://voiceinput.futo.org/VoiceInput/. Verify the download finished cleanly; a truncated .bin fails to load with a cryptic log error.
- Set Model path to the FUTO .bin. Binary path and Port are unchanged.
- Set Extra args to -ac 768 (a sensible default for short to medium clips). -ac caps the encoder context: lower runs faster but only stays accurate on ACFT models.
  - -ac 512 for very short memos (under ~10 s).
  - -ac 1500 disables the speedup (the default for 30 s of audio); use it if you dictate longer than ~20 s and the tail gets cut.
  - Combine flags on one line, for example -ac 768 -t 4 to also cap CPU threads.
- Click Start (or Restart). The log should show the ACFT model loading. The transcription provider needs no changes.
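One way to check that a model download finished cleanly is to compare the local file size with the size the server reports. This is a sketch under assumptions: it requires curl, it only works when the server sends a Content-Length header, and the helper names are illustrative.

```shell
# Byte size of a local file.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# Content-Length the server reports for a URL (follows redirects and
# keeps the last value seen). Prints an empty line if the header is absent.
remote_size() {
    curl -sIL "$1" | tr -d '\r' |
        awk 'tolower($1) == "content-length:" { n = $2 } END { print n }'
}

# Succeed only if the local file has exactly the expected number of bytes.
# $1 = file path, $2 = expected size in bytes.
size_matches() {
    [ "$(file_size "$1")" = "$2" ]
}
```

With MODEL_URL set to the address you downloaded from, size_matches small_en_acft_q8_0.bin "$(remote_size "$MODEL_URL")" fails for a truncated file.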
If transcripts get truncated or jumbled, -ac is too low for your clip length; raise it toward 1500 until stable.
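The guidance above can be condensed into a small helper that maps an expected clip length to a starting -ac value. The thresholds are the approximate ones given on this page, not hard limits, and the function name is illustrative. It takes whole seconds only.

```shell
# Suggest a starting -ac value for an ACFT model.
# $1 = longest expected clip length in whole seconds.
pick_audio_ctx() {
    seconds=$1
    if [ "$seconds" -lt 10 ]; then
        echo 512      # very short memos
    elif [ "$seconds" -le 20 ]; then
        echo 768      # short to medium clips
    else
        echo 1500     # full context, no speedup
    fi
}

# Example: build an Extra args value for clips of up to 15 seconds.
echo "-ac $(pick_audio_ctx 15) -t 4"
```

Treat the result as a starting point and raise it if the tail of your transcripts is cut off.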
- Port already in use: another process is bound to the port. Change the port or stop the other process. The plugin will not kill processes it did not start.
- "Port N is bound by an external whisper-server": a whisper-server the plugin did not start holds the port. Stop it via your OS tools first.
- "This whisper-server was not started by ReWrite": you tried to Stop an externally-started server. Stop it from your task manager.
- Antivirus quarantine on Windows: Defender or third-party AV may flag whisper-server.exe on first run. Whitelist the binary; the plugin cannot work around AV.
- Permission denied (macOS / Linux): make the binary executable (chmod +x whisper-server).
- "did not become ready within 5s": the model failed to load (wrong path, corrupted file, RAM exhausted). The log tail shows whisper.cpp's error.
- unknown argument: -ac: your whisper-server predates the dynamic audio-context flag. Update whisper.cpp, or remove -ac (FUTO models still load, you just lose the speedup).
- FUTO model loads but transcripts are truncated: -ac is too low for your audio length; raise it (768 to 1024 to 1500).
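When debugging a server you started yourself from a terminal, a readiness poll helps separate "still loading the model" from "never came up". The sketch below is not how the plugin checks readiness. It assumes curl is installed and that the server answers plain HTTP requests on / once it is listening; the function name is illustrative.

```shell
# Poll the loopback port until something answers HTTP, or give up.
# $1 = port, $2 = timeout in seconds (default 5, mirroring the error text above).
wait_until_ready() {
    port=$1
    remaining=${2:-5}
    while [ "$remaining" -gt 0 ]; do
        # --noproxy keeps the request on loopback even if a proxy is configured.
        if curl -s -o /dev/null --max-time 1 --noproxy '*' \
            "http://127.0.0.1:${port}/"; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}
```

For large models on slow disks, try a longer timeout, for example wait_until_ready 8080 30, before concluding that the model failed to load.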
See also the general Troubleshooting page.
These pages are generated from the wiki/ folder in the ReWrite-Voice-Notes repo. Edits made directly in the wiki are overwritten on the next sync. To fix or improve a page, edit the matching file in wiki/ and open a pull request.