Self Hosting Whisper

Self-hosting: whisper.cpp

For fully on-device transcription with no network calls, ReWrite can spawn a whisper.cpp whisper-server binary that you supply. The plugin only reads the absolute paths you configure; it never downloads binaries, never looks them up on PATH, and never spawns anything you did not explicitly point it at. Desktop only.

Pair this with a local LLM for a setup where neither audio nor text leaves your machine.

How it works and the security model

When you click Start in settings, the plugin launches whisper-server as a child process and talks to it over loopback (http://127.0.0.1:<port>/inference). Its stdout/stderr are captured in a ring-buffered log you can view in settings. When you click Stop, or when the plugin unloads, the process is terminated. The plugin records a small PID sidecar file so that if Obsidian restarts while the server is still running, it can re-adopt that exact process instead of orphaning or double-spawning it. It will never kill a process it did not start.
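The re-adoption check can be pictured with a small shell sketch. This is not the plugin's code: the sidecar's one-line format and the helper names write_sidecar and can_readopt are hypothetical, chosen only to illustrate the idea of "adopt the recorded PID only if it is still alive".

```shell
# Hypothetical sidecar: a one-line file holding the PID of a server we spawned.
# Neither the file format nor these function names come from ReWrite.

# write_sidecar FILE PID: record the PID of a process we started.
write_sidecar() {
    printf '%s\n' "$2" > "$1"
}

# can_readopt FILE: succeed only if the sidecar names a process that still exists.
can_readopt() {
    [ -f "$1" ] || return 1
    _pid=$(cat "$1")
    # Reject anything that is not a plain positive integer.
    case "$_pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$_pid" -gt 0 ] || return 1
    # kill -0 sends no signal; it only tests that the PID exists and can be signalled.
    kill -0 "$_pid" 2>/dev/null
}
```

A liveness test alone is not enough in practice, because the operating system can reuse a PID for an unrelated program. A careful implementation would also confirm that the process is the whisper-server it launched before adopting or stopping it.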

Loopback only. whisper-server has no authentication and no TLS, so anyone who can reach its port can submit audio and exercise its native audio-decoding code. ReWrite passes --host 127.0.0.1 unless Extra args already specifies a host, and it refuses to start if Extra args contains a non-loopback --host (such as 0.0.0.0 or a LAN IP). The exact refusal is:

Refusing to start: --host would bind whisper-server to a non-loopback interface, exposing an unauthenticated transcription server to your network. Remove it from Extra args; ReWrite always binds 127.0.0.1.

Loopback values it accepts: 127.0.0.1, localhost, ::1, [::1]. If you run whisper-server yourself from a terminal, bind it to 127.0.0.1 the same way; do not expose it to your network unless you have put your own authenticating proxy in front of it.
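The loopback rule can be expressed as a short check over the Extra args string. The sketch below is not ReWrite's implementation; the function names are hypothetical, and covering the --host=VALUE spelling is an assumption made for safety rather than something the text states.

```shell
# is_loopback_host VALUE: succeed for the loopback values listed above.
is_loopback_host() {
    case "$1" in
        127.0.0.1|localhost|::1|'[::1]') return 0 ;;
        *) return 1 ;;
    esac
}

# extra_args_bind_loopback "ARGS": split ARGS on whitespace and fail if any
# --host value is not a loopback address.
extra_args_bind_loopback() {
    set -f        # disable glob expansion while splitting
    set -- $1     # unquoted on purpose: whitespace-only splitting
    set +f
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --host)
                # A trailing --host with no value is treated as invalid.
                [ "$#" -ge 2 ] || return 1
                is_loopback_host "$2" || return 1
                shift ;;
            --host=*)
                is_loopback_host "${1#--host=}" || return 1 ;;
        esac
        shift
    done
    return 0
}
```

For example, extra_args_bind_loopback "-ac 768 -t 4" succeeds, while extra_args_bind_loopback "--host 0.0.0.0" fails. The same check is useful in a wrapper script if you launch whisper-server yourself.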

Setup

1. Get a `whisper-server` binary

Windows: download the latest whisper-bin-x64.zip (CPU) or whisper-cublas-*.zip (NVIDIA GPU) from the whisper.cpp releases page, unzip somewhere stable (for example C:\Tools\whisper.cpp\), and use the path to whisper-server.exe.
macOS: run brew install whisper-cpp, which installs a whisper-server binary, then run which whisper-server to print its absolute path. Alternatively, build from source as on Linux.
Linux: there are no official Linux binaries, so build from source once (see "Building whisper-server on Linux" below).

2. Download a GGML model

Upstream GGML models from Hugging Face, for example ggml-base.en.bin, ggml-small.bin, or ggml-large-v3.bin. Larger models are more accurate but slower.
FUTO whisper-acft models (see "FUTO whisper-acft models" below): quantized, finetuned variants that support a dynamic audio context for lower latency. They load with the same -m flag.

3. Configure the plugin

In ReWrite settings, scroll to Local whisper.cpp server (desktop) and fill in:

- Binary path: absolute path to whisper-server (or whisper-server.exe). The Auto-detect button checks common install locations (~/.local/bin, ~/.local/share/whisper.cpp/build/bin, /usr/local/bin, /opt/homebrew/bin, /usr/bin) and fills the field if it finds one.
- Model path: absolute path to the .bin model file.
- Port: defaults to 8080.
- Extra args (optional): space-separated CLI args appended after -m and --port. The string is split on whitespace only, so a single value that contains spaces (such as a quoted path) is not supported. Do not add a non-loopback --host here.
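If you would rather locate the binary yourself than rely on Auto-detect, a lookup over the same directories can be sketched as below. The function name find_whisper_server is hypothetical, and the plugin's own detection may behave differently (for example on Windows).

```shell
# find_whisper_server DIR...: print the first DIR/whisper-server that is an
# executable regular file; fail if none of the directories contains one.
find_whisper_server() {
    for _dir in "$@"; do
        _candidate="$_dir/whisper-server"
        if [ -f "$_candidate" ] && [ -x "$_candidate" ]; then
            printf '%s\n' "$_candidate"
            return 0
        fi
    done
    return 1
}

# Example call with the locations listed above, in the same order:
# find_whisper_server "$HOME/.local/bin" "$HOME/.local/share/whisper.cpp/build/bin" \
#     /usr/local/bin /opt/homebrew/bin /usr/bin
```

The printed path is absolute as long as the directories passed in are absolute, which is what the Binary path field expects.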

4. Start it

Click Start. The status indicator moves from Stopped through Starting to Running. If startup fails, View log shows whisper-server's output. You can also start and stop the server from the command palette and the desktop status-bar item.

5. Use it from a profile

Set the profile's Transcription provider to "Local whisper.cpp (desktop only)". The Transcription model field has no effect for this provider; whisper-server uses whichever model file it loaded at startup. No API key is needed. The plugin transcodes recordings to 16 kHz mono WAV before sending them to /inference.
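To exercise the same endpoint by hand, the two steps (transcode, then POST) can be approximated from a terminal. This is a sketch that assumes ffmpeg and curl are installed and that the server is already running on loopback; the function names are hypothetical, and the plugin does its own transcoding rather than calling ffmpeg.

```shell
# to_whisper_wav INPUT OUTPUT: transcode any audio file to 16 kHz, mono,
# 16-bit PCM WAV. Note that ffmpeg's -ac means "audio channels" here and is
# unrelated to whisper-server's -ac (audio context) flag.
to_whisper_wav() {
    ffmpeg -y -loglevel error -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le "$2"
}

# transcribe_wav FILE [PORT]: POST the WAV to the local server as multipart
# form data and print the JSON reply. PORT defaults to 8080.
transcribe_wav() {
    curl -sS "http://127.0.0.1:${2:-8080}/inference" \
        -F "file=@$1" \
        -F "response_format=json"
}
```

A typical session would be to_whisper_wav memo.m4a memo.wav followed by transcribe_wav memo.wav, where memo.m4a stands in for any recording of yours.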

Building whisper-server on Linux

whisper.cpp does not publish prebuilt Linux binaries, so compile it once. There is a helper script in the repo at scripts/build-whisper-linux.sh, or do it by hand:

Install the toolchain for your distro:
- Debian / Ubuntu / Mint: sudo apt update && sudo apt install -y build-essential cmake git
- Fedora / RHEL: sudo dnf install -y gcc-c++ make cmake git
- Arch / Manjaro: sudo pacman -S --needed base-devel cmake git
- openSUSE: sudo zypper install -y gcc-c++ make cmake git

Clone and build:

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j --config Release

The default build includes the server example. For CUDA, add -DGGML_CUDA=ON to the first cmake line (requires the CUDA toolkit; longer build).

The binary lands at <clone>/build/bin/whisper-server. Copy or symlink it somewhere stable (for example ~/.local/bin/whisper-server) and ensure it is executable (chmod +x).
Sanity-check it once from a terminal: ./build/bin/whisper-server -m /path/to/model.bin --host 127.0.0.1 --port 8080. You should see whisper server listening at http://127.0.0.1:8080. Ctrl-C to stop, then let the plugin manage it.

If cmake --build fails with 'std::filesystem' has not been declared or similar C++17 errors, your GCC is too old. Install a newer one (for example, sudo apt install g++-12 on Debian or Ubuntu), rerun the cmake -B build ... step with -DCMAKE_CXX_COMPILER=g++-12 added, and then rerun the cmake --build step.

FUTO whisper-acft models (faster short clips)

whisper-acft is a set of Whisper checkpoints finetuned by FUTO so whisper.cpp's encoder tolerates a dynamic audio_ctx (the number of audio frames it processes). Lowering the audio context on a stock model makes it unstable; the ACFT models were retrained to handle it, cutting latency on short utterances (often a 2x to 4x speedup on small models).

The checkpoints are quantized to q8_0 in the same GGML container the -m flag accepts, so no special build is needed, only a whisper.cpp recent enough to recognize the -ac / --audio-context flag.

Download a .bin (English-only is smaller and faster for English; multilingual handles other languages):
- English-only: tiny_en_acft_q8_0.bin, base_en_acft_q8_0.bin, small_en_acft_q8_0.bin
- Multilingual: tiny_acft_q8_0.bin, base_acft_q8_0.bin, small_acft_q8_0.bin
These are published under https://voiceinput.futo.org/VoiceInput/. Verify the download finished cleanly; a truncated .bin fails to load with a cryptic log error.
Set Model path to the FUTO .bin. Binary path and Port are unchanged.
Set Extra args to -ac 768 (a sensible default for short to medium clips). -ac caps the encoder's audio context: lower values run faster but stay accurate only on ACFT models.
- -ac 512 for very short memos (under ~10 s).
- -ac 1500 is the stock value, sized for 30 s of audio, so it disables the speedup; use it if you dictate longer than ~20 s and the tail gets cut.
- Combine flags on one line, for example -ac 768 -t 4 to also cap CPU threads.
Click Start (or Restart). The log should show the ACFT model loading. The transcription provider needs no changes.

If transcripts get truncated or jumbled, -ac is too low for your clip length; raise it toward 1500 until stable.
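A rough way to pick a starting value is to convert -ac into seconds of audio. The sketch below assumes a linear scale derived from the figure above (1500 frames for 30 s, so 50 frames per second); the function name ac_to_seconds is hypothetical, and the clip length a given model actually tolerates may be shorter than this arithmetic suggests.

```shell
# ac_to_seconds N: approximate seconds of audio covered by -ac N,
# assuming 1500 frames correspond to 30 s (50 frames per second).
ac_to_seconds() {
    awk -v ac="$1" 'BEGIN { printf "%.2f\n", ac / 50 }'
}

for _ac in 512 768 1024 1500; do
    printf '%s covers about %s s\n' "-ac $_ac" "$(ac_to_seconds "$_ac")"
done
```

Under that assumption, -ac 512 covers about 10 s and -ac 768 about 15 s, which lines up with the guidance above to use 512 for memos under ~10 s and to move toward 1500 for dictation longer than ~20 s.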

Troubleshooting

Port already in use: another process is bound to the port. Change the port or stop the other process. The plugin will not kill processes it did not start.
"Port N is bound by an external whisper-server": a whisper-server the plugin did not start holds the port. Stop it via your OS tools first.
"This whisper-server was not started by ReWrite": you tried to Stop an externally-started server. Stop it from your task manager.
Antivirus quarantine on Windows: Defender or third-party AV may flag whisper-server.exe on first run. Whitelist the binary; the plugin cannot work around AV.
Permission denied (macOS / Linux): make the binary executable (chmod +x whisper-server).
"did not become ready within 5s": the model failed to load (wrong path, corrupted file, RAM exhausted). The log tail shows whisper.cpp's error.
"unknown argument: -ac": your whisper-server predates the dynamic audio-context flag. Update whisper.cpp, or remove -ac (FUTO models still load, you just lose the speedup).
FUTO model loads but transcripts are truncated: -ac is too low for your audio length; raise it (768 to 1024 to 1500).
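When you run whisper-server by hand, a readiness poll helps tell a slow model load apart from a failed one. The sketch below is not the plugin's readiness check; the function name wait_for_server is hypothetical, and it assumes curl is installed and treats any HTTP response as "ready".

```shell
# wait_for_server HOST PORT TIMEOUT_SECONDS: poll about once per second until
# the server answers an HTTP request, or give up after the timeout.
wait_for_server() {
    _remaining=$3
    while [ "$_remaining" -gt 0 ]; do
        # --noproxy '*' keeps the request on loopback even if a proxy is configured.
        if curl -s -o /dev/null --noproxy '*' --max-time 1 "http://$1:$2/"; then
            return 0
        fi
        sleep 1
        _remaining=$((_remaining - 1))
    done
    return 1
}
```

For example, wait_for_server 127.0.0.1 8080 30 gives a large model up to roughly 30 seconds to load. To see which process holds a port on macOS or Linux, lsof -nP -iTCP:8080 -sTCP:LISTEN lists the listener.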

See also the general Troubleshooting page.

These pages are generated from the wiki/ folder in the ReWrite-Voice-Notes repo. Edits made directly in the wiki are overwritten on the next sync. To fix or improve a page, edit the matching file in wiki/ and open a pull request.
