Local, Whisper-powered dictation for any text field on macOS. Hold one key, speak, release — text appears at the cursor. No cloud, no per-app integration.
- One key, system-wide. Works in Notes, Slack, Safari address bars, terminals — anywhere you can type.
- Hold-to-talk or toggle. Both modes are first-class.
- 100% local. Audio never leaves your machine.
- Backend is swappable. whisper.cpp ships in the bundle; MLX, faster-whisper, or an HTTP server work via one config field.
- Safe insertion. Pasteboard is snapshotted and restored.
Voxate is built from source. Clone the repo, install a Whisper backend, build the menu-bar app, and grant the macOS permissions on first launch.
```bash
git clone https://github.com/Gent8/voxate.git
cd voxate
```

**Option A — whisper.cpp** (recommended; fastest, no Python)

```bash
brew install whisper-cpp
mkdir -p ~/.config/voxate/models
# base = good speed/quality default; large-v3 = most accurate.
curl -L -o ~/.config/voxate/models/ggml-base.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
```

**Option B — MLX** (Apple Silicon only; slightly higher quality per MB)

```bash
pip3 install --user mlx-whisper
```

Then in `config.json`:
"transcribeCommand": [
"/usr/bin/env", "python3",
"/ABSOLUTE/PATH/TO/voxate/scripts/transcribe_mlx.py",
"{audio}",
"--model", "mlx-community/whisper-base.en-mlx"
]Create a stable local signing identity once. This stops macOS from treating every rebuild as a different app for Microphone + Accessibility:
```bash
bash scripts/setup-signing.sh
```

Then build and open the app:
```bash
bash scripts/bundle-app.sh
open build/Voxate.app
```

`bundle-app.sh` requires stable local signing by default. For a throwaway build that resets permissions on each rebuild:
```bash
VOXATE_SIGNING=adhoc bash scripts/bundle-app.sh
```

First launch will prompt for:
| Permission | Why |
|---|---|
| Microphone | Capture audio. |
| Accessibility | Listen to global keys and synthesize ⌘V. |
The app opens Setup… automatically if anything important is missing — it checks the mic, Accessibility, the configured Whisper executable, and the model path. Grant permissions, install the backend/model, then click Refresh.
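Setup's file-level checks can also be approximated by hand. The sketch below is a rough, hypothetical Python version (not part of the repo) of two of them: the transcribe executable is runnable, and the model file looks like a ggml Whisper model. The magic constant `0x67676d6c` is an assumption taken from whisper.cpp's file format.

```python
"""Hypothetical re-creation of Voxate's Setup checks; not the app's code."""
import os
import shutil
import struct

GGML_MAGIC = 0x67676D6C  # assumed first 4 bytes (LE) of a ggml Whisper model


def check_executable(name_or_path: str) -> bool:
    """Absolute path must be executable; bare name must resolve on PATH."""
    if os.path.isabs(name_or_path):
        return os.access(name_or_path, os.X_OK)
    return shutil.which(name_or_path) is not None


def check_model(path: str) -> bool:
    """True if the file exists and starts with the ggml magic bytes."""
    try:
        with open(path, "rb") as f:
            (magic,) = struct.unpack("<I", f.read(4))
    except (OSError, struct.error):
        return False
    return magic == GGML_MAGIC


if __name__ == "__main__":
    print("whisper-cli:", check_executable("whisper-cli"))
    print("model:", check_model(
        os.path.expanduser("~/.config/voxate/models/ggml-base.bin")))
```

If either check fails, Setup's Refresh button re-runs the same probes after you fix the install.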
Click the menu-bar icon → Settings… for the everyday options:
| Setting | What it does |
|---|---|
| Trigger behavior | Hold-to-talk or toggle recording. |
| Trigger key | fn / globe, F1, F5, F6, F7, or F8. |
| Language | Auto-detect, English, Dutch, French, German, Spanish, or a custom code. |
| Sounds | Enable/disable the start/finish cues. |
| Appearance | Branded or system recording indicator (system/light/dark). |
| Insertion | Clipboard restore, smart spacing, focus-change safety. |
For advanced backend changes, click Open Advanced Config… to edit `~/.config/voxate/config.json` directly:
```json
{
  "keyCode": 63,
  "modifierFlags": 0,
  "triggerMode": "hold",
  "transcribeCommand": [...],
  "language": "auto",
  "restoreClipboard": true,
  "insertionPrefix": "",
  "playSounds": true,
  "smartSpacing": true,
  "focusSafetyCheck": true,
  "recordingIndicatorStyle": "branded",
  "appearanceMode": "system"
}
```

**Field reference**
- `keyCode` — macOS virtual key code. Defaults to 63 (fn / globe). Common others: 49 = Space, 122 = F1, 53 = Esc, 96 = F5.
- `modifierFlags` — required modifier bitmask. `0` for none. `0x20000` = shift, `0x40000` = control, `0x80000` = option, `0x100000` = command, `0x800000` = fn (combine with `|`).
- `triggerMode` — `"hold"` (push-to-talk) or `"toggle"` (press to start, press to stop).
- `language` — `"auto"`, `"en"`, `"nl"`, `"fr"`, … Appended to the backend as `--language <lang>` unless `"auto"`.
- `restoreClipboard` — keep your clipboard intact across dictations.
- `insertionPrefix` — string to prepend to each insertion. Set to `" "` if words run into the previous one.
- `playSounds` — enable/disable start, stop, and completion sounds.
- `recordingIndicatorStyle` — `"branded"` by default, or `"system"` for a quieter native overlay.
- `appearanceMode` — `"system"`, `"light"`, or `"dark"` for the recording indicator.
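The bitmask arithmetic is ordinary bitwise OR. A quick illustration in Python, where the constants mirror AppKit's `NSEvent.ModifierFlags` raw values:

```python
# Modifier mask raw values (from NSEvent.ModifierFlags / CGEventFlags)
SHIFT, CONTROL, OPTION, COMMAND, FN = (
    0x20000, 0x40000, 0x80000, 0x100000, 0x800000)

# Require Control+Option to be held together with the trigger key:
modifier_flags = CONTROL | OPTION
print(hex(modifier_flags))  # 0xc0000

# Checking a mask the way an event tap might (hypothetical check):
event_flags = SHIFT | CONTROL | OPTION
matches = (event_flags & modifier_flags) == modifier_flags
print(matches)  # True
```

To require that combination, you would set `"modifierFlags": 786432` (or edit the JSON with the hex value converted to decimal, since JSON has no hex literals).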
After editing, click the menu-bar icon → Reload config.
> **Note**
> Language selection depends on the model. `.en` whisper.cpp models are English-only — use a multilingual model like `ggml-base.bin` or `ggml-small.bin` for other languages or auto-detect. If you prefer `ggml-base.en.bin`, set Language to English.
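Putting those pieces together, the way the app assembles the backend command can be pictured like this. A sketch under two assumptions stated in this README (`{audio}` is substituted with the temp WAV path, and `--language <lang>` is appended unless the language is `"auto"`); the real Swift code may differ in detail:

```python
def build_command(transcribe_command, audio_path, language="auto"):
    """Expand a config.json transcribeCommand template into an argv list."""
    argv = [audio_path if tok == "{audio}" else tok
            for tok in transcribe_command]
    if language != "auto":
        argv += ["--language", language]  # per the field reference above
    return argv


cmd = build_command(
    ["whisper-cli", "-m", "ggml-base.bin", "-f", "{audio}"],
    "/tmp/voxate-rec.wav", language="nl")
print(cmd)
# ['whisper-cli', '-m', 'ggml-base.bin', '-f', '/tmp/voxate-rec.wav',
#  '--language', 'nl']
```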
- Place the cursor wherever you want text.
- Hold the trigger key (fn / globe by default).
- Speak.
- Release — text appears at the cursor.
In toggle mode: press once to start, press again to stop.
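The two trigger behaviors amount to a tiny state machine. An illustrative sketch (the actual logic lives in the app's HotkeyManager/AppDelegate, which this does not claim to reproduce):

```python
def next_state(mode, state, event):
    """Map (trigger mode, current state, key event) -> new state.

    States: "idle" / "recording". Events: "down" / "up".
    Leaving "recording" is when transcription + insertion would run.
    """
    if mode == "hold":
        # Push-to-talk: record exactly while the key is down.
        return "recording" if event == "down" else "idle"
    if mode == "toggle":
        # Only key-down flips the state; key-up is ignored.
        if event == "down":
            return "recording" if state == "idle" else "idle"
        return state
    raise ValueError(f"unknown mode: {mode}")


assert next_state("hold", "idle", "down") == "recording"
assert next_state("hold", "recording", "up") == "idle"
assert next_state("toggle", "idle", "down") == "recording"
assert next_state("toggle", "recording", "up") == "recording"  # keeps going
assert next_state("toggle", "recording", "down") == "idle"
```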
```
┌──────────────────────────────────────────────────────────────────┐
│  Swift menu-bar agent (Sources/Voxate)                           │
│  ─────────────────────────────────────────────────────────────   │
│  HotkeyManager  ── CGEventTap → press/release of trigger key     │
│  AudioRecorder  ── AVAudioEngine → 16 kHz mono WAV (temp file)   │
│  WhisperEngine  ── subprocess → user-configurable transcribe CLI │
│  TextInserter   ── pasteboard + synthesized ⌘V                   │
│  AppDelegate    ── status item, state machine, config hot-reload │
└──────────────────────────────────────────────────────────────────┘
        ▲                                ▲
        │                                │
   config.json                whisper-cli (whisper.cpp)
                              or transcribe_mlx.py (MLX)
```
The Whisper backend is a subprocess, not a linked library. That lets you swap engines (whisper.cpp, mlx-whisper, faster-whisper, an HTTP server, …) by editing one array in config.json — no recompile.
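The contract a custom backend must meet is small: accept the audio path (substituted for `{audio}`), optionally a `--language` flag, and print the transcript to stdout (or write a `.out.txt`, per WhisperEngine). A hypothetical stub backend, `echo_backend.py`, to show the shape only — it returns a placeholder instead of running a model:

```python
#!/usr/bin/env python3
"""Hypothetical stub Voxate backend demonstrating the subprocess contract."""
import argparse
import sys


def transcribe(audio_path: str, language: str) -> str:
    # A real backend would load a Whisper model here and decode the WAV;
    # this stub only demonstrates the argument shape Voxate passes.
    return f"[transcript of {audio_path} in {language}]"


def main(argv=None):
    p = argparse.ArgumentParser()
    p.add_argument("audio")                       # filled in from "{audio}"
    p.add_argument("--language", default="auto")  # appended unless "auto"
    p.add_argument("--model", default=None)       # backend-specific extra
    args = p.parse_args(argv)
    sys.stdout.write(transcribe(args.audio, args.language))


# Example: main(["rec.wav", "--language", "en"]) prints the stub transcript.
```

Point `transcribeCommand` at a script like this and Voxate neither knows nor cares which engine sits behind it.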
| Choice | Why we picked it |
|---|---|
| Swift menu-bar app vs. pure Python | We need reliable global hotkeys, microphone capture, and synthesized keystrokes. The Cocoa APIs make this clean; Python via pyobjc is fragile around Accessibility/CGEventTap. |
| CGEventTap vs. Carbon RegisterEventHotKey | We need both press and release for hold-to-talk, and we need to listen to modifier-only keys (fn/globe = keyCode 63). Carbon hotkeys can't do either. |
| Pasteboard + ⌘V vs. AXUIElement insertion | Direct AX writes break in Electron, web views, and some Cocoa controls. Paste works everywhere a user can type. We snapshot/restore the clipboard. |
| Subprocess Whisper vs. embedded library | Lets us defer the model/runtime choice to the user, ship no native ML deps, and keep the app bundle simple to sign. |
Inspired by StageWhisper (menu-bar shape and cursor-insertion intent) and whisper-shortcut (lightweight shortcut→Whisper plumbing), but neither is imported directly — the hotkey layer here uses a CGEventTap so we get genuine key-up events for hold-to-talk, including for the fn/globe key.
```
Package.swift              SPM manifest (macOS 13+, single executable target)
Sources/Voxate/
  main.swift               App entry, sets .accessory activation policy
  AppDelegate.swift        Status item, state machine, config hot-reload
  Config.swift             Codable JSON config + ~/.config bootstrap
  HotkeyManager.swift      CGEventTap → key press/release callbacks
  AudioRecorder.swift      AVAudioEngine → 16 kHz mono int16 WAV
  WhisperEngine.swift      Subprocess transcription, stdout or .out.txt
  TextInserter.swift       Pasteboard snapshot + ⌘V + restore
scripts/
  bundle-app.sh            Builds release binary, wraps in .app w/ Info.plist
  package-release.sh       Creates local DMG/zip artifacts + SHA-256 sums
  transcribe_mlx.py        Optional MLX backend
Resources/
  config.example.json      Drop-in starter config
```
- Latency = backend latency. With `ggml-base.en` on M-series silicon, a 3-second utterance transcribes in ~0.4–0.8 s. With `large-v3` expect a few seconds. There's no streaming yet — Whisper transcribes only after release.
- Paste-based insertion means a brief pasteboard touch. We snapshot/restore, but a clipboard manager logging every change will see one entry.
- fn / globe key is special on Apple silicon: macOS sometimes reserves it for the emoji picker or input switching. If your fn key fires the system picker first, switch to a function key (e.g., `keyCode: 122` for F1) in config.
- Signing and sandboxing. Local development builds use a self-signed identity in `~/.config/voxate/signing.keychain-db` so macOS can keep Accessibility + Microphone grants stable across rebuilds. The app is still unsandboxed. Public distribution needs Developer ID signing, notarization, and explicit entitlements.
- No streaming partial results. Easy future addition: switch the backend to `whisper-stream` and pipe partials into `TextInserter`.
HUD does not appear when pressing the trigger key
Usually the global hotkey path isn't active yet. Open Setup… from the menu-bar app and check Accessibility first. If System Settings shows the app enabled but the trigger still does nothing, reset the stale grant and reopen the stable-signed local build:
```bash
tccutil reset Accessibility dev.local.voxate
open build/Voxate.app
```

Then enable Voxate again in System Settings → Privacy & Security → Accessibility. This often happens after switching from ad-hoc to the local stable signer.
For deeper hotkey debugging, quit the app and launch it directly:
```bash
VOXATE_DEBUG=1 build/Voxate.app/Contents/MacOS/Voxate
```

The debug logs include Accessibility trust, configured key code, event-tap startup, and raw key events received by the tap.
```bash
bash scripts/package-release.sh
```

Artifacts land in `dist/`:

- `Voxate-0.1.0.dmg`
- `Voxate-0.1.0.zip`
- `SHA256SUMS`
These are for local / open-source testing. Public macOS distribution still needs Developer ID signing and notarization.
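Testers can verify a download against `SHA256SUMS` with `shasum -a 256 -c SHA256SUMS` from inside `dist/`, or programmatically. A small sketch, assuming the standard coreutils `"<hex>  <filename>"` line format:

```python
import hashlib
import os


def verify_sums(sums_path: str) -> bool:
    """Check every '<sha256>  <file>' line in a checksum file.

    Files are resolved relative to the checksum file's directory.
    """
    base = os.path.dirname(os.path.abspath(sums_path))
    ok = True
    with open(sums_path) as f:
        for line in f:
            if not line.strip():
                continue
            expected, name = line.split(maxsplit=1)
            name = name.strip().lstrip("*")  # '*' marks binary mode
            with open(os.path.join(base, name), "rb") as g:
                actual = hashlib.sha256(g.read()).hexdigest()
            if actual != expected.lower():
                print(f"FAILED: {name}")
                ok = False
    return ok
```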
What works today:
- One configurable global key, system-wide, in any text-input app.
- Hold-to-talk and toggle-to-talk modes.
- Text arrives at the cursor with no app-specific integration.
- Clipboard is preserved by default.
- Backend is swappable; multiple languages supported via the `language` field.
What's missing vs. macOS built-in dictation:
- No live partial transcription while you speak.
- No on-the-fly punctuation commands ("new line", "comma").
- No per-app text formatting heuristics (capitalize-after-period is whatever Whisper produces).
The first two are tractable follow-ups against the same architecture — both live in WhisperEngine + TextInserter and need no changes elsewhere.
Issues and PRs are welcome. A few ground rules to keep things smooth:
- Bug reports — include macOS version, Apple silicon vs Intel, the Whisper backend you're using, and the relevant lines from `VOXATE_DEBUG=1 build/Voxate.app/Contents/MacOS/Voxate`.
- PRs — keep them small and focused; one logical change per PR. Run a clean build (`bash scripts/bundle-app.sh`) before opening.
- Scope — Voxate is intentionally narrow: hold-to-talk → paste-at-cursor. Features outside that loop (LLM rewriting, command modes, per-app heuristics) are unlikely to land in core.
If you find a vulnerability, please follow the disclosure process in SECURITY.md rather than opening a public issue.
Released under the GPL-3.0 License.