Releases: AlexFlanker/whisper-input-next-mac-kit
v0.2.1 — freeze resilience (hold-to-restart + diagnostics)
Occasional hangs? Now there's an escape hatch and a black box.
Added
- 🆘 Hold-to-restart — if the service ever freezes, press & hold the on-screen indicator ~1.8s: a red ring fills, then it force-restarts (launchd brings it back in ~1s). It also snapshots all thread stacks right before exiting, so the freeze is captured even if you restart immediately.
- 🔬 Freeze watchdog — when something stays stuck in a busy state too long (transcription 45s / recording 180s, configurable), it dumps every thread's stack to `logs/freeze_diagnostics.log` for diagnosis. Diagnostics-only — never auto-restarts.
- ⏱️ Real `whisper-cli` timeout (`WHISPER_SUBPROCESS_TIMEOUT`, default 90s) — kills a hung child instead of leaking it for 3 minutes behind the old thread-based wrapper.
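A minimal sketch of a subprocess-level timeout like the one described above, assuming the transcriber is launched through Python's `subprocess` module. The helper name `run_with_timeout` and the way the environment variable is read are illustrative assumptions, not the kit's actual code.

```python
import os
import subprocess
import sys

# Assumption: the limit comes from the environment and defaults to 90 seconds.
DEFAULT_TIMEOUT_S = float(os.environ.get("WHISPER_SUBPROCESS_TIMEOUT", "90"))


def run_with_timeout(cmd, timeout_s=DEFAULT_TIMEOUT_S):
    """Run cmd and return its stdout, or None if it exceeded timeout_s.

    subprocess.run() kills the child and waits for it when the timeout
    expires, so a hung process is not left running in the background.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s, check=False
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a hung whisper-cli: a child that sleeps far past the limit.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(hung, timeout_s=0.5))  # prints None
```

Putting the timeout on the child process itself, rather than on a wrapper thread, is what lets the hung process be reaped instead of outliving the call.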
Fixed
- The on-screen indicator now receives clicks. The app's `status_bar.py` ran `AppHelper.runConsoleEventLoop()`, which pumps the run loop (animations/`callAfter`/display all work) but never dispatches `NSApp` mouse events to windows — so the hold-to-restart overlay got nothing. Switched to `AppHelper.runEventLoop()` (activation policy unchanged). Diagnosed empirically: under the console loop, clicks fired neither `acceptsFirstMouse` nor `mouseDown` under both Prohibited and Accessory; `runEventLoop` fixed it.
How it ships
Two kit-owned files (src/utils/freeze_watchdog.py, src/ui/hold_restart.py, MIT) copied in by install.sh; the patcher wires 6 thin hooks into main.py / status_bar.py / local_whisper.py. Verified byte-identical to a known-good install (30 edits + 5 shipped files) and idempotent.
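For readers curious what "dumps every thread's stack" can look like in Python, here is a rough sketch using only the standard library. The function names and the log format are assumptions for illustration; the contents of `src/utils/freeze_watchdog.py` are not shown in these notes.

```python
import sys
import threading
import time
import traceback


def format_all_thread_stacks():
    """Return a text snapshot of every live thread's current stack."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = [f"=== thread dump at {time.strftime('%Y-%m-%d %H:%M:%S')} ==="]
    for ident, frame in sys._current_frames().items():
        lines.append(f"--- thread {names.get(ident, '?')} ({ident}) ---")
        lines.extend(entry.rstrip("\n") for entry in traceback.format_stack(frame))
    return "\n".join(lines) + "\n"


def dump_stacks(path):
    """Append a snapshot to path (a hypothetical log location)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(format_all_thread_stacks())
```

The standard library's `faulthandler.dump_traceback(all_threads=True)` is a lower-level alternative that also works when the interpreter is in a bad state.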
macOS only. Not affiliated with or endorsed by upstream. See CREDITS.md.
v0.2.0 — local LLM polish (your dictation, cleaned up on-device)
The first release on the core-quality axis: v0.1.x made dictation pleasant to use and manage; v0.2.0 makes it read better — fully on-device.
Added
- 🧠 Local LLM polish — after transcription, an optional local Ollama model tidies the text before it's pasted:
  - `light` — faithful: fixes punctuation, typos, obvious stumbles, keeps your wording.
  - `concise` — trims filler / repetition / verbal tics (口头禅) for a much shorter result.
  - Default model `glm4` (GLM-4-9B); 100% offline; opt-in (`POLISH_ENABLED`, off by default); a hard timeout falls back to the raw transcript, so a slow or absent Ollama never blocks dictation.
- 🤖 MCP `set_polish` (mode/model/enabled) — flip light↔concise, swap models, or toggle it, right from Claude Desktop. `POLISH_*` added to the writable allowlist; polish state shows in `status()`.
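The timeout-with-fallback behaviour can be sketched roughly as follows, assuming Ollama's default local HTTP endpoint (`/api/generate` on port 11434). The prompts, the function name `polish`, and the 8-second limit are hypothetical; the kit's real `ollama_polish.py` may differ.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# Hypothetical prompts; the kit's real wording is not shown in these notes.
PROMPTS = {
    "light": "Fix punctuation and typos; keep the wording. Output only the text:\n",
    "concise": "Remove filler and repetition; be brief. Output only the text:\n",
}


def polish(raw, mode="light", model="glm4", timeout_s=8.0, url=OLLAMA_URL):
    """Return the polished text, or raw unchanged if Ollama is slow or absent."""
    payload = json.dumps(
        {"model": model, "prompt": PROMPTS[mode] + raw, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            text = json.loads(response.read()).get("response", "").strip()
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses;
        # ValueError covers a malformed JSON reply.
        return raw
    return text or raw
```

Note that `urlopen`'s `timeout` bounds each blocking socket operation rather than the whole request, so a strict wall-clock limit would need an extra guard around the call.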
Enable it
- `brew install ollama && ollama pull glm4` (GLM-4-9B, ~5.5 GB; or `qwen2.5:3b` for lighter/faster).
- `POLISH_ENABLED=true` in `.env` and restart — or ask Claude Desktop "turn on polish, concise mode".
How it ships
Kit-owned helper src/transcription/ollama_polish.py (MIT) copied in by install.sh; the patcher only wires 4 thin main.py hooks. Verified byte-identical to a known-good install (24 edits + 3 shipped files) and idempotent. No upstream code redistributed.
macOS only. Not affiliated with or endorsed by upstream. See CREDITS.md.
v0.1.4 — capsule indicator style + MCP style switching
Pick your on-screen dictation indicator — and switch it right from Claude Desktop.
Added
- 🟣 Second indicator style: `capsule` — a pill that appears with three pulsing dots while you record, retracts into a small circle + spinner while transcribing, then turns green and fades when your text lands. (The original glowing `ring` is still the default.)
- 🎛️ `INDICATOR_STYLE` (`ring`|`capsule`) picks the style; the app selects it via a tiny factory. New kit-owned UI file `src/ui/capsule_indicator.py` (MIT), copied in by `install.sh`.
- 🤖 Switch styles from Claude Desktop — the MCP server gains `set_indicator_style` (`ring`/`capsule`/`off`) that updates the config and restarts in one call. `INDICATOR_STYLE` is now writable via MCP, and `status` reports the current style.
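A "tiny factory" of this kind might look like the sketch below. The class names, the registry, and the handling of `off` as a style value are assumptions made for illustration, not the kit's actual code.

```python
import os


class RingIndicator:
    """Placeholder for the default glowing ring."""
    style = "ring"


class CapsuleIndicator:
    """Placeholder for the capsule (pill) style."""
    style = "capsule"


# Hypothetical registry; the kit's real class names may differ.
INDICATORS = {"ring": RingIndicator, "capsule": CapsuleIndicator}


def make_indicator(style=None):
    """Build the indicator named by style (or INDICATOR_STYLE); None means off."""
    if style is None:
        style = os.environ.get("INDICATOR_STYLE", "ring")
    style = style.strip().lower()
    if style == "off":
        return None
    # Unknown values fall back to the default ring rather than failing at startup.
    return INDICATORS.get(style, RingIndicator)()
```

Keeping the mapping in one dictionary means a further style only needs a new class and one registry entry.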
How it ships
- The patcher only wires `main.py` hooks (now 5 indicator hooks: flags, factory, instantiation, state-change). Verified byte-identical to a known-good install (20 edits + 2 shipped UI files) and idempotent. No upstream code redistributed.
macOS only. Not affiliated with or endorsed by upstream. See CREDITS.md.
v0.1.3 — on-screen listening indicator
A small, native on-screen indicator so you can see your dictation state at a glance.
Added
- 🟢 Listening indicator — a bottom-center, click-through overlay that floats over any app:
  - a breathing ring while recording,
  - a spinner while transcribing,
  - a green expand-and-fade burst when your text is pasted.
  Fully transparent (no panel chrome) with a soft glow, so it blends into whatever's behind it.
- Toggle with `SHOW_INDICATOR` (default on); also settable from the MCP server.
How it ships
- The UI component (`src/ui/listening_indicator.py`) is kit-owned (MIT) and copied into your checkout by `install.sh`. The idempotent patcher only wires four `main.py` hooks. No upstream code is redistributed.
- Verified: the patcher applies to the pinned upstream commit byte-identical to a known-good install (19 edits + 1 shipped file) and is idempotent.
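To illustrate what "idempotent" means for a patcher that wires hooks into an upstream file, here is a simplified sketch. The function `apply_hook` and its anchor-matching strategy are hypothetical; the kit's real patcher is not shown in these notes.

```python
def apply_hook(source, anchor, hook):
    """Insert hook on the line after the first line containing anchor.

    Returns source unchanged if the hook is already present, which is what
    makes re-running the patcher safe.
    """
    if hook in source:
        return source  # already patched
    lines = source.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if anchor in line:
            if not line.endswith("\n"):
                lines[i] = line + "\n"
            lines.insert(i + 1, hook + "\n")
            return "".join(lines)
    raise ValueError(f"anchor not found: {anchor!r}")
```

Running it twice yields the same text as running it once, and a missing anchor fails loudly instead of silently skipping the hook, which matters when the upstream commit is pinned.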
macOS only. Not affiliated with or endorsed by upstream. See CREDITS.md.
v0.1.2 — manage from Claude Desktop (MCP), env-configurable sounds, auto-cleanup
Frictionless upgrades to the local-whisper dictation kit.
Added
- 🤖 MCP server (`mcp/server.py`) — monitor & configure the dictation service from Claude Desktop (or any MCP client) by just asking: `status`, `logs`, `get_config`/`set_config`, `restart`, `recent_transcriptions`, and model management. No UI to build.
- 🛠️ `install-mcp.sh` — installs the `mcp` SDK into the app venv and merges a `whisper-input` entry into Claude Desktop's config (existing servers untouched, backed up, idempotent). Refuses to run while Claude Desktop is open (it rewrites its own config).
- 🧹 Audio-archive auto-cleanup — deletes recordings older than `AUDIO_ARCHIVE_RETENTION_HOURS` (default 24) at startup and periodically; prunes `cache.json`.
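The retention rule in the cleanup bullet can be sketched as below, assuming recordings are plain files in a single directory and age is judged by modification time. The function name is hypothetical, and pruning `cache.json` is left out of this sketch.

```python
import time
from pathlib import Path


def prune_archive(archive_dir, retention_hours=24.0, now=None):
    """Delete regular files in archive_dir older than retention_hours.

    Returns the sorted names of the deleted files.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_hours * 3600
    removed = []
    for path in Path(archive_dir).iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Passing `now` explicitly keeps the function easy to test; a periodic caller would simply omit it.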
Changed
- 🔊 Sound cues are now env-configurable — `SOUND_START`/`SOUND_STOP`/`SOUND_DONE`/`SOUND_ERROR`/`SOUND_WARNING` (defaults unchanged: Submarine/Submarine/Glass/Basso/Funk).
- `uninstall.sh` now also removes the Claude Desktop MCP entry.
Quality
- Patcher verified: applies to the pinned upstream commit byte-identical to a known-good install, and is idempotent on re-run.
- Went through an adversarial multi-agent review (13 findings, 0 false positives); all confirmed items fixed — atomic config writes, write-once backup, `hf-mirror.com` model fallback behind the GFW, and more.
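As an illustration of two of the fixes named above, atomic config writes and a write-once backup, a common Python pattern is to write a temporary file and rename it into place. This is a sketch under assumptions: the function name and the `.bak` suffix are hypothetical, not taken from the kit.

```python
import os
import shutil
import tempfile


def write_config_atomically(path, text):
    """Replace path with text so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    backup = path + ".bak"  # hypothetical backup name
    # Write-once backup: keep the first pre-edit copy, never overwrite it.
    if os.path.exists(path) and not os.path.exists(backup):
        shutil.copy2(path, backup)
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-config-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Creating the temporary file in the target's own directory keeps the final `os.replace` on one filesystem, which is what makes the swap atomic.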
Not affiliated with or endorsed by upstream. macOS only. See CREDITS.md for the upstream license note.
v0.1.1 — Local-mode punctuation
✍️ What's new — better Chinese punctuation (local mode)
Local transcripts now come out punctuated:
- Prompt-guided punctuation — a default `WHISPER_PROMPT` makes whisper.cpp emit punctuation (run-on Chinese speech was coming out with none).
- Full-width normalization — half-width punctuation next to Chinese is converted to full-width 「,。!?」 via `WHISPER_FULLWIDTH_PUNCT` (on by default).
Both are configurable in .env — set WHISPER_PROMPT= (empty) or WHISPER_FULLWIDTH_PUNCT=false to opt out.
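The full-width normalization can be approximated with a short regular expression. This is a simplified sketch: it treats only the basic CJK Unified Ideographs block as "Chinese" and handles only the four marks listed above, and the function name is hypothetical; the kit's actual rules may be broader.

```python
import re

# Half-width to full-width equivalents for the four marks named above.
FULLWIDTH = {",": "\uff0c", ".": "\u3002", "!": "\uff01", "?": "\uff1f"}
CJK = r"[\u4e00-\u9fff]"  # basic CJK Unified Ideographs block only

# A mark qualifies if a Chinese character sits directly before or after it.
PATTERN = re.compile(rf"(?<={CJK})[,.!?]|[,.!?](?={CJK})")


def to_fullwidth(text):
    """Convert half-width punctuation adjacent to Chinese into full-width."""
    return PATTERN.sub(lambda m: FULLWIDTH[m.group(0)], text)
```

With this rule `to_fullwidth("你好,世界.")` returns `"你好,世界。"`, while `"Hello, world."` and the decimal point in `"版本3.5"` are left untouched because no Chinese character is adjacent to the mark.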
Upgrade an existing install
cd whisper-input-next-mac-kit && git pull && ./install.sh
The patcher is idempotent and leaves your .env untouched (code defaults provide punctuation).
Full details in CHANGELOG.md.
v0.1.0 — Initial release
A one-command macOS installer that turns Whisper-Input-Next into an always-on, fully local voice keyboard — offline, free, and private.
✨ Highlights
- 🎙️ Tap Right-⌘ to start/stop — single-tap toggle, no chord, no app conflicts
- 🔊 Audio cues — Submarine on start/stop, Glass when text is pasted
- 🚀 launchd auto-start service — starts at login, restarts on crash, no terminal window
- 🧠 Ctrl+F also routes to local whisper.cpp
- ⚙️ Turn-key: uv venv, deps, `whisper-cpp`, model, `.env`, LaunchAgent — all wired automatically
🚀 Install (macOS)
git clone https://github.com/AlexFlanker/whisper-input-next-mac-kit.git
cd whisper-input-next-mac-kit && ./install.sh
📌 Notes
- macOS only. Default model `large-v3-turbo` (override with `WIN_MODEL=large-v3`).
- Does not redistribute upstream source — the installer clones it from the official repo on your machine. All credit to the upstream authors — see CREDITS.md.
- Targets upstream commit `5edec44`.
MIT licensed (covers this kit's own code).
