Skip to content

Releases: AlexFlanker/whisper-input-next-mac-kit

v0.2.1 — freeze resilience (hold-to-restart + diagnostics)

02 Jun 08:02

Choose a tag to compare

Occasional hangs? Now there's an escape hatch and a black box.

Added

  • 🆘 Hold-to-restart — if the service ever freezes, press & hold the on-screen indicator ~1.8s: a red ring fills, then it force-restarts (launchd brings it back in ~1s). It also snapshots all thread stacks right before exiting, so the freeze is captured even if you restart immediately.
  • 🔬 Freeze watchdog — when something stays stuck in a busy state too long (transcription 45s / recording 180s, configurable), it dumps every thread's stack to logs/freeze_diagnostics.log for diagnosis. Diagnostics-only — never auto-restarts.
  • ⏱️ Real whisper-cli timeout (WHISPER_SUBPROCESS_TIMEOUT, default 90s) — kills a hung child instead of leaking it for 3 minutes behind the old thread-based wrapper.

Fixed

  • The on-screen indicator now receives clicks. The app's status_bar.py ran AppHelper.runConsoleEventLoop(), which pumps the run loop (animations/callAfter/display all work) but never dispatches NSApp mouse events to windows — so the hold-to-restart overlay got nothing. Switched to AppHelper.runEventLoop() (activation policy unchanged). Diagnosed empirically: under the console loop, clicks fired neither acceptsFirstMouse nor mouseDown under both Prohibited and Accessory; runEventLoop fixed it.

How it ships

Two kit-owned files (src/utils/freeze_watchdog.py, src/ui/hold_restart.py, MIT) copied in by install.sh; the patcher wires 6 thin hooks into main.py / status_bar.py / local_whisper.py. Verified byte-identical to a known-good install (30 edits + 5 shipped files) and idempotent.

macOS only. Not affiliated with or endorsed by upstream. See CREDITS.md.

v0.2.0 — local LLM polish (your dictation, cleaned up on-device)

02 Jun 06:52

Choose a tag to compare

The first release on the core-quality axis: v0.1.x made dictation pleasant to use and manage; v0.2.0 makes it read better — fully on-device.

Added

  • 🧠 Local LLM polish — after transcription, an optional local Ollama model tidies the text before it's pasted:
    • light — faithful: fixes punctuation, typos, obvious stumbles, keeps your wording.
    • concise — trims filler / repetition / 口头禅 for a much shorter result.
    • Default model glm4 (GLM-4-9B); 100% offline; opt-in (POLISH_ENABLED, off by default); a hard timeout falls back to the raw transcript, so a slow or absent Ollama never blocks dictation.
  • 🤖 MCP set_polish (mode / model / enabled) — flip light↔concise, swap models, or toggle it, right from Claude Desktop. POLISH_* added to the writable allowlist; polish state shows in status().

Enable it

  1. brew install ollama && ollama pull glm4 (GLM-4-9B, ~5.5 GB; or qwen2.5:3b for lighter/faster).
  2. POLISH_ENABLED=true in .env and restart — or ask Claude Desktop "turn on polish, concise mode".

How it ships

Kit-owned helper src/transcription/ollama_polish.py (MIT) copied in by install.sh; the patcher only wires 4 thin main.py hooks. Verified byte-identical to a known-good install (24 edits + 3 shipped files) and idempotent. No upstream code redistributed.

macOS only. Not affiliated with or endorsed by upstream. See CREDITS.md.

v0.1.4 — capsule indicator style + MCP style switching

01 Jun 01:05

Choose a tag to compare

Pick your on-screen dictation indicator — and switch it right from Claude Desktop.

Added

  • 🟣 Second indicator style: capsule — a pill that appears with three pulsing dots while
    you record, retracts into a small circle + spinner while transcribing, then turns
    green and fades when your text lands. (The original glowing ring is still the default.)
  • 🎛️ INDICATOR_STYLE (ring | capsule) picks the style; the app selects it via a tiny
    factory. New kit-owned UI file src/ui/capsule_indicator.py (MIT), copied in by install.sh.
  • 🤖 Switch styles from Claude Desktop — the MCP server gains set_indicator_style
    (ring / capsule / off) that updates the config and restarts in one call.
    INDICATOR_STYLE is now writable via MCP, and status reports the current style.

How it ships

  • The patcher only wires main.py hooks (now 5 indicator hooks: flags, factory, instantiation,
    state-change). Verified byte-identical to a known-good install (20 edits + 2 shipped UI
    files) and idempotent. No upstream code redistributed.

macOS only. Not affiliated with or endorsed by upstream. See CREDITS.md.

v0.1.3 — on-screen listening indicator

01 Jun 00:43

Choose a tag to compare

A small, native on-screen indicator so you can see your dictation state at a glance.

Added

  • 🟢 Listening indicator — a bottom-center, click-through overlay that floats over any app:
    • a breathing ring while recording,
    • a spinner while transcribing,
    • a green expand-and-fade burst when your text is pasted.
      Fully transparent (no panel chrome) with a soft glow, so it blends into whatever's behind it.
  • Toggle with SHOW_INDICATOR (default on); also settable from the MCP server.

How it ships

  • The UI component (src/ui/listening_indicator.py) is kit-owned (MIT) and copied into your
    checkout by install.sh. The idempotent patcher only wires four main.py hooks. No upstream
    code is redistributed.
  • Verified: the patcher applies to the pinned upstream commit byte-identical to a known-good
    install (19 edits + 1 shipped file) and is idempotent.

macOS only. Not affiliated with or endorsed by upstream. See CREDITS.md.

v0.1.2 — manage from Claude Desktop (MCP), env-configurable sounds, auto-cleanup

01 Jun 00:08

Choose a tag to compare

Frictionless upgrades to the local-whisper dictation kit.

Added

  • 🤖 MCP server (mcp/server.py) — monitor & configure the dictation service from Claude Desktop (or any MCP client) by just asking: status, logs, get_config/set_config, restart, recent_transcriptions, and model management. No UI to build.
  • 🛠️ install-mcp.sh — installs the mcp SDK into the app venv and merges a whisper-input entry into Claude Desktop's config (existing servers untouched, backed up, idempotent). Refuses to run while Claude Desktop is open (it rewrites its own config).
  • 🧹 Audio-archive auto-cleanup — deletes recordings older than AUDIO_ARCHIVE_RETENTION_HOURS (default 24) at startup and periodically; prunes cache.json.

Changed

  • 🔊 Sound cues are now env-configurableSOUND_START / SOUND_STOP / SOUND_DONE / SOUND_ERROR / SOUND_WARNING (defaults unchanged: Submarine/Submarine/Glass/Basso/Funk).
  • uninstall.sh now also removes the Claude Desktop MCP entry.

Quality

  • Patcher verified: applies to the pinned upstream commit byte-identical to a known-good install, and is idempotent on re-run.
  • Went through an adversarial multi-agent review (13 findings, 0 false positives); all confirmed items fixed — atomic config writes, write-once backup, hf-mirror.com model fallback behind the GFW, and more.

Not affiliated with or endorsed by upstream. macOS only. See CREDITS.md for the upstream license note.

v0.1.1 — Local-mode punctuation

31 May 22:49

Choose a tag to compare

✍️ What's new — better Chinese punctuation (local mode)

Local transcripts now come out punctuated:

  • Prompt-guided punctuation — a default WHISPER_PROMPT makes whisper.cpp emit punctuation (run-on Chinese speech was coming out with none).
  • Full-width normalization — half-width punctuation next to Chinese is converted to full-width 「,。!?」 via WHISPER_FULLWIDTH_PUNCT (on by default).

Both are configurable in .env — set WHISPER_PROMPT= (empty) or WHISPER_FULLWIDTH_PUNCT=false to opt out.

Upgrade an existing install

cd whisper-input-next-mac-kit && git pull && ./install.sh

The patcher is idempotent and leaves your .env untouched (code defaults provide punctuation).

Full details in CHANGELOG.md.

v0.1.0 — Initial release

31 May 22:25

Choose a tag to compare

A one-command macOS installer that turns Whisper-Input-Next into an always-on, fully local voice keyboard — offline, free, and private.

demo

✨ Highlights

  • 🎙️ Tap Right-⌘ to start/stop — single-tap toggle, no chord, no app conflicts
  • 🔊 Audio cues — Submarine on start/stop, Glass when text is pasted
  • 🚀 launchd auto-start service — starts at login, restarts on crash, no terminal window
  • 🧠 Ctrl+F also routes to local whisper.cpp
  • ⚙️ Turn-key: uv venv, deps, whisper-cpp, model, .env, LaunchAgent — all wired automatically

🚀 Install (macOS)

git clone https://github.com/AlexFlanker/whisper-input-next-mac-kit.git
cd whisper-input-next-mac-kit && ./install.sh

📌 Notes

  • macOS only. Default model large-v3-turbo (override with WIN_MODEL=large-v3).
  • Does not redistribute upstream source — the installer clones it from the official repo on your machine. All credit to the upstream authors — see CREDITS.md.
  • Targets upstream commit 5edec44.

MIT licensed (covers this kit's own code).