Local voice transcription for Windows
Press a key, speak, release. Your words appear wherever you're typing.
Caution
Murmur is in early development. The core transcription pipeline works, but the user experience is still evolving. Expect rough edges and breaking changes between versions.
Murmur is a desktop voice transcription app that runs entirely on your machine. Hold down a hotkey, speak naturally, and your transcribed text is automatically typed into whatever application has focus. No cloud services, no subscriptions, no data leaving your computer.
The app consists of two components: an Electron desktop client with a minimal always-on-top overlay, and a local Python server powered by faster-whisper (a CTranslate2-based reimplementation of OpenAI's Whisper model, optimized for speed).
Note
Features marked with 🚧 are partially implemented or under development.
| Feature | Status | Description |
|---|---|---|
| Hold-to-talk | ✅ | Press and hold hotkey to record, release to transcribe |
| Real-time feedback | ✅ | See partial transcription as you speak |
| Minimal overlay | ✅ | Non-intrusive pill with animated waveform |
| Auto-paste | ✅ | Transcribed text typed into active window |
| Transcription history | ✅ | Searchable, filterable local history |
| Runs locally | ✅ | All processing on your machine (GPU/CPU) |
| Configurable hotkey | 🚧 | Currently hardcoded to F17 |
| Toggle mode | 🚧 | Click to start/stop (vs hold-to-talk) |
| Text post-processing | 🚧 | Filler word removal, punctuation |
flowchart TB
subgraph client["Desktop Client (Electron)"]
overlay["Overlay Window<br/><small>recording UI, waveform</small>"]
main["Main Window<br/><small>history, settings</small>"]
tray["System Tray"]
overlay & main & tray --> mainproc
mainproc["Main Process<br/><small>hotkey, audio capture, clipboard</small>"]
end
mainproc <-->|"WebSocket :51717<br/><small>binary audio + JSON</small>"| ws
subgraph server["Transcription Server (Python)"]
ws["FastAPI<br/>WebSocket"]
ws --> buffer["Audio Buffer<br/><small>16kHz PCM</small>"]
buffer --> whisper["faster-whisper<br/><small>Whisper AI</small>"]
whisper --> ws
end
style client fill:#1a1a2e,stroke:#4a4a6a,color:#fff
style server fill:#1a1a2e,stroke:#4a4a6a,color:#fff
Audio Pipeline Details
flowchart LR
mic["Microphone"] --> media["MediaStream API"]
media --> worklet["AudioWorklet<br/><small>PCM conversion</small>"]
worklet --> ipc["IPC Channel"]
ipc --> ws["WebSocket Client"]
ws --> server["Server"]
server --> buffer["Circular Buffer"]
buffer --> whisper["Whisper"]
whisper --> partial["Partial Text"]
whisper --> final["Final Text"]
partial & final --> display["Overlay Display"]
final --> clipboard["Clipboard"]
clipboard --> paste["Auto-paste"]
| Component | Technologies |
|---|---|
| Desktop App | Electron, Svelte 5, TypeScript, Tailwind CSS v4 |
| Server | Python 3.11+, FastAPI, faster-whisper, uvicorn |
| Database | SQLite (better-sqlite3) for history |
| Audio | Web Audio API, AudioWorklet, 16-bit PCM @ 16kHz |
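To make the audio format concrete, here is a minimal Python sketch of the kind of conversion the AudioWorklet stage performs: float samples in the range -1.0 to 1.0 become signed 16-bit PCM. The function names, the mono channel layout, and the little-endian byte order are assumptions for illustration, not taken from the Murmur source.

```python
import struct

SAMPLE_RATE_HZ = 16_000  # 16 kHz, from the Audio row above
BYTES_PER_SAMPLE = 2     # 16-bit PCM


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to signed 16-bit PCM bytes.

    Little-endian output is an assumption, not a documented Murmur detail.
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clamp out-of-range input
        ints.append(int(round(s * 32767)))  # scale to the int16 range
    return struct.pack(f"<{len(ints)}h", *ints)


def bytes_per_second():
    """Raw bandwidth of one mono channel (mono is assumed here)."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE


pcm = float_to_pcm16([0.0, 1.0, -1.0, 2.0])
print(len(pcm), bytes_per_second())  # prints: 8 32000
```

Under these assumptions a stream needs about 32 KB per second before any framing overhead, which is small enough to send uncompressed over a local WebSocket.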
- Windows 10/11 (the Electron app currently targets Windows only)
- Node.js 18+ and Bun (for the desktop app)
- Python 3.11+ and uv (for the server)
- CUDA-capable GPU (recommended) or CPU for transcription
1. Clone the repository
git clone https://github.com/yourusername/murmur.git
cd murmur
2. Set up the transcription server
cd server
# Install dependencies with uv
uv sync
# Start the server
just start
# Or in background: just start-bg
Tip
The server downloads the Whisper model on first run (~1.5GB for the default model). This only happens once.
3. Set up the desktop app
From PowerShell on Windows:
cd app
# Install dependencies
bun install
# Run in development mode
bun run dev
- Start the transcription server (`just start` in the server directory)
- Launch the Murmur app
- Press and hold F17 to record
- Speak naturally
- Release the key, and your text appears in the active window
Tip
The app lives in your system tray. Click the tray icon to access settings and history.
Murmur uses a custom WebSocket protocol for efficient audio streaming and transcription.
sequenceDiagram
participant C as Client
participant S as Server
C->>S: WebSocket Connect
C->>S: control:start
S->>C: control:ready
loop Recording
C->>S: audio frame (binary)
S-->>C: text:partial
end
C->>S: control:stop
S->>C: text:final
S->>C: control:closing
S->>C: WebSocket Close
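The message ordering in the diagram can be read as a small state machine. The sketch below is one way to check that a recorded sequence of events follows that order. The state names, the `replay` helper, and the `"audio"` label for binary frames are hypothetical; only the message names come from the diagram, and the real protocol may allow orderings not shown there.

```python
# Allowed transitions keyed by (state, direction, message).
# "C" = sent by the client, "S" = sent by the server, as in the diagram.
# State names are invented for this sketch.
TRANSITIONS = {
    ("connected", "C", "control:start"): "starting",
    ("starting", "S", "control:ready"): "recording",
    ("recording", "C", "audio"): "recording",
    ("recording", "S", "text:partial"): "recording",
    ("recording", "C", "control:stop"): "stopping",
    ("stopping", "S", "text:final"): "finalized",
    ("finalized", "S", "control:closing"): "closed",
}


class ProtocolError(Exception):
    """Raised when an event is not valid in the current state."""


def replay(events, state="connected"):
    """Walk a list of (direction, message) pairs and return the final state."""
    for direction, message in events:
        key = (state, direction, message)
        if key not in TRANSITIONS:
            raise ProtocolError(
                f"unexpected {message!r} from {direction} in state {state!r}"
            )
        state = TRANSITIONS[key]
    return state


session = [
    ("C", "control:start"),
    ("S", "control:ready"),
    ("C", "audio"),
    ("S", "text:partial"),
    ("C", "control:stop"),
    ("S", "text:final"),
    ("S", "control:closing"),
]
print(replay(session))  # prints: closed
```

A table-driven check like this is convenient in tests: sending audio before `control:ready`, for example, raises `ProtocolError` instead of silently passing.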
Protocol Features
- Binary audio frames: 5-byte header (sequence, sample count, flags) + PCM data
- JSON control frames: Session management (start, stop, ready, error)
- Text frames: Partial and final transcription results with confidence scores
- Silence detection: Automatic session ending after configurable timeout
See the full Protocol Specification for details.
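As an illustration of how a 5-byte binary header might be packed, here is a Python sketch. The field widths (a 16-bit sequence number, a 16-bit sample count, an 8-bit flags byte) and the little-endian byte order are assumptions chosen to add up to 5 bytes; the Protocol Specification is the authority on the actual layout. `encode_frame` and `decode_frame` are hypothetical names.

```python
import struct

# ASSUMED layout, not taken from the protocol specification:
# uint16 sequence, uint16 sample count, uint8 flags, little-endian.
HEADER = struct.Struct("<HHB")  # 2 + 2 + 1 = 5 bytes


def encode_frame(sequence, pcm, flags=0):
    """Prefix 16-bit PCM bytes with the assumed 5-byte header."""
    sample_count = len(pcm) // 2  # two bytes per 16-bit sample
    return HEADER.pack(sequence & 0xFFFF, sample_count, flags) + pcm


def decode_frame(frame):
    """Split a frame into (sequence, flags, pcm) and validate its length."""
    sequence, sample_count, flags = HEADER.unpack_from(frame)
    pcm = frame[HEADER.size:]
    if len(pcm) != sample_count * 2:
        raise ValueError("payload length does not match sample count")
    return sequence, flags, pcm


frame = encode_frame(7, b"\x00\x00\xff\x7f", flags=1)
print(len(frame), decode_frame(frame))
```

Carrying the sample count in the header lets the receiver reject truncated frames before they reach the audio buffer, which is the check `decode_frame` performs here.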
All server settings can be configured via environment variables prefixed with MURMUR_.
| Variable | Default | Description |
|---|---|---|
| MURMUR_HOST | 0.0.0.0 | Server bind address |
| MURMUR_PORT | 51717 | Server port |
| MURMUR_MAX_SESSIONS | 10 | Maximum concurrent sessions |
| MURMUR_START_TIMEOUT | 10.0 | Seconds to wait for start frame |
| MURMUR_WHISPER_MODEL | large-v3-turbo | Whisper model to use |
| MURMUR_WHISPER_DEVICE | auto | Device: auto, cpu, or cuda |
| MURMUR_WHISPER_COMPUTE_TYPE | auto | Compute type: auto, int8, float16, etc. |
| MURMUR_PARTIAL_EMISSION_INTERVAL | 0.2 | Minimum seconds between partial transcription updates |
| MURMUR_MIN_AUDIO_FOR_TRANSCRIPTION | 0.5 | Minimum audio (seconds) before transcribing |
| MURMUR_LOG_LEVEL | INFO | Log level: DEBUG, INFO, WARNING, ERROR |
| MURMUR_LOG_BINARY | false | Enable verbose binary frame logging (very spammy) |
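For readers who want to see how prefixed environment variables map onto typed settings, here is a minimal standard-library sketch covering a few of the variables above. It is not the server's actual loader, which may use a different mechanism; `ServerSettings` and `load_settings` are hypothetical names, and the accepted boolean spellings are an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSettings:
    # Defaults copied from the table above.
    host: str = "0.0.0.0"
    port: int = 51717
    whisper_model: str = "large-v3-turbo"
    partial_emission_interval: float = 0.2
    log_binary: bool = False


def load_settings(env=None):
    """Build settings from MURMUR_-prefixed variables, falling back to defaults."""
    env = os.environ if env is None else env
    defaults = ServerSettings()

    def get(name, cast, default):
        raw = env.get(f"MURMUR_{name}")
        return default if raw is None else cast(raw)

    def as_bool(raw):
        # Accepted spellings are an assumption of this sketch.
        return raw.strip().lower() in {"1", "true", "yes", "on"}

    return ServerSettings(
        host=get("HOST", str, defaults.host),
        port=get("PORT", int, defaults.port),
        whisper_model=get("WHISPER_MODEL", str, defaults.whisper_model),
        partial_emission_interval=get(
            "PARTIAL_EMISSION_INTERVAL", float, defaults.partial_emission_interval
        ),
        log_binary=get("LOG_BINARY", as_bool, defaults.log_binary),
    )


print(load_settings({"MURMUR_PORT": "6000", "MURMUR_LOG_BINARY": "true"}))
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test without touching the real process environment.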
Example: Running with debug logging
# PowerShell
$env:MURMUR_LOG_LEVEL="DEBUG"; uv run murmur
# Also enable binary frame logging (very verbose)
$env:MURMUR_LOG_LEVEL="DEBUG"; $env:MURMUR_LOG_BINARY="true"; uv run murmur
App settings are configured through the Settings UI (accessible from the system tray). Settings include:
- Hotkey: Keyboard shortcut to trigger recording
- Activation Mode: Hold-to-talk or toggle
- Input Device: Microphone selection
- Auto-copy/Auto-paste: Clipboard behavior
- Update Speed: How often partial transcriptions update (100-500 ms)
- Server URL: WebSocket endpoint for the transcription server
See BUILDING.md for the full development setup, production packaging, and troubleshooting guide.
Contributions are welcome. Please open an issue first to discuss what you'd like to change.
Built with Electron, Svelte, FastAPI, and faster-whisper