Murmur

Local voice transcription for Windows

Press a key, speak, release. Your words appear wherever you're typing.

Caution

Murmur is in early development. The core transcription pipeline works, but the user experience is still evolving. Expect rough edges and breaking changes between versions.

What is Murmur?

Murmur is a desktop voice transcription app that runs entirely on your machine. Hold down a hotkey, speak naturally, and your transcribed text is automatically typed into whatever application has focus. No cloud services, no subscriptions, no data leaving your computer.

The app consists of two components: an Electron desktop client with a minimal always-on-top overlay, and a local Python server powered by faster-whisper (a reimplementation of OpenAI's Whisper on CTranslate2, optimized for inference speed).

Features

Note

Features marked with 🚧 are partially implemented or under development.

Feature	Status	Description
Hold-to-talk	✅	Press and hold hotkey to record, release to transcribe
Real-time feedback	✅	See partial transcription as you speak
Minimal overlay	✅	Non-intrusive pill with animated waveform
Auto-paste	✅	Transcribed text typed into active window
Transcription history	✅	Searchable, filterable local history
Runs locally	✅	All processing on your machine (GPU/CPU)
Configurable hotkey	🚧	Currently hardcoded to F17
Toggle mode	🚧	Click to start/stop (vs hold-to-talk)
Text post-processing	🚧	Filler word removal, punctuation

Architecture

flowchart TB
    subgraph client["Desktop Client (Electron)"]
        overlay["Overlay Window<br/><small>recording UI, waveform</small>"]
        main["Main Window<br/><small>history, settings</small>"]
        tray["System Tray"]

        overlay & main & tray --> mainproc
        mainproc["Main Process<br/><small>hotkey, audio capture, clipboard</small>"]
    end

    mainproc <-->|"WebSocket :51717<br/><small>binary audio + JSON</small>"| ws

    subgraph server["Transcription Server (Python)"]
        ws["FastAPI<br/>WebSocket"]
        ws --> buffer["Audio Buffer<br/><small>16kHz PCM</small>"]
        buffer --> whisper["faster-whisper<br/><small>Whisper AI</small>"]
        whisper --> ws
    end

    style client fill:#1a1a2e,stroke:#4a4a6a,color:#fff
    style server fill:#1a1a2e,stroke:#4a4a6a,color:#fff

Audio Pipeline Details

flowchart LR
    mic["Microphone"] --> media["MediaStream API"]
    media --> worklet["AudioWorklet<br/><small>PCM conversion</small>"]
    worklet --> ipc["IPC Channel"]
    ipc --> ws["WebSocket Client"]
    ws --> server["Server"]
    server --> buffer["Circular Buffer"]
    buffer --> whisper["Whisper"]
    whisper --> partial["Partial Text"]
    whisper --> final["Final Text"]
    partial & final --> display["Overlay Display"]
    final --> clipboard["Clipboard"]
    clipboard --> paste["Auto-paste"]
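
The server-side "Circular Buffer" stage in the diagram above can be pictured as a fixed-capacity ring of 16-bit samples that overwrites the oldest audio once it is full. The following is a minimal sketch of that idea, not Murmur's actual implementation; the class name `PcmRingBuffer` and its methods are hypothetical.

```python
from array import array


class PcmRingBuffer:
    """Fixed-capacity ring buffer for 16-bit PCM samples (hypothetical helper)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data = array("h", [0] * capacity)  # "h" = signed 16-bit integers
        self._capacity = capacity
        self._write = 0  # index of the next slot to write
        self._size = 0   # number of valid samples currently stored

    def write(self, samples: array) -> None:
        """Append samples, overwriting the oldest ones when the buffer is full."""
        for sample in samples:
            self._data[self._write] = sample
            self._write = (self._write + 1) % self._capacity
            self._size = min(self._size + 1, self._capacity)

    def read_all(self) -> array:
        """Return the stored samples, oldest first, without consuming them."""
        start = (self._write - self._size) % self._capacity
        return array(
            "h",
            (self._data[(start + i) % self._capacity] for i in range(self._size)),
        )
```

With a capacity of `16000 * n`, such a buffer would hold the most recent `n` seconds of 16kHz audio.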

Tech Stack

Component	Technologies
Desktop App	Electron, Svelte 5, TypeScript, Tailwind CSS v4
Server	Python 3.11+, FastAPI, faster-whisper, uvicorn
Database	SQLite (better-sqlite3) for history
Audio	Web Audio API, AudioWorklet, 16-bit PCM @ 16kHz
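
To make the "16-bit PCM @ 16kHz" format concrete, here is a small sketch of the conversion from floating-point samples (the range the Web Audio API works in) to signed 16-bit PCM bytes. In Murmur this step happens in the AudioWorklet on the client; the Python below only illustrates the arithmetic. The function name `float_to_pcm16` is hypothetical, and little-endian byte order is an assumption not stated above.

```python
import struct

SAMPLE_RATE = 16_000   # samples per second, from the table above
BYTES_PER_SAMPLE = 2   # 16-bit PCM


def float_to_pcm16(samples: list[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to signed 16-bit PCM bytes.

    Byte order is ASSUMED to be little-endian.
    """
    ints = []
    for x in samples:
        x = max(-1.0, min(1.0, x))          # clamp so out-of-range input cannot overflow
        ints.append(int(round(x * 32767)))  # scale to the int16 range
    return struct.pack(f"<{len(ints)}h", *ints)
```

At this format, one second of mono audio is `SAMPLE_RATE * BYTES_PER_SAMPLE` = 32,000 bytes.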

Getting Started

Prerequisites

Windows 10/11 (the Electron app currently targets Windows only)
Node.js 18+ and Bun (for the desktop app)
Python 3.11+ and uv (for the server)
CUDA-capable GPU (recommended) or CPU for transcription

Installation

1. Clone the repository

git clone https://github.com/yourusername/murmur.git
cd murmur

2. Set up the transcription server

cd server

# Install dependencies with uv
uv sync

# Start the server
just start
# Or in background: just start-bg

Tip

The server downloads the Whisper model on first run (~1.5GB for the default model). This only happens once.

3. Set up the desktop app

From PowerShell on Windows:

cd app

# Install dependencies
bun install

# Run in development mode
bun run dev

Usage

1. Start the transcription server (run `just start` in the server directory)
2. Launch the Murmur app
3. Press and hold F17 to record
4. Speak naturally
5. Release the key, and your text appears in the active window

Tip

The app lives in your system tray. Click the tray icon to access settings and history.

Protocol

Murmur uses a custom WebSocket protocol for efficient audio streaming and transcription.

sequenceDiagram
    participant C as Client
    participant S as Server

    C->>S: WebSocket Connect
    C->>S: control:start
    S->>C: control:ready

    loop Recording
        C->>S: audio frame (binary)
        S-->>C: text:partial
    end

    C->>S: control:stop
    S->>C: text:final
    S->>C: control:closing
    S->>C: WebSocket Close
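
The message ordering in the diagram above can be read as a small state machine on the client side. The sketch below encodes only what the diagram shows; the state names, the `advance` helper, and the `send`/`recv` event strings are hypothetical, and error handling (for example the `error` control frame) is left out.

```python
from enum import Enum, auto


class State(Enum):
    CONNECTED = auto()  # socket open, nothing sent yet
    STARTING = auto()   # control:start sent, waiting for control:ready
    RECORDING = auto()  # streaming audio, receiving partial text
    STOPPING = auto()   # control:stop sent, waiting for text:final
    FINISHED = auto()   # text:final received
    CLOSING = auto()    # control:closing received, socket about to close


# (state, event) -> next state. Events carry the direction from the client's
# point of view; the message names mirror the labels in the diagram.
TRANSITIONS = {
    (State.CONNECTED, "send control:start"): State.STARTING,
    (State.STARTING, "recv control:ready"): State.RECORDING,
    (State.RECORDING, "send audio"): State.RECORDING,
    (State.RECORDING, "recv text:partial"): State.RECORDING,
    (State.RECORDING, "send control:stop"): State.STOPPING,
    (State.STOPPING, "recv text:final"): State.FINISHED,
    (State.FINISHED, "recv control:closing"): State.CLOSING,
}


def advance(state: State, event: str) -> State:
    """Return the next state, or raise ValueError for an out-of-order event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"unexpected {event!r} in state {state.name}") from None
```

A table like this makes out-of-order traffic, such as audio sent before `control:ready`, fail loudly instead of being silently dropped.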

Protocol Features

Binary audio frames — 5-byte header (sequence, sample count, flags) + PCM data
JSON control frames — Session management (start, stop, ready, error)
Text frames — Partial and final transcription results with confidence scores
Silence detection — Automatic session ending after configurable timeout
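
As an illustration of the binary audio frame, the sketch below packs and unpacks a 5-byte header followed by PCM data. The field widths are an assumption made for this example (unsigned 16-bit sequence, unsigned 16-bit sample count, 8-bit flags, little-endian); the list above only names the fields, so check the Protocol Specification for the real layout. The function names are hypothetical.

```python
import struct

# ASSUMED layout: u16 sequence, u16 sample count, u8 flags, little-endian.
# Only the total size (5 bytes) and the field names come from the README.
HEADER = struct.Struct("<HHB")


def encode_frame(sequence: int, pcm: bytes, flags: int = 0) -> bytes:
    """Prefix 16-bit PCM data with the assumed 5-byte header."""
    if len(pcm) % 2 != 0:
        raise ValueError("PCM data must contain whole 16-bit samples")
    sample_count = len(pcm) // 2
    # Wrap the sequence number so it always fits the 16-bit field.
    return HEADER.pack(sequence & 0xFFFF, sample_count, flags) + pcm


def decode_frame(frame: bytes) -> tuple[int, int, bytes]:
    """Split a frame into (sequence, flags, pcm), validating the sample count."""
    if len(frame) < HEADER.size:
        raise ValueError("frame shorter than header")
    sequence, sample_count, flags = HEADER.unpack_from(frame)
    pcm = frame[HEADER.size:]
    if len(pcm) != sample_count * 2:
        raise ValueError("sample count does not match payload length")
    return sequence, flags, pcm
```

Under this assumed layout a frame can carry at most 65,535 samples, roughly four seconds of 16kHz audio, which is far more than a streaming client would normally send at once.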

See the full Protocol Specification for details.

Configuration

Server Environment Variables

All server settings can be configured via environment variables prefixed with MURMUR_.

Variable	Default	Description
`MURMUR_HOST`	`0.0.0.0`	Server bind address
`MURMUR_PORT`	`51717`	Server port
`MURMUR_MAX_SESSIONS`	`10`	Maximum concurrent sessions
`MURMUR_START_TIMEOUT`	`10.0`	Seconds to wait for start frame
`MURMUR_WHISPER_MODEL`	`large-v3-turbo`	Whisper model to use
`MURMUR_WHISPER_DEVICE`	`auto`	Device: `auto`, `cpu`, or `cuda`
`MURMUR_WHISPER_COMPUTE_TYPE`	`auto`	Compute type: `auto`, `int8`, `float16`, etc.
`MURMUR_PARTIAL_EMISSION_INTERVAL`	`0.2`	Minimum seconds between partial transcription updates
`MURMUR_MIN_AUDIO_FOR_TRANSCRIPTION`	`0.5`	Minimum audio (seconds) before transcribing
`MURMUR_LOG_LEVEL`	`INFO`	Log level: `DEBUG`, `INFO`, `WARNING`, `ERROR`
`MURMUR_LOG_BINARY`	`false`	Enable verbose binary frame logging (very spammy)
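
The table above maps naturally onto a typed settings object. The following sketch shows one way to read `MURMUR_`-prefixed variables with the documented defaults using only the standard library. It is not necessarily how the server loads its configuration; `ServerSettings` and `load_settings` are hypothetical names, and the set of strings accepted as "true" is an assumption.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping


@dataclass(frozen=True)
class ServerSettings:
    # Defaults copied from the table above.
    host: str = "0.0.0.0"
    port: int = 51717
    max_sessions: int = 10
    start_timeout: float = 10.0
    whisper_model: str = "large-v3-turbo"
    whisper_device: str = "auto"
    whisper_compute_type: str = "auto"
    partial_emission_interval: float = 0.2
    min_audio_for_transcription: float = 0.5
    log_level: str = "INFO"
    log_binary: bool = False


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Build settings from MURMUR_* variables, falling back to the defaults."""
    overrides = {}
    for field in fields(ServerSettings):
        raw = env.get(f"MURMUR_{field.name.upper()}")
        if raw is None:
            continue
        if field.type is bool:
            # ASSUMPTION: these spellings count as true, everything else as false.
            overrides[field.name] = raw.strip().lower() in {"1", "true", "yes", "on"}
        else:
            overrides[field.name] = field.type(raw)  # str, int, or float
    return ServerSettings(**overrides)
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test, for example `load_settings({"MURMUR_PORT": "6000"})`.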

Example: Running with debug logging

# PowerShell
$env:MURMUR_LOG_LEVEL="DEBUG"; uv run murmur

# Also enable binary frame logging (very verbose)
$env:MURMUR_LOG_LEVEL="DEBUG"; $env:MURMUR_LOG_BINARY="true"; uv run murmur

App Settings

App settings are configured through the Settings UI (accessible from the system tray). Settings include:

Hotkey — Keyboard shortcut to trigger recording
Activation Mode — Hold-to-talk or toggle
Input Device — Microphone selection
Auto-copy/Auto-paste — Clipboard behavior
Update Speed — How often partial transcriptions update (100-500ms)
Server URL — WebSocket endpoint for the transcription server

Building

See BUILDING.md for the full development setup, production packaging, and troubleshooting guide.

Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

License

MIT

Built with Electron, Svelte, FastAPI, and faster-whisper
