A local voice-to-text tool for Linux. Press a hotkey, speak, and it types what you said at your cursor, in any app (browser, editor, terminal, etc.).
Works on both X11 and Wayland (including GNOME).
- Local transcription using Whisper (no cloud, no API keys)
- Types directly at cursor in any focused application
- Works on Wayland (GNOME, KDE, Sway, Hyprland) and X11
- Audio feedback for start/stop
- Configurable hotkey and model size
- Clipboard fallback if direct typing fails
```bash
git clone <repo-url> ~/dev/linux-whisper
cd ~/dev/linux-whisper
./install.sh  # Installs deps, sets up permissions
# Log out and back in (required for input group)
./run.sh      # Start dictating
```

The install script handles these automatically, but for reference:
Ubuntu/Debian:
```bash
sudo apt install python3-pip python3-venv portaudio19-dev ffmpeg ydotool wl-clipboard
```

Both hotkey detection (evdev) and text injection (ydotool) require kernel-level access:
- input group: your user must be in the `input` group
- uinput device: `/dev/uinput` must be accessible to the `input` group
- ydotoold daemon: must be running as a user service
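The first two requirements above can be checked from Python before starting the tool. The following is a minimal sketch, not the project's own code: the function name `check_input_access` is hypothetical, and it does not check whether the ydotoold daemon is running. Because it reads the groups of the current process, it reports the `input` group as missing until you have logged out and back in.

```python
import grp
import os


def check_input_access(group_name="input", device="/dev/uinput"):
    """Report whether this process meets the group and device requirements."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
        # os.getgroups() reflects the current login session, not /etc/group.
        in_group = gid in os.getgroups() or gid == os.getgid()
    except KeyError:
        # The group does not exist on this system.
        in_group = False

    return {
        "in_group": in_group,
        "device_exists": os.path.exists(device),
        "device_writable": os.access(device, os.W_OK),
    }


if __name__ == "__main__":
    for name, ok in check_input_access().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```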
```bash
# The install script does all of this, but manually:
sudo usermod -aG input $USER
echo 'KERNEL=="uinput", MODE="0660", GROUP="input"' | sudo tee /etc/udev/rules.d/99-uinput.rules
sudo udevadm control --reload-rules && sudo udevadm trigger
systemctl --user enable --now ydotoold.service
# Log out and back in
```

```bash
./run.sh
```

First run downloads the Whisper model (~150MB for base.en).
- Ctrl+Space (default): Start recording
- Recording stops automatically when you stop speaking (VAD)
- Transcribed text is typed at your cursor in the focused app
Edit config.json:
```json
{
  "hotkey": "<ctrl>+space",
  "model": "base.en",
  "language": "en",
  "input_method": "auto",
  "sound_feedback": true,
  "continuous_mode": false
}
```

| Model | Size | Speed | Accuracy |
|---|---|---|---|
| `tiny.en` | ~75MB | Fastest | Good |
| `base.en` | ~150MB | Fast | Better |
| `small.en` | ~500MB | Medium | Great |
| `medium.en` | ~1.5GB | Slow | Excellent |
| `large-v3` | ~3GB | Slowest | Best |
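As an illustration of how these settings fit together, here is a minimal sketch of a config loader. It assumes that missing keys fall back to the values in the example config and that only the models in the table are accepted; neither behavior is confirmed by the project, and `load_config`, `DEFAULTS`, and `KNOWN_MODELS` are hypothetical names.

```python
import json
from pathlib import Path

# Defaults mirror the example config.json (assumed, not confirmed).
DEFAULTS = {
    "hotkey": "<ctrl>+space",
    "model": "base.en",
    "language": "en",
    "input_method": "auto",
    "sound_feedback": True,
    "continuous_mode": False,
}

# Only the models listed in the table; the tool itself may accept more.
KNOWN_MODELS = {"tiny.en", "base.en", "small.en", "medium.en", "large-v3"}


def load_config(path="config.json"):
    """Merge user settings over the defaults and reject unknown models."""
    config = dict(DEFAULTS)
    file = Path(path)
    if file.exists():
        config.update(json.loads(file.read_text(encoding="utf-8")))
    if config["model"] not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {config['model']!r}")
    return config
```

Validating the model name up front gives a clear error at startup instead of a failed download later.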
| Method | X11 | Wayland (GNOME) | Wayland (wlroots) |
|---|---|---|---|
| `ydotool` | Yes | Yes | Yes |
| `xdotool` | Yes | No | No |
| `wtype` | No | No | Yes |
| `clipboard` | Yes | Yes | Yes |
`auto` picks the best method for your session and falls back to clipboard paste if typing fails.
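One way the `auto` choice could work is sketched below. This is an assumption-laden illustration, not the tool's actual logic: the preference order (ydotool first), the `WLROOTS_DESKTOPS` set, and the function name `pick_input_method` are all hypothetical. It only encodes the compatibility table above, using the standard `XDG_SESSION_TYPE` and `XDG_CURRENT_DESKTOP` environment variables.

```python
import os
import shutil

# Desktops treated as wlroots-style for this sketch (an assumption).
WLROOTS_DESKTOPS = ("sway", "hyprland")


def pick_input_method(session_type, desktop, available):
    """Pick the first supported method that is installed, else clipboard."""
    desktop = desktop.lower()
    if session_type.lower() != "wayland":
        candidates = ["ydotool", "xdotool"]
    elif any(name in desktop for name in WLROOTS_DESKTOPS):
        candidates = ["ydotool", "wtype"]
    else:
        # GNOME and other Wayland desktops: only ydotool can type directly.
        candidates = ["ydotool"]

    for method in candidates:
        if method in available:
            return method
    return "clipboard"


if __name__ == "__main__":
    installed = {t for t in ("ydotool", "xdotool", "wtype") if shutil.which(t)}
    print(pick_input_method(
        os.environ.get("XDG_SESSION_TYPE", "x11"),
        os.environ.get("XDG_CURRENT_DESKTOP", ""),
        installed,
    ))
```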
Examples: `<ctrl>+space`, `<ctrl>+<shift>+d`, `<super>+v`, `<alt>+<shift>+r`
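To make the hotkey format concrete, here is a small parser sketch. It assumes that angle-bracketed tokens are always modifiers and that exactly one plain key follows them, which is inferred from the examples rather than from the project's code; `parse_hotkey` and `MODIFIERS` are hypothetical names.

```python
MODIFIERS = {"ctrl", "shift", "alt", "super"}


def parse_hotkey(spec):
    """Split a hotkey string into (frozenset of modifiers, key)."""
    modifiers = set()
    key = None
    for part in spec.lower().split("+"):
        part = part.strip()
        if part.startswith("<") and part.endswith(">"):
            name = part[1:-1]
            if name not in MODIFIERS:
                raise ValueError(f"unknown modifier: {part}")
            modifiers.add(name)
        elif part and key is None:
            key = part
        else:
            # Empty segment or a second plain key.
            raise ValueError(f"invalid hotkey: {spec!r}")
    if key is None:
        raise ValueError(f"hotkey has no key: {spec!r}")
    return frozenset(modifiers), key
```

For example, `parse_hotkey("<ctrl>+<shift>+d")` returns the modifier set `{"ctrl", "shift"}` and the key `"d"`.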
- Hotkey detection via `python-evdev`: reads keyboard events at the kernel level (works on X11 and Wayland)
- Audio capture via RealtimeSTT: records until voice activity stops
- Transcription via faster-whisper: local Whisper model, no cloud
- Text injection via `ydotool`: types at cursor via uinput (kernel-level, below compositor)
- Clipboard fallback: if ydotool typing fails, copies to clipboard and pastes with Ctrl+V
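The last two steps, typing with a clipboard fallback, could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation: the function name `inject_text` is hypothetical, the clipboard step assumes `wl-copy` from wl-clipboard, and the paste step assumes the key-code syntax of ydotool 1.0 and later (older releases use a different `key` syntax).

```python
import subprocess


def inject_text(text, run=subprocess.run):
    """Type text with ydotool; on failure, copy it and paste with Ctrl+V."""
    try:
        run(["ydotool", "type", text], check=True)
        return "typed"
    except (OSError, subprocess.CalledProcessError):
        # ydotool is missing or exited with an error; use the clipboard.
        pass

    # wl-copy reads the clipboard contents from stdin when given no arguments.
    run(["wl-copy"], input=text.encode("utf-8"), check=True)
    # 29 is KEY_LEFTCTRL and 47 is KEY_V; ":1" presses, ":0" releases.
    run(["ydotool", "key", "29:1", "47:1", "47:0", "29:0"], check=True)
    return "pasted"
```

The `run` parameter exists only so the function can be exercised without the real tools installed. Text that begins with `-` may be read as an option by `ydotool type`, so a real implementation would need to guard against that.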
You need to be in the input group:
```bash
sudo usermod -aG input $USER
# Log out and back in
```

The udev rule for `/dev/uinput` is missing or permissions aren't applied:
```bash
# Check current permissions
ls -la /dev/uinput
# Should be: crw-rw---- root input
# If not, re-run install.sh or:
echo 'KERNEL=="uinput", MODE="0660", GROUP="input"' | sudo tee /etc/udev/rules.d/99-uinput.rules
sudo udevadm control --reload-rules && sudo udevadm trigger
```

```bash
systemctl --user status ydotoold
systemctl --user start ydotoold
```

```bash
arecord -l  # List audio devices
```

See RESEARCH.md for detailed analysis of alternatives and design decisions.
MIT