A lightweight Windows system-tray app that transcribes your speech and types it directly into whatever window you have open. No API key, no internet connection, no cost.
Built with faster-whisper (local Whisper model) and PyQt6.
- Windows 10 or 11
- Python 3.10 or newer — download from python.org
1. Clone or download the repository
git clone https://github.com/bazzofx/voice2text.git
cd voice2text
Or download and extract the ZIP from GitHub.
2. Install dependencies
Double-click install.bat, or run in a terminal:
pip install -r requirements.txt
This installs PyQt6, faster-whisper, sounddevice, pynput, and pyperclip.
3. Run the app
python main.py
A small coloured dot will appear in your system tray (bottom-right of the taskbar).
On first launch the app downloads the Whisper model. The tray icon is grey while the model loads and turns green when it is ready. This download only happens once.
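If the app does not start after step 3, a quick check that the Python version and the installed packages match the requirements can help. The sketch below is not part of the repository; the import names in `REQUIRED_MODULES` are assumed from the packages listed in step 2, and `find_missing` and `python_is_supported` are hypothetical helper names.

```python
import importlib.util
import sys

# Import names assumed for the packages that requirements.txt installs.
REQUIRED_MODULES = ["PyQt6", "faster_whisper", "sounddevice", "pynput", "pyperclip"]
MIN_PYTHON = (3, 10)


def python_is_supported(version=None):
    """Return True if the given (or running) interpreter is 3.10 or newer."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def find_missing(modules=REQUIRED_MODULES):
    """Return the top-level module names that the import system cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.10 or newer is required.")
    missing = find_missing()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Save it under any file name and run it with the same `python` you use for `python main.py`. If it reports missing modules, rerun `pip install -r requirements.txt`.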
- Open any app where you want to type — Notepad, Word, a browser text field, etc.
- Click inside it so it has focus.
- Press the hotkey (Ctrl+Space by default) to start recording. The tray icon turns red.
- Speak clearly.
- Press the hotkey again to stop. The icon turns blue while transcribing.
- The transcribed text is automatically pasted into your active window.
| Colour | Meaning |
|---|---|
| Grey | Loading model |
| Green | Ready |
| Red | Recording |
| Blue | Transcribing |
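The usage steps and the colour table together describe a small state machine. The sketch below is one way to model it, inferred from this README rather than taken from the app's source; the `State` enum, the event names, and `next_state` are all hypothetical.

```python
from enum import Enum


class State(Enum):
    """Tray icon states; each value is the colour from the table above."""
    LOADING = "grey"
    READY = "green"
    RECORDING = "red"
    TRANSCRIBING = "blue"


# Transitions inferred from the usage steps (toggle mode). Event names are assumptions.
TRANSITIONS = {
    (State.LOADING, "model_loaded"): State.READY,
    (State.READY, "hotkey"): State.RECORDING,
    (State.RECORDING, "hotkey"): State.TRANSCRIBING,
    (State.TRANSCRIBING, "text_pasted"): State.READY,
}


def next_state(state, event):
    """Return the state after `event`; events with no listed transition are ignored."""
    return TRANSITIONS.get((state, event), state)
```

In this model, pressing the hotkey while the icon is grey or blue does nothing, since no transition is listed for those pairs. Whether the real app behaves that way is not stated here.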
Right-click the tray icon and choose Settings to change:
| Setting | Options |
|---|---|
| Hotkey | Any key combination, e.g. Ctrl+Space, Win+Space |
| Mode | Toggle (press once to start / again to stop) or Push-to-talk (hold to record) |
| Model size | tiny (fast) → large-v3 (most accurate) |
| Language | Auto-detect or pick a specific language |
| Device | CPU (default) or CUDA GPU |
| Microphone | System default or a manually selected microphone |

| Model | Size |
|---|---|
| small | ~244 MB |
| medium | ~769 MB |
| large-v3 | ~1.5 GB |
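For readers who want to see how the settings fit together, here is a minimal sketch of a settings object and the two recording modes. It is an illustration only: the field names, the defaults other than the Ctrl+Space hotkey and CPU device, and the `recording_after` helper are assumptions, and `MODEL_SIZES` lists only the sizes this README names.

```python
from dataclasses import dataclass
from typing import Optional

MODES = ("toggle", "push_to_talk")
MODEL_SIZES = ("tiny", "small", "medium", "large-v3")  # only sizes named in this README
DEVICES = ("cpu", "cuda")


@dataclass
class Settings:
    hotkey: str = "Ctrl+Space"
    mode: str = "toggle"               # assumed default
    model_size: str = "small"          # assumed default
    language: Optional[str] = None     # None stands for auto-detect
    device: str = "cpu"
    microphone: Optional[str] = None   # None stands for the system default

    def __post_init__(self):
        # Reject values outside the options listed in the settings table.
        if self.mode not in MODES:
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.model_size not in MODEL_SIZES:
            raise ValueError(f"unknown model size: {self.model_size!r}")
        if self.device not in DEVICES:
            raise ValueError(f"unknown device: {self.device!r}")


def recording_after(mode, recording, event):
    """Return the new recording flag after a hotkey 'press' or 'release' event."""
    if mode == "toggle":
        # A press flips the flag; releasing the key changes nothing.
        return (not recording) if event == "press" else recording
    # Push-to-talk: record only while the key is held down.
    return event == "press"
```

The difference between the modes comes down to the release event: toggle mode ignores it, while push-to-talk treats it as the signal to stop.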
By default the app runs on CPU, which works on any machine. If you have an NVIDIA GPU and want faster transcription:
- Install the CUDA Toolkit 12 from NVIDIA.
- Open Settings in the tray app and change Device to CUDA GPU.
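As a rough illustration of what the Device setting could map to, the sketch below picks arguments for faster-whisper's `WhisperModel` and falls back to CPU when CUDA is not usable. This is an assumption about the approach, not the app's actual code: `device_args` and `load_model` are hypothetical names, and the `float16` and `int8` compute types are common choices rather than values confirmed by this project.

```python
def device_args(prefer_cuda, cuda_available):
    """Choose WhisperModel keyword arguments, falling back to CPU if CUDA is unusable."""
    if prefer_cuda and cuda_available:
        # Assumed compute type for GPU inference.
        return {"device": "cuda", "compute_type": "float16"}
    # Assumed compute type for CPU inference.
    return {"device": "cpu", "compute_type": "int8"}


def load_model(model_size, prefer_cuda=False, cuda_available=False):
    """Load a faster-whisper model on the chosen device."""
    # Imported lazily so device_args can be used without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(model_size, **device_args(prefer_cuda, cuda_available))
```

For example, `load_model("small", prefer_cuda=True, cuda_available=False)` would load the small model on CPU. How the caller determines `cuda_available` is left open here.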
Tray icon doesn't appear
Make sure system tray icons are visible in Windows taskbar settings.

Text is not pasted into my window
Make sure the target window has focus (click inside it) before the transcription finishes. The app pastes using Ctrl+V into the active window.

Transcription is slow
Try switching to the tiny model in Settings for faster (but less accurate) results.

Audio device not detected
Open Settings and select your microphone manually from the Microphone dropdown.
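
The paste issue above is easier to reason about with the mechanism in view. Since the app pastes with Ctrl+V into the active window, the step probably looks something like the sketch below, which copies the text to the clipboard with pyperclip and sends the keystroke with pynput. The function names are hypothetical and this is not the app's actual code.

```python
def paste_text(text, copy, send_paste):
    """Copy `text` to the clipboard and trigger a paste; skip empty transcriptions."""
    text = text.strip()
    if not text:
        return False
    copy(text)
    send_paste()
    return True


def make_default_paster():
    """Wire paste_text to pyperclip and pynput (imported lazily)."""
    import pyperclip
    from pynput.keyboard import Controller, Key

    keyboard = Controller()

    def send_ctrl_v():
        # Hold Ctrl, tap V, then release Ctrl when the block exits.
        with keyboard.pressed(Key.ctrl):
            keyboard.press("v")
            keyboard.release("v")

    return lambda text: paste_text(text, pyperclip.copy, send_ctrl_v)
```

Because the keystroke goes to whichever window has focus at that moment, switching windows while the icon is blue sends the text to the wrong place. The paste also overwrites whatever was previously on the clipboard.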