Minimalist, private, and powerful desktop tool for local audio transcription.
WhisperNote is a simple tool designed for one purpose: to turn your voice or audio files into text using the power of OpenAI Whisper, running entirely on your PC.
No complex settings, no cloud subscriptions, no data leaks. Just press Record and receive your transcription.
- 🔒 100% Private & Offline: All processing happens locally on your GPU/CPU. Your audio never leaves your computer.
- 🚀 Portable: No installation required. Just unzip and run.
- 🧠 Powerful AI: Supports all official OpenAI Whisper models (from `tiny` to `large-v3`, including `turbo` and `.en` variants).
- 🎙️ Voice Recorder: Built-in recorder with Pause/Resume and crash protection.
- 📂 Drag & Drop: Easily transcribe existing audio files.
- 💾 Smart Notes: Auto-saving, history, and export to `.txt`.
- 🎨 Modern UI: Clean interface with Dark/Light themes.
- Go to the Releases page.
- Download the latest `WhisperNote_v0.1.zip`.
- Extract the folder to a convenient location (e.g., Desktop).
- Run `WhisperNote.exe`.
Note on File Size: The application is about 4 GB because it bundles a full Python environment, PyTorch with CUDA support, and FFmpeg. This lets it work out of the box without installing anything else.
- First Run: Upon first launch, the app will automatically download the smallest model (`tiny`) to verify functionality. This may take a few seconds.
- Select Model: Choose a model from the dropdown menu.
- Status "Downloading...": The model is being downloaded from the internet (happens once).
- Status "Loading...": The model is being loaded into RAM/VRAM.
- Transcribe:
- Click 🎙️ to record from your microphone.
- Click 📂 to select a file (or drag & drop it).
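The two status messages above depend on whether the model file is already on disk. As a minimal sketch of that check, assuming each checkpoint is cached as `<model>.pt` in a single folder (the `openai-whisper` package defaults to `~/.cache/whisper`; WhisperNote's actual cache location and logic are not documented here, and `model_status` is a hypothetical helper):

```python
from pathlib import Path

# Assumption: default cache folder of the openai-whisper package.
# The portable app may store its models somewhere else.
DEFAULT_CACHE = Path.home() / ".cache" / "whisper"


def model_status(model_name: str, cache_dir: Path = DEFAULT_CACHE) -> str:
    """Return the status a UI could show before a model is ready.

    Assumes the checkpoint file is named "<model_name>.pt", which may not
    hold for aliases.
    """
    checkpoint = Path(cache_dir) / f"{model_name}.pt"
    # A cached file only needs loading into RAM/VRAM; otherwise it must be downloaded once.
    return "Loading..." if checkpoint.is_file() else "Downloading..."


if __name__ == "__main__":
    for name in ("tiny", "small.en", "large-v3"):
        print(f"{name}: {model_status(name)}")
```

If a model is stuck on "Downloading...", checking for a partial file in the cache folder is a reasonable first diagnostic step.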
Time to transcribe a 10.5-minute (630 s) English audio file, with the speed relative to real time in parentheses. Lower time is better.
| Model | RTX 4070 Laptop | RTX 2070 Super | Notes |
|---|---|---|---|
| tiny | 00:15 (42.0x) | 00:20 (31.5x) | Blazing fast, lowest accuracy |
| tiny.en | 00:20 (31.5x) | 00:19 (33.2x) | English-only, fastest on the RTX 2070 Super |
| base | 00:19 (33.1x) | 00:25 (25.2x) | Very fast, standard baseline |
| base.en | 00:22 (28.6x) | 00:26 (24.2x) | Optimized for English, very fast |
| small | 00:33 (19.1x) | 00:44 (14.3x) | Good speed/accuracy trade-off |
| small.en | 00:35 (18.0x) | 00:44 (14.3x) | Balanced choice for English |
| medium | 03:02 (3.5x) | 03:47 (2.8x) | Heavy load, significantly slower |
| medium.en | 01:00 (10.5x) | 01:17 (8.2x) | Great balance for English tasks |
| turbo | 02:00 (5.25x) | 03:33 (3.0x) | High accuracy, moderate speed |
| large-v3 | 08:51 (1.2x) | 17:29 (0.6x) | Slower than real-time on the RTX 2070 Super, max precision |
(English-only `.en` models are intended for English audio. In these runs only `medium.en` was clearly faster than its multilingual counterpart. Tests were performed on Windows 11 with CUDA enabled, using the same 630 s audio file.)
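The factor in parentheses is the audio length divided by the processing time, so a value below 1.0x means slower than real time. A small sketch of that arithmetic, using the 630 s test file and a few timings from the table (the helper names are illustrative only):

```python
AUDIO_SECONDS = 630  # length of the benchmark audio file


def to_seconds(mm_ss: str) -> int:
    """Convert a 'mm:ss' timing from the table into seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def speed_factor(elapsed: str, audio_seconds: int = AUDIO_SECONDS) -> float:
    """Audio duration divided by processing time (higher is faster)."""
    return audio_seconds / to_seconds(elapsed)


if __name__ == "__main__":
    samples = [("tiny", "00:15"), ("medium.en", "01:00"), ("large-v3", "17:29")]
    for model, elapsed in samples:
        print(f"{model}: {speed_factor(elapsed):.1f}x")
    # tiny: 42.0x
    # medium.en: 10.5x
    # large-v3: 0.6x
```

The same calculation gives a rough estimate for your own recordings: divide the recording length by the factor for your model and GPU class.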
Q: Why is the app size so large?
A: It includes the PyTorch engine and CUDA libraries to run AI locally on your GPU. This allows it to work offline and be portable.
Q: The app says "Downloading..." for a long time.
A: Large models (like Medium or Large) are roughly 1.5 to 3 GB, so please be patient. If the download appears stuck, check your internet connection. In some regions, model downloads may require additional network configuration.
Q: I see repeated phrases or "Subtitle by..." at the end of the text.
A: This is a known behavior of the Whisper model called "hallucination," which often occurs during long periods of silence or background noise.
- Why it happens: The model was trained on internet subtitles and sometimes tries to "predict" text even when there is no speech.
- Our config: We use optimized settings (`temperature=0.2`, `condition_on_previous_text=True`) to minimize this behavior while maintaining high accuracy for actual speech.
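For readers who want to try the same settings outside the app, here is a hedged sketch of how they could be passed to the `openai-whisper` Python package. It assumes that package is installed; WhisperNote's internal code is not shown here and may differ, and `build_transcribe_options` and `transcribe_file` are hypothetical helper names.

```python
def build_transcribe_options() -> dict:
    """Decoding options quoted in the FAQ answer above."""
    return {
        "temperature": 0.2,
        "condition_on_previous_text": True,
    }


def transcribe_file(audio_path: str, model_name: str = "tiny") -> str:
    """Transcribe one file with the options above and return the plain text."""
    # Imported lazily so the options can be inspected without the package installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(audio_path, **build_transcribe_options())
    return result["text"]


if __name__ == "__main__":
    print(build_transcribe_options())
```

Calling `transcribe_file("recording.wav")` (a placeholder path) would run the `tiny` model on that file. Keeping the options in one function makes it easy to experiment with other values if hallucinated text still appears.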
Q: Can I stop the transcription?
A: Currently, no. Once started, the process must finish. This will be addressed in future updates.
- AI Post-processing: Integration with local LLMs (Llama/Mistral) to fix punctuation, paragraphs, and remove hallucinations.
- System Tray: Minimize to tray for quick recording.
- UI Improvements: More settings and customization.
- Core: OpenAI Whisper
- GUI: PyQt6
- Audio: PyAudio, PyDub, FFmpeg
- Engine: PyTorch (CUDA)
License: MIT License. Free to use and modify.
Created by LokiSkardina