🎙️ WhisperNote

Minimalist, private, and powerful desktop tool for local audio transcription.

⚡ What is WhisperNote?

WhisperNote is a simple tool designed for one purpose: to turn your voice or audio files into text using the power of OpenAI Whisper, running entirely on your PC.

No complex settings, no cloud subscriptions, no data leaks. Just press Record and receive your transcription.

Key Features

🔒 100% Private & Offline: All transcription happens locally on your GPU or CPU. An internet connection is needed only for the one-time model downloads. Your audio never leaves your computer.
🚀 Portable: No installation required. Just unzip and run.
🧠 Powerful AI: Supports all official OpenAI Whisper models (from tiny to large-v3, including turbo and .en variants).
🎙️ Voice Recorder: Built-in recorder with Pause/Resume and crash protection.
📂 Drag & Drop: Easily transcribe existing audio files.
💾 Smart Notes: Auto-saving, history, and export to .txt.
🎨 Modern UI: Clean interface with Dark/Light themes.

📥 Download & Install

Go to the Releases page.
Download the latest WhisperNote_v0.1.zip.
Extract the folder to a convenient location (e.g., Desktop).
Run WhisperNote.exe.

Note on File Size: The application takes up about 4 GB because it bundles a full Python environment, PyTorch with CUDA support, and FFmpeg. This lets it work out of the box without installing anything else.

🚀 Getting Started

First Run: Upon first launch, the app will automatically download the smallest model (tiny) to verify functionality. This may take a few seconds.
Select Model: Choose a model from the dropdown menu.
- Status "Downloading...": The model is being downloaded from the internet (happens once).
- Status "Loading...": The model is being loaded into RAM/VRAM.
Transcribe:
- Click 🎙️ to record from your microphone.
- Click 📂 to select a file (or drag & drop it).
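The two status labels above can be told apart by checking whether the model file is already on disk. The following is a minimal sketch, not WhisperNote's actual code: `model_status` is a hypothetical helper, and both the cache folder (the default used by the openai-whisper package) and the `<model_name>.pt` file naming are assumptions that the app may not share.

```python
from pathlib import Path

# Assumption: openai-whisper's default cache folder; the app may store models elsewhere.
DEFAULT_CACHE = Path.home() / ".cache" / "whisper"


def model_status(model_name: str, cache_dir: Path = DEFAULT_CACHE) -> str:
    """Return the status label a UI could show before a model is ready."""
    # Assumption: each checkpoint is stored as "<model_name>.pt".
    checkpoint = cache_dir / f"{model_name}.pt"
    if checkpoint.is_file():
        # Already on disk: it only needs to be loaded into RAM/VRAM.
        return "Loading..."
    # Not cached yet: a one-time download is required first.
    return "Downloading..."
```

Passing `cache_dir` explicitly makes the check easy to point at whatever folder the app really uses.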

📊 Performance Benchmarks

Time to transcribe a 10.5-minute (630 s) English audio file, with the speed relative to real time in parentheses. Lower time is better.

Model	RTX 4070 Laptop	RTX 2070 Super	Notes
tiny	00:15 (42.0x)	00:20 (31.5x)	Blazing fast, lowest accuracy
tiny.en	00:20 (31.5x)	00:19 (33.2x)	English-only, fastest on the RTX 2070 Super
base	00:19 (33.1x)	00:25 (25.2x)	Very fast, standard baseline
base.en	00:22 (28.6x)	00:26 (24.2x)	Optimized for English, very fast
small	00:33 (19.1x)	00:44 (14.3x)	Good speed/accuracy trade-off
small.en	00:35 (18.0x)	00:44 (14.3x)	Balanced choice for English
medium	03:02 (3.5x)	03:47 (2.8x)	Heavy load, significantly slower
medium.en	01:00 (10.5x)	01:17 (8.2x)	Great balance for English tasks
turbo	02:00 (5.25x)	03:33 (3.0x)	High accuracy, moderate speed
large-v3	08:51 (1.2x)	17:29 (0.6x)	Slower than real-time, max precision

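(Tests were performed on Windows 11 with CUDA enabled, using the same 630 s audio file. In these runs the English-only .en models were not consistently faster than their multilingual counterparts. The clear exception is medium.en, which ran about 3x faster than medium on both GPUs.)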
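The speed factors in parentheses appear to be the audio length divided by the transcription time. A small sketch of that arithmetic, using times from the table (the helper names `to_seconds` and `speed_factor` are illustrative, not part of the app):

```python
AUDIO_SECONDS = 630  # length of the test file stated above


def to_seconds(mm_ss: str) -> int:
    """Convert an 'MM:SS' string from the table into seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def speed_factor(mm_ss: str, audio_seconds: int = AUDIO_SECONDS) -> float:
    """Audio seconds processed per second of wall-clock time."""
    return audio_seconds / to_seconds(mm_ss)


print(f"{speed_factor('00:15'):.1f}x")  # tiny on RTX 4070 Laptop -> 42.0x
print(f"{speed_factor('17:29'):.1f}x")  # large-v3 on RTX 2070 Super -> 0.6x
```

A factor below 1.0x means the model runs slower than real time. Because the table shows times rounded to whole seconds, recomputed factors can differ from the listed ones in the last digit.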

⚠️ Known Limitations & FAQ

Q: Why is the app size so large? A: It includes the PyTorch engine and CUDA libraries to run AI locally on your GPU. This allows it to work offline and be portable.

Q: The app says "Downloading..." for a long time. A: Larger models (such as medium or large-v3) are roughly 1.5 to 3 GB, so please be patient. If the download appears stuck, check your internet connection. In some regions, model downloads may require additional network configuration.

Q: I see repeated phrases or "Subtitle by..." at the end of the text. A: This is a known behavior of the Whisper model called "hallucination," which often occurs during long periods of silence or background noise.

Why it happens: The model was trained on internet subtitles and sometimes tries to "predict" text even when there is no speech.
Our config: We use optimized settings (temperature=0.2, condition_on_previous_text=True) to minimize this behavior while maintaining high accuracy for actual speech.
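For readers who want to try the same settings outside the app, here is a minimal sketch using the openai-whisper Python package. The two option values are the ones quoted above. `transcribe_file` is a hypothetical helper, and how WhisperNote itself calls Whisper is an assumption.

```python
# Settings quoted in the FAQ entry above.
TRANSCRIBE_OPTIONS = {
    "temperature": 0.2,
    "condition_on_previous_text": True,
}


def transcribe_file(audio_path: str, model_name: str = "tiny") -> str:
    """Transcribe one audio file and return the text (illustrative helper)."""
    # Imported lazily so the module loads without the package installed.
    # Requires: pip install openai-whisper (plus FFmpeg on the PATH).
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **TRANSCRIBE_OPTIONS)
    return result["text"].strip()
```

Calling `transcribe_file("meeting.wav", "small")` would download the model on first use and then run fully locally. The file name here is only a placeholder.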

Q: Can I stop the transcription? A: Currently, no. Once started, the process must finish. This will be addressed in future updates.

🗺️ Roadmap

AI Post-processing: Integration with local LLMs (Llama/Mistral) to fix punctuation, paragraphs, and remove hallucinations.
System Tray: Minimize to tray for quick recording.
UI Improvements: More settings and customization.

👨‍💻 Credits & Tech Stack

Core: OpenAI Whisper
GUI: PyQt6
Audio: PyAudio, PyDub, FFmpeg
Engine: PyTorch (CUDA)

License: MIT License. Free to use and modify.

Created by LokiSkardina
