LokiSkardina/WhisperNote

🎙️ WhisperNote

Minimalist, private, and powerful desktop tool for local audio transcription.

⚡ What is WhisperNote?

WhisperNote is a simple tool designed for one purpose: to turn your voice or audio files into text using the power of OpenAI Whisper, running entirely on your PC.

No complex settings, no cloud subscriptions, no data leaks. Just press Record and receive your transcription.

Key Features

  • 🔒 100% Private & Offline: All transcription happens locally on your GPU/CPU. Your audio never leaves your computer; an internet connection is needed only to download a model the first time you select it.
  • 🚀 Portable: No installation required. Just unzip and run.
  • 🧠 Powerful AI: Supports all official OpenAI Whisper models (from tiny to large-v3, including turbo and .en variants).
  • 🎙️ Voice Recorder: Built-in recorder with Pause/Resume and crash protection.
  • 📂 Drag & Drop: Easily transcribe existing audio files.
  • 💾 Smart Notes: Auto-saving, history, and export to .txt.
  • 🎨 Modern UI: Clean interface with Dark/Light themes.

📥 Download & Install

  1. Go to the Releases page.
  2. Download the latest WhisperNote_v0.1.zip.
  3. Extract the folder to a convenient location (e.g., Desktop).
  4. Run WhisperNote.exe.

Note on File Size: The application is about 4 GB because it bundles a full Python environment, PyTorch with CUDA support, and FFmpeg. This lets it work out of the box without installing anything else.


🚀 Getting Started

  1. First Run: Upon first launch, the app will automatically download the smallest model (tiny) to verify functionality. This may take a few seconds.
  2. Select Model: Choose a model from the dropdown menu.
    • Status "Downloading...": The model is being downloaded from the internet (happens once).
    • Status "Loading...": The model is being loaded into RAM/VRAM.
  3. Transcribe:
    • Click 🎙️ to record from your microphone.
    • Click 📂 to select a file (or drag & drop it).
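
The two status messages in step 2 correspond to whether the model file is already on disk. A minimal sketch of that check, assuming each model is cached as a "<name>.pt" file in a cache folder. The model_status helper, the folder layout, and the file naming are illustrative assumptions, not WhisperNote's confirmed implementation:

```python
from pathlib import Path


def model_status(model_name: str, cache_dir: Path) -> str:
    """Return the status a UI might show before a model is ready.

    Assumption: each model is stored as "<model_name>.pt" inside cache_dir.
    """
    model_file = cache_dir / f"{model_name}.pt"
    if model_file.is_file():
        # File already on disk: only loading into RAM/VRAM remains.
        return "Loading..."
    # File missing: it has to be fetched from the internet first (happens once).
    return "Downloading..."


# Hypothetical cache location; adjust to wherever the models are stored.
cache = Path.home() / ".cache" / "whisper"
for name in ("tiny", "small", "large-v3"):
    print(name, "->", model_status(name, cache))
```

A model that has been downloaded once would report "Loading..." on every later run, which matches the "happens once" note above.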

📊 Performance Benchmarks

Time to transcribe a 10.5-minute (630 s) English audio file, shown as mm:ss with the speed relative to real time in parentheses. Lower time is better.

| Model | RTX 4070 Laptop | RTX 2070 Super | Notes |
|---|---|---|---|
| tiny | 00:15 (42.0x) | 00:20 (31.5x) | Blazing fast, lowest accuracy |
| tiny.en | 00:20 (31.5x) | 00:19 (33.2x) | Fastest on the RTX 2070 Super |
| base | 00:19 (33.1x) | 00:25 (25.2x) | Very fast, standard baseline |
| base.en | 00:22 (28.6x) | 00:26 (24.2x) | Optimized for English, very fast |
| small | 00:33 (19.1x) | 00:44 (14.3x) | Good speed/accuracy trade-off |
| small.en | 00:35 (18.0x) | 00:44 (14.3x) | Balanced choice for English |
| medium | 03:02 (3.5x) | 03:47 (2.8x) | Heavy load, significantly slower |
| medium.en | 01:00 (10.5x) | 01:17 (8.2x) | Great balance for English tasks |
| turbo | 02:00 (5.25x) | 03:33 (3.0x) | High accuracy, moderate speed |
| large-v3 | 08:51 (1.2x) | 17:29 (0.6x) | Slower than real-time on the RTX 2070 Super, max precision |

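(Tests were performed on Windows 11 with CUDA enabled, using the same 630 s audio file. In these runs the English-only .en models were not consistently faster: medium.en was about three times faster than medium, while the smaller .en models were roughly on par with their multilingual counterparts.)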
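
The speed factor in parentheses is the audio duration divided by the transcription time. A small sketch of that arithmetic, using the 630 s file length from the tests (the function names are illustrative):

```python
AUDIO_SECONDS = 630  # length of the benchmark audio file


def to_seconds(mmss: str) -> int:
    """Convert an "mm:ss" string such as "03:02" to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speed_factor(mmss: str, audio_seconds: int = AUDIO_SECONDS) -> float:
    """Audio length divided by processing time; above 1.0 is faster than real time."""
    return audio_seconds / to_seconds(mmss)


print(f"{speed_factor('00:15'):.1f}x")  # tiny on the RTX 4070 Laptop -> 42.0x
print(f"{speed_factor('17:29'):.1f}x")  # large-v3 on the RTX 2070 Super -> 0.6x
```

A factor below 1.0, as in the last line, means the transcription takes longer than the recording itself.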


⚠️ Known Limitations & FAQ

Q: Why is the app size so large?
A: It includes the PyTorch engine and CUDA libraries to run AI locally on your GPU. This allows it to work offline and be portable.

Q: The app says "Downloading..." for a long time.
A: Large models (such as medium or large-v3) are roughly 1.5 to 3 GB, so please be patient. If the download appears stuck, check your internet connection. In some regions, model downloads may require additional network configuration.

Q: I see repeated phrases or "Subtitle by..." at the end of the text.
A: This is a known behavior of the Whisper model called "hallucination," which often occurs during long periods of silence or background noise.

  • Why it happens: The model was trained on internet subtitles and sometimes tries to "predict" text even when there is no speech.
  • Our config: WhisperNote uses temperature=0.2 and condition_on_previous_text=True, settings chosen to reduce this behavior while keeping accuracy high for actual speech.
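
For reference, both settings are keyword arguments of the transcribe method in the openai-whisper Python package. A minimal sketch of passing them, assuming that package is installed. The helper names, the default model, and the file path are placeholders, and this is not necessarily how WhisperNote wires them internally:

```python
def transcribe_options() -> dict:
    """Decoding options quoted in the FAQ entry above."""
    return {
        "temperature": 0.2,
        "condition_on_previous_text": True,
    }


def transcribe_file(audio_path: str, model_name: str = "tiny") -> str:
    """Transcribe one audio file with the options above and return the text."""
    # Imported lazily so the options can be inspected without the package.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **transcribe_options())
    return result["text"].strip()


# Example with a placeholder path:
#   text = transcribe_file("recording.wav")
```

The openai-whisper documentation describes condition_on_previous_text=False as making the model less prone to repetition loops, at the cost of less consistent text between windows, so the True setting here favors consistency.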

Q: Can I stop the transcription?
A: Currently, no. Once started, the process must finish. This will be addressed in future updates.


🗺️ Roadmap

  • AI Post-processing: Integration with local LLMs (Llama/Mistral) to fix punctuation, paragraphs, and remove hallucinations.
  • System Tray: Minimize to tray for quick recording.
  • UI Improvements: More settings and customization.

👨‍💻 Credits & Tech Stack

  • Core: OpenAI Whisper
  • GUI: PyQt6
  • Audio: PyAudio, PyDub, FFmpeg
  • Engine: PyTorch (CUDA)

License: MIT License. Free to use and modify.

Created by LokiSkardina
