This user-friendly Python application converts audio files (MP3, WAV, M4A, FLAC, etc.) into text using OpenAI’s Whisper model via a simple, multilingual Tkinter GUI.
- 🌐 Multilingual interface:
  - Full support for English and French UI.
  - Over 30 transcription languages available.
- 🔄 Dynamic language switching: Change interface language live from the dropdown (top left), or during first startup via popup.
- 📁 Multi-file support: Select and transcribe multiple audio files at once.
- 🧠 Smart model selection: Choose from Whisper models: `tiny`, `base`, `small`, `medium`, `large`.
- 📦 Auto-saving:
  - Files are saved with timestamped filenames.
  - Results are stored in the `output/` folder.
- 🌀 Progress feedback:
  - A spinner animates while transcribing.
  - Status label changes from "Transcribing..." to "Finished.".
- 📂 Output access:
  - Click View transcriptions to open the output folder.
  - Transcriptions are also shown live in the interface.
- 🔧 Persistent preferences:
  - Last selected model and language are saved in a local `config.json` file.
- ⚙️ Lightweight and easy to install.
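The auto-saving behaviour listed above (timestamped filenames, results in `output/`) could be sketched roughly as follows. The function names, the `.txt` extension, and the `YYYYMMDD_HHMMSS` timestamp format are assumptions made for illustration; the application may name its files differently.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def build_output_path(audio_file: str, output_dir: str = "output",
                      now: Optional[datetime] = None) -> Path:
    """Return a timestamped .txt path inside output_dir for an audio file."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
    stem = Path(audio_file).stem           # "interview.mp3" -> "interview"
    return Path(output_dir) / f"{stem}_{stamp}.txt"


def save_transcription(text: str, audio_file: str,
                       output_dir: str = "output") -> Path:
    """Write text to a timestamped file, creating output_dir if needed."""
    target = build_output_path(audio_file, output_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

Putting the timestamp in the filename means that transcribing the same audio file twice does not overwrite the earlier result, provided the two runs finish in different seconds.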
- Python ≥ 3.8
- Whisper speech-to-text engine:

      pip install openai-whisper

- FFmpeg (required for audio decoding):

      winget install ffmpeg     # Windows

  Or download manually from ffmpeg.org and add the bin folder to your system PATH.

      sudo apt install ffmpeg   # Debian/Ubuntu
      brew install ffmpeg       # macOS (with Homebrew)

- Clone this repository:

      git clone https://github.com/Pizzakira/Transcribe
      cd Transcribe

- (Optional) Create a virtual environment:

      python -m venv .venv
      source .venv/bin/activate   # Linux/macOS
      .venv\Scripts\activate      # Windows

- Install dependencies:

      pip install openai-whisper

- Verify FFmpeg is installed:

      ffmpeg -version

- Windows: Double-click `transcribe_gui.py` or create a `.bat` file.
- macOS/Linux:

      python transcribe_gui.py

- Choose file(s): Click the Choose File… button. Multiple file selection is supported.
- Model: Pick a Whisper model (`tiny`, `base`, `small`, etc.).
- Transcription language:
  - Choose Automatic or select a specific language.
- Click Transcribe: Wait while processing.
- View results:
  - Transcription text appears in the interface.
  - Output is saved in the `output/` folder.
- Change UI language:
  - Use the interface language selector (top-left) to switch between French and English.
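For readers curious about what the Transcribe step does, the sketch below shows one plausible shape for it using Whisper's Python API (`whisper.load_model` and `model.transcribe`). The function `transcribe_files` and its `load_model` parameter are hypothetical names introduced here, not taken from `transcribe_gui.py`. Passing `language=None` lets Whisper detect the language itself, which corresponds to the Automatic choice.

```python
from typing import Callable, Dict, Iterable, Optional


def _load_whisper_model(name: str):
    # Imported lazily so this module can be loaded without Whisper installed.
    import whisper
    return whisper.load_model(name)


def transcribe_files(paths: Iterable[str], model_name: str = "base",
                     language: Optional[str] = None,
                     load_model: Callable = _load_whisper_model) -> Dict[str, str]:
    """Transcribe each file in turn and return a {path: text} mapping.

    language=None asks Whisper to detect the language automatically;
    a code such as "fr" or "en" forces a specific language.
    """
    model = load_model(model_name)  # loaded once, reused for every file
    results = {}
    for path in paths:
        result = model.transcribe(path, language=language)
        results[path] = result["text"].strip()
    return results
```

Loading the model once and reusing it across the selected files avoids paying the model load time for every file. The `load_model` parameter exists only so the function can be exercised with a stand-in model.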
✅ Phase 1 — Core Features (Completed)
Error recovery
→ Resumes an interrupted transcription session using saved state.
Status: ✅ Implemented
Batch transcription
→ Allows selection of multiple audio files and processes them sequentially.
Status: ✅ Implemented
Persistent configuration
→ Stores user preferences (Whisper model, transcription language) in config.json.
Status: ✅ Implemented
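As an illustration of the persistent configuration entry above, preferences could be read from and written to `config.json` along these lines. The key names (`model`, `language`) and the default values are assumptions; the actual file may use a different layout.

```python
import json
from pathlib import Path

# Assumed keys and defaults; the real config.json may use other names.
DEFAULTS = {"model": "base", "language": "auto"}


def load_config(path: str = "config.json") -> dict:
    """Return saved preferences, falling back to DEFAULTS when the file
    is missing, unreadable as JSON, or not a JSON object."""
    try:
        saved = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        saved = {}
    if not isinstance(saved, dict):
        saved = {}
    return {**DEFAULTS, **saved}  # saved values override the defaults


def save_config(config: dict, path: str = "config.json") -> None:
    """Write preferences as human-readable JSON."""
    Path(path).write_text(
        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

Merging the saved values over a set of defaults keeps the application usable on first launch and after a damaged or hand-edited file.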
✅ Phase 2 — Technical and Maintenance (Completed)
Memory release (GPU/CPU)
→ Releases resources after transcription to avoid overload.
Status: ✅ Implemented
Model update check
→ Checks for Whisper model updates and notifies the user.
Status: ❌ Dropped (feature removed)
Automatic ffmpeg verification
→ Checks if ffmpeg is available at launch, displays error otherwise.
Status: ✅ Implemented
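The launch-time ffmpeg check described above can be approximated with the standard library's `shutil.which`, which searches the system PATH for an executable. This is a minimal sketch; the helper names and the wording of the message are hypothetical, and the application's own check may differ.

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the executable on PATH, or None if absent."""
    return shutil.which(executable)


def ffmpeg_error_message(executable: str = "ffmpeg") -> Optional[str]:
    """Return an error message when the executable is missing, else None."""
    if find_ffmpeg(executable) is None:
        return (f"{executable} was not found on PATH. "
                "Install it and restart the application.")
    return None
```

In a Tkinter application, a non-`None` message could be passed to `tkinter.messagebox.showerror` before the main window is built.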
🟡 Phase 3 — Developer Experience (In Progress)
Docstrings for all functions
→ Improves readability and maintainability.
Status: 🔄 Planned
Type annotations (type hints)
→ Clarifies expected argument and return types.
Status: 🔄 Planned
Modular code structure
→ Separates logic into distinct files: UI, transcription engine, configuration.
Status: 🔄 Planned
Contributions are welcome! Feel free to open issues or submit pull requests.