Transcribe audio and video files locally using OpenAI Whisper. No internet needed after model download — everything runs on your machine.
$ whisper-cli transcribe interview.mp3
▶ interview.mp3
✓ Saved → interview.txt
Preview
────────────────────────────────────────────────────
So today we're going to talk about the new product
launch. First, let me walk you through the timeline...
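The sample run above writes `interview.txt` for `interview.mp3`. A minimal sketch of how such an output path could be derived is shown below; `output_path` is a hypothetical helper for illustration, not a function confirmed to exist in whisper-cli.

```python
from pathlib import Path
from typing import Optional


def output_path(input_file: str, fmt: str = "txt",
                output_dir: Optional[str] = None) -> Path:
    """Return the transcript path for input_file (hypothetical helper)."""
    src = Path(input_file)
    # Assumption: without an output directory, the transcript is written
    # next to the input file.
    target_dir = Path(output_dir) if output_dir else src.parent
    # Swap the media extension for the chosen format's extension.
    return target_dir / f"{src.stem}.{fmt}"


print(output_path("interview.mp3"))
print(output_path("video.mp4", fmt="srt", output_dir="./subtitles"))
```

Deriving the name from the input stem keeps batch runs predictable, since each input maps to exactly one output file per format.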
- Local transcription — Whisper runs 100% on your machine, no data sent anywhere
- Audio & video — MP3, WAV, M4A, FLAC, OGG, MP4, MKV, MOV, AVI, WebM, and more
- Multiple output formats — plain text, SRT subtitles, VTT subtitles, JSON, TSV
- Translation — transcribe and translate to English in one step (`--translate`)
- Language detection — auto-detects spoken language, or specify with `--language`
- Batch processing — transcribe multiple files with a single command
- Model selection — tiny (fastest) to large-v3 (best quality)
- Word timestamps — word-level timing for precise subtitle sync
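The translation, language, and word-timestamp features listed above correspond to options of the `openai-whisper` Python API. The sketch below assumes whisper-cli wraps that API, which the source does not confirm; `build_transcribe_options` and `transcribe_file` are hypothetical names used only for illustration.

```python
def build_transcribe_options(translate=False, language=None, word_timestamps=False):
    """Map CLI-style flags onto keyword arguments for Whisper's transcribe()."""
    options = {
        # Whisper's "translate" task transcribes and translates to English.
        "task": "translate" if translate else "transcribe",
        "word_timestamps": word_timestamps,
    }
    if language:
        # Passing a language code skips Whisper's auto-detection.
        options["language"] = language
    return options


def transcribe_file(path, model_name="base", **flags):
    """Run Whisper on one file (assumes the openai-whisper package is installed)."""
    import whisper  # imported lazily so the helper above works without it

    model = whisper.load_model(model_name)
    return model.transcribe(path, **build_transcribe_options(**flags))


print(build_transcribe_options(translate=True, language="es"))
```

Calling `transcribe_file("interview.mp3", language="fr")` would return Whisper's result dictionary, which includes the full text and the timed segments.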
- Python 3.10+
- ffmpeg (required for video files and most audio formats)
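Both requirements above can be checked before installing. The following is a small sketch using only the standard library; `missing_prerequisites` is a hypothetical helper and is not part of whisper-cli.

```python
import shutil
import sys


def missing_prerequisites(min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ is required")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


for problem in missing_prerequisites():
    print("Missing:", problem)
```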
# macOS
brew install ffmpeg
# Ubuntu / Debian
apt install ffmpeg
# Windows
winget install ffmpeg

pip install whisper-cli

Or from source:
git clone https://github.com/bhupendra05/whisper-cli.git
cd whisper-cli
pip install -e .

The first time you run a command, Whisper downloads the selected model (roughly 140 MB for base, which has 74M parameters). Subsequent runs use the cached model.
# Basic — outputs interview.txt
whisper-cli transcribe interview.mp3
# Specify model (tiny/base/small/medium/large)
whisper-cli transcribe lecture.mp4 --model small
# Output as SRT subtitles
whisper-cli transcribe video.mp4 --format srt
# Output as VTT (for HTML5 video)
whisper-cli transcribe video.mp4 --format vtt
# Save to a different directory
whisper-cli transcribe recording.wav --output-dir ./transcripts
# Specify language (skips auto-detection, slightly faster)
whisper-cli transcribe audio.mp3 --language fr

# Transcribe Spanish audio and translate to English
whisper-cli transcribe spanish_interview.mp4 --translate
# Translate + SRT output
whisper-cli transcribe foreign_film.mkv --translate --format srt

# Transcribe all MP3 files in a folder
whisper-cli transcribe *.mp3 --format srt --output-dir ./subtitles
# Mix of audio and video
whisper-cli transcribe meeting.mp4 call.m4a notes.wav

whisper-cli detect recording.mp3

Detected language: es
Top 5 candidates
┌──────────┬──────────────────────────┐
│ Language │ Confidence               │
├──────────┼──────────────────────────┤
│ es       │ 94.2% ████████████████   │
│ pt       │ 2.1% █                   │
│ it       │ 1.8% █                   │
│ fr       │ 0.9%                     │
│ ca       │ 0.6%                     │
└──────────┴──────────────────────────┘
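A ranking like the one above can be produced from a mapping of language codes to probabilities, which is the shape `openai-whisper` returns from `model.detect_language`. The sketch below is illustrative only: `top_candidates` is a hypothetical helper, and the bar scaling is an assumption that does not necessarily reproduce the tool's exact rendering.

```python
def top_candidates(probs, n=5, bar_width=20):
    """Return (language, percent, bar) rows for the n most likely languages."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:n]
    rows = []
    for lang, p in ranked:
        # Assumed scaling: a probability of 1.0 fills the whole bar.
        bar = "█" * int(p * bar_width)
        rows.append((lang, f"{p * 100:.1f}%", bar))
    return rows


sample = {"es": 0.942, "pt": 0.021, "it": 0.018, "fr": 0.009, "ca": 0.006, "en": 0.004}
for lang, percent, bar in top_candidates(sample):
    print(f"{lang:<3} {percent:>6} {bar}")
```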
whisper-cli models

┌───────────┬────────────┬────────┬───────┬──────────────┐
│ Model     │ Parameters │ VRAM   │ Speed │ English only │
├───────────┼────────────┼────────┼───────┼──────────────┤
│ tiny      │ 39M        │ ~1 GB  │ ~32x  │              │
│ tiny.en   │ 39M        │ ~1 GB  │ ~32x  │ ✓            │
│ base      │ 74M        │ ~1 GB  │ ~16x  │              │
│ base.en   │ 74M        │ ~1 GB  │ ~16x  │ ✓            │
│ small     │ 244M       │ ~2 GB  │ ~6x   │              │
│ medium    │ 769M       │ ~5 GB  │ ~2x   │              │
│ large-v3  │ 1550M      │ ~10 GB │ 1x    │              │
└───────────┴────────────┴────────┴───────┴──────────────┘
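The VRAM column above can guide model choice on a GPU with limited memory. As a rough sketch, the helper below picks the largest multilingual model whose approximate requirement fits a given budget; the figures are copied from the table, and `largest_model_that_fits` is a hypothetical name, not a whisper-cli function.

```python
# Approximate VRAM needs in GB, taken from the table above (smallest to largest).
MODEL_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large-v3": 10}


def largest_model_that_fits(vram_gb):
    """Return the biggest model that fits in vram_gb, or None if none does."""
    fitting = [name for name, need in MODEL_VRAM_GB.items() if need <= vram_gb]
    # The dict is ordered smallest to largest, so the last match is the biggest.
    return fitting[-1] if fitting else None


print(largest_model_that_fits(4))
```

The chosen name can then be passed to `--model`.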
| Format | Extension | Use case |
|---|---|---|
| `txt` | `.txt` | Plain text — reading, searching, summarizing |
| `srt` | `.srt` | Subtitles for VLC, video editors |
| `vtt` | `.vtt` | Subtitles for HTML5 `<video>` |
| `json` | `.json` | Full segment data with timestamps for processing |
| `tsv` | `.tsv` | Tab-separated — for spreadsheet analysis |
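To make the subtitle formats concrete, here is a minimal sketch of turning timed segments into SRT text. It assumes segments shaped like Whisper's output (dictionaries with `start`, `end`, and `text`); `srt_timestamp` and `segments_to_srt` are hypothetical names, and the converter in `transcribe.py` may differ.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timestamp style SRT uses."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


demo = [
    {"start": 0.0, "end": 2.5, "text": " So today we're going to talk"},
    {"start": 2.5, "end": 5.0, "text": " about the new product launch."},
]
print(segments_to_srt(demo))
```

VTT output follows the same pattern, except that the file starts with a `WEBVTT` header line and timestamps use a period instead of a comma before the milliseconds.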
| Model | Quality | Speed | Best for |
|---|---|---|---|
| `tiny` | Basic | ~32× real-time | Quick drafts, clear speech |
| `base` | Good | ~16× real-time | Daily use, decent quality |
| `small` | Better | ~6× real-time | Accents, technical content |
| `medium` | High | ~2× real-time | Meetings, lectures |
| `large-v3` | Best | Real-time | Production, multiple speakers |
The `.en` variants (`tiny.en`, `base.en`) run at about the same speed as their multilingual counterparts and tend to be more accurate for English-only content.
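The speed figures above allow a back-of-the-envelope estimate of processing time. The sketch below takes the table's factors at face value; actual throughput depends heavily on hardware, and `estimated_seconds` is a hypothetical helper for illustration.

```python
# Approximate speed relative to real time, copied from the table above.
REALTIME_FACTOR = {"tiny": 32, "base": 16, "small": 6, "medium": 2, "large-v3": 1}


def estimated_seconds(audio_seconds, model):
    """Estimate processing time by dividing audio length by the speed factor."""
    return audio_seconds / REALTIME_FACTOR[model]


# A one-hour recording with the base model:
print(estimated_seconds(3600, "base"))  # 225.0
```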
whisper-cli/
├── whisper_cli/
│ ├── cli.py # Click CLI — transcribe, detect, models commands
│ └── transcribe.py # Core Whisper logic, format converters, language detection
├── requirements.txt
└── setup.py
MIT © bhupendra05