whisper-cli

Transcribe audio and video files locally using OpenAI Whisper. No internet needed after model download — everything runs on your machine.

$ whisper-cli transcribe interview.mp3

▶ interview.mp3
  ✓ Saved → interview.txt

Preview
────────────────────────────────────────────────────
So today we're going to talk about the new product
launch. First, let me walk you through the timeline...

Features

Local transcription — Whisper runs 100% on your machine, no data sent anywhere
Audio & video — MP3, WAV, M4A, FLAC, OGG, MP4, MKV, MOV, AVI, WebM, and more
Multiple output formats — plain text, SRT subtitles, VTT subtitles, JSON, TSV
Translation — transcribe + translate to English in one step (--translate)
Language detection — auto-detects spoken language, or specify with --language
Batch processing — transcribe multiple files with a single command
Model selection — tiny (fastest) to large-v3 (best quality)
Word timestamps — word-level timing for precise subtitle sync

Requirements

Python 3.10+
ffmpeg (required for video files and most audio formats)

# macOS
brew install ffmpeg

# Ubuntu / Debian
sudo apt update && sudo apt install ffmpeg

# Windows
winget install ffmpeg

Installation

pip install whisper-cli

Or from source:

git clone https://github.com/bhupendra05/whisper-cli.git
cd whisper-cli
pip install -e .

The first time you run a command, Whisper downloads the model (about 140 MB for base). Later runs reuse the cached copy.

Usage

Transcribe a file

# Basic — outputs interview.txt
whisper-cli transcribe interview.mp3

# Specify model (tiny/base/small/medium/large-v3; see `whisper-cli models`)
whisper-cli transcribe lecture.mp4 --model small

# Output as SRT subtitles
whisper-cli transcribe video.mp4 --format srt

# Output as VTT (for HTML5 video)
whisper-cli transcribe video.mp4 --format vtt

# Save to a different directory
whisper-cli transcribe recording.wav --output-dir ./transcripts

# Specify language (skips auto-detection, slightly faster)
whisper-cli transcribe audio.mp3 --language fr
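When calling the tool from a script, the flags shown above can be assembled programmatically. The following is a minimal Python sketch, not part of whisper-cli: `build_transcribe_command` and `run_transcribe` are hypothetical helpers that use only the flags documented in this README. It assumes `whisper-cli` is on your PATH and that the command exits with a non-zero status on failure.

```python
import shlex
import subprocess


def build_transcribe_command(files, model=None, fmt=None, output_dir=None,
                             language=None, translate=False):
    """Assemble an argv list from the flags documented in this README."""
    cmd = ["whisper-cli", "transcribe", *map(str, files)]
    if model:
        cmd += ["--model", model]
    if fmt:
        cmd += ["--format", fmt]
    if output_dir:
        cmd += ["--output-dir", str(output_dir)]
    if language:
        cmd += ["--language", language]
    if translate:
        cmd.append("--translate")
    return cmd


def run_transcribe(files, **options):
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit
    (assumption: whisper-cli signals failure through its exit code)."""
    subprocess.run(build_transcribe_command(files, **options), check=True)


# Print the command instead of running it, so the sketch is safe to try.
print(shlex.join(build_transcribe_command(["lecture.mp4"], model="small", fmt="srt")))
# whisper-cli transcribe lecture.mp4 --model small --format srt
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.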

Translate to English

# Transcribe Spanish audio and translate to English
whisper-cli transcribe spanish_interview.mp4 --translate

# Translate + SRT output
whisper-cli transcribe foreign_film.mkv --translate --format srt

Batch processing

# Transcribe all MP3 files in a folder
whisper-cli transcribe *.mp3 --format srt --output-dir ./subtitles

# Mix of audio and video
whisper-cli transcribe meeting.mp4 call.m4a notes.wav
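A shell glob such as `*.mp3` only matches files in the current directory. If your recordings are spread across subfolders, one option is to collect them first and pass the resulting paths as arguments. The sketch below is an illustration, not part of whisper-cli: `find_media_files` is a hypothetical helper, and `MEDIA_EXTENSIONS` only covers the formats named in the Features list (the tool may accept more).

```python
from pathlib import Path

# Extensions named in the Features list of this README.
MEDIA_EXTENSIONS = {
    ".mp3", ".wav", ".m4a", ".flac", ".ogg",
    ".mp4", ".mkv", ".mov", ".avi", ".webm",
}


def is_media_file(path):
    """True when the file extension (case-insensitive) is a listed format."""
    return Path(path).suffix.lower() in MEDIA_EXTENSIONS


def find_media_files(root):
    """Recursively collect media files under root, sorted for stable order."""
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and is_media_file(p)
    )
```

The returned paths can then be given to `whisper-cli transcribe` as positional arguments, the same way multiple files are listed in the examples above.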

Detect spoken language

whisper-cli detect recording.mp3

Detected language: es

Top 5 candidates
┌──────────┬──────────────────────────┐
│ Language │                Confidence│
├──────────┼──────────────────────────┤
│ es       │  94.2%  ████████████████ │
│ pt       │   2.1%  █                │
│ it       │   1.8%  █                │
│ fr       │   0.9%                   │
│ ca       │   0.6%                   │
└──────────┴──────────────────────────┘
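To illustrate how a ranking like the one above can be derived, here is a small Python sketch. It assumes the language probabilities are available as a dictionary mapping language codes to probabilities, which is the shape returned by `detect_language` in the openai-whisper package. Whether whisper-cli computes its table this way is an assumption, and `top_candidates` is a hypothetical helper.

```python
def top_candidates(probs, n=5):
    """Return the n most likely (language, probability) pairs, best first."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:n]


# Values taken from the example output above; "de" is a made-up sixth
# entry added only to show that the list is cut off at five.
probs = {"fr": 0.009, "es": 0.942, "ca": 0.006,
         "pt": 0.021, "it": 0.018, "de": 0.002}

best, _ = top_candidates(probs, n=1)[0]
print(f"Detected language: {best}")  # Detected language: es

for lang, p in top_candidates(probs):
    print(f"{lang:<4}{p:>7.1%}")
```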

List available models

whisper-cli models

┌───────────┬────────────┬────────┬───────┬──────────────┐
│ Model     │ Parameters │   VRAM │ Speed │ English only │
├───────────┼────────────┼────────┼───────┼──────────────┤
│ tiny      │        39M │  ~1 GB │  ~32x │              │
│ tiny.en   │        39M │  ~1 GB │  ~32x │      ✓       │
│ base      │        74M │  ~1 GB │  ~16x │              │
│ base.en   │        74M │  ~1 GB │  ~16x │      ✓       │
│ small     │       244M │  ~2 GB │   ~6x │              │
│ medium    │       769M │  ~5 GB │   ~2x │              │
│ large-v3  │      1550M │ ~10 GB │    1x │              │
└───────────┴────────────┴────────┴───────┴──────────────┘

Output Formats

Format	Extension	Use case
`txt`	`.txt`	Plain text — reading, searching, summarizing
`srt`	`.srt`	Subtitles for VLC, video editors
`vtt`	`.vtt`	Subtitles for HTML5 `<video>`
`json`	`.json`	Full segment data with timestamps for processing
`tsv`	`.tsv`	Tab-separated — for spreadsheet analysis
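To make the subtitle formats more concrete, the sketch below turns a list of timed segments into SRT text. It is an illustration, not whisper-cli's own converter. It assumes each segment is a dictionary with `start` and `end` in seconds plus a `text` field, which is how the openai-whisper package reports segments. The timings in the sample data are invented.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Sample segments with invented timings, text from the preview above.
segments = [
    {"start": 0.0, "end": 3.5,
     "text": " So today we're going to talk about the new product launch."},
    {"start": 3.5, "end": 6.25,
     "text": " First, let me walk you through the timeline..."},
]
print(segments_to_srt(segments))
```

VTT is very similar: the file starts with a `WEBVTT` header line and timestamps use a period instead of a comma before the milliseconds.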

Model Comparison

Model	Quality	Speed (relative to large-v3)	Best for
`tiny`	Basic	~32×	Quick drafts, clear speech
`base`	Good	~16×	Daily use, decent quality
`small`	Better	~6×	Accents, technical content
`medium`	High	~2×	Meetings, lectures
`large-v3`	Best	1×	Production, multiple speakers

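The .en variants are English-only and tend to be more accurate on English audio, most noticeably for tiny.en and base.en. Per the models table above, they run at the same speed as their multilingual counterparts.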
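If you want to choose a model from available GPU memory, the VRAM column of the `whisper-cli models` output can be turned into a simple lookup. The sketch below is only an illustration: `largest_model_for` is a hypothetical helper, the figures are the approximate values from the table in this README, and CPU-only use is not covered.

```python
# (model name, approximate VRAM in GB), ordered from smallest to largest,
# copied from the `whisper-cli models` table in this README.
MODELS = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]


def largest_model_for(vram_gb):
    """Return the highest-quality model whose VRAM estimate fits, or None."""
    fitting = [name for name, needed in MODELS if needed <= vram_gb]
    return fitting[-1] if fitting else None


print(largest_model_for(4))   # small
print(largest_model_for(12))  # large-v3
```

Because the VRAM numbers are rough estimates, it is sensible to leave some headroom rather than pick a model that only just fits.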

Project Structure

whisper-cli/
├── whisper_cli/
│   ├── cli.py          # Click CLI — transcribe, detect, models commands
│   └── transcribe.py   # Core Whisper logic, format converters, language detection
├── requirements.txt
└── setup.py
