bhupendra05/whisper-cli

whisper-cli

Transcribe audio and video files locally using OpenAI Whisper. No internet needed after model download — everything runs on your machine.

$ whisper-cli transcribe interview.mp3

▶ interview.mp3
  ✓ Saved → interview.txt

Preview
────────────────────────────────────────────────────
So today we're going to talk about the new product
launch. First, let me walk you through the timeline...

Features

  • Local transcription — Whisper runs 100% on your machine, no data sent anywhere
  • Audio & video — MP3, WAV, M4A, FLAC, OGG, MP4, MKV, MOV, AVI, WebM, and more
  • Multiple output formats — plain text, SRT subtitles, VTT subtitles, JSON, TSV
  • Translation — transcribe + translate to English in one step (--translate)
  • Language detection — auto-detects spoken language, or specify with --language
  • Batch processing — transcribe multiple files with a single command
  • Model selection — tiny (fastest) to large-v3 (best quality)
  • Word timestamps — word-level timing for precise subtitle sync

Requirements

  • Python 3.10+
  • ffmpeg (required for video files and most audio formats)

# macOS
brew install ffmpeg

# Ubuntu / Debian
sudo apt install ffmpeg

# Windows
winget install ffmpeg

Installation

pip install whisper-cli

Or from source:

git clone https://github.com/bhupendra05/whisper-cli.git
cd whisper-cli
pip install -e .

The first time you run a command, Whisper downloads the model (base has 74M parameters, roughly a 140 MB download). Subsequent runs use the cached model.


Usage

Transcribe a file

# Basic — outputs interview.txt
whisper-cli transcribe interview.mp3

# Specify model (tiny/base/small/medium/large-v3)
whisper-cli transcribe lecture.mp4 --model small

# Output as SRT subtitles
whisper-cli transcribe video.mp4 --format srt

# Output as VTT (for HTML5 video)
whisper-cli transcribe video.mp4 --format vtt

# Save to a different directory
whisper-cli transcribe recording.wav --output-dir ./transcripts

# Specify language (skips auto-detection, slightly faster)
whisper-cli transcribe audio.mp3 --language fr

Translate to English

# Transcribe Spanish audio and translate to English
whisper-cli transcribe spanish_interview.mp4 --translate

# Translate + SRT output
whisper-cli transcribe foreign_film.mkv --translate --format srt

Batch processing

# Transcribe all MP3 files in the current folder
whisper-cli transcribe *.mp3 --format srt --output-dir ./subtitles

# Mix of audio and video
whisper-cli transcribe meeting.mp4 call.m4a notes.wav
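
Shell globs only match one extension at a time. If a folder mixes formats, a short script can assemble a single invocation instead. This is a sketch: `batch_command` is a hypothetical helper, the extension set is copied from the Features list, and only flags shown in this README are used:

```python
from pathlib import Path

# Extensions taken from the Features list; the tool may accept more.
MEDIA_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg",
                    ".mp4", ".mkv", ".mov", ".avi", ".webm"}


def batch_command(folder: str, fmt: str = "srt", output_dir: str = "./subtitles") -> list[str]:
    """Build one whisper-cli invocation covering every media file in folder."""
    files = sorted(
        str(p) for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in MEDIA_EXTENSIONS
    )
    return ["whisper-cli", "transcribe", *files, "--format", fmt, "--output-dir", output_dir]
```

The resulting list can be passed to `subprocess.run(batch_command("recordings"), check=True)`; passing a list rather than a string avoids quoting problems with file names that contain spaces.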

Detect spoken language

whisper-cli detect recording.mp3
Detected language: es

Top 5 candidates
┌──────────┬──────────────────────────┐
│ Language │                Confidence│
├──────────┼──────────────────────────┤
│ es       │  94.2%  ████████████████ │
│ pt       │   2.1%  █                │
│ it       │   1.8%  █                │
│ fr       │   0.9%                   │
│ ca       │   0.6%                   │
└──────────┴──────────────────────────┘
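
In the openai-whisper package, `model.detect_language()` returns a dictionary that maps language codes to probabilities. Assuming whisper-cli starts from something similar, a candidate list like the one above could be produced by sorting that dictionary. The function names and the bar scaling below are illustrative, not the tool's actual code:

```python
def top_candidates(probs: dict[str, float], k: int = 5) -> list[tuple[str, float]]:
    """Return the k most probable languages, highest first."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


def render(probs: dict[str, float], k: int = 5, bar_width: int = 16) -> str:
    """Format candidates as 'code  percent  bar' lines, one per language."""
    lines = []
    for code, p in top_candidates(probs, k):
        # Bar length is proportional to the probability (illustrative scaling).
        bar = "█" * round(p * bar_width)
        lines.append(f"{code:<4}{p * 100:5.1f}%  {bar}".rstrip())
    return "\n".join(lines)
```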

List available models

whisper-cli models
┌───────────┬────────────┬────────┬───────┬──────────────┐
│ Model     │ Parameters │   VRAM │ Speed │ English only │
├───────────┼────────────┼────────┼───────┼──────────────┤
│ tiny      │        39M │  ~1 GB │  ~32x │              │
│ tiny.en   │        39M │  ~1 GB │  ~32x │      ✓       │
│ base      │        74M │  ~1 GB │  ~16x │              │
│ base.en   │        74M │  ~1 GB │  ~16x │      ✓       │
│ small     │       244M │  ~2 GB │   ~6x │              │
│ medium    │       769M │  ~5 GB │   ~2x │              │
│ large-v3  │      1550M │ ~10 GB │    1x │              │
└───────────┴────────────┴────────┴───────┴──────────────┘

Output Formats

| Format | Extension | Use case |
|--------|-----------|----------|
| txt | .txt | Plain text: reading, searching, summarizing |
| srt | .srt | Subtitles for VLC, video editors |
| vtt | .vtt | Subtitles for HTML5 <video> |
| json | .json | Full segment data with timestamps for processing |
| tsv | .tsv | Tab-separated, for spreadsheet analysis |
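
The two subtitle formats differ mainly in their header and timestamp punctuation: SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` line and uses a period. A minimal sketch of both conversions, assuming segments shaped like openai-whisper's output (dictionaries with `start`, `end`, and `text`); this is not necessarily how whisper-cli's own converters are written:

```python
def timestamp(seconds: float, decimal_mark: str) -> str:
    """Format seconds as HH:MM:SS<mark>mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_mark}{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """SRT: numbered cues, comma before the milliseconds."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        span = f"{timestamp(seg['start'], ',')} --> {timestamp(seg['end'], ',')}"
        cues.append(f"{i}\n{span}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


def to_vtt(segments: list[dict]) -> str:
    """WebVTT: 'WEBVTT' header, period before the milliseconds, no cue numbers."""
    cues = ["WEBVTT\n"]
    for seg in segments:
        span = f"{timestamp(seg['start'], '.')} --> {timestamp(seg['end'], '.')}"
        cues.append(f"{span}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```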

Model Comparison

| Model | Quality | Speed | Best for |
|-------|---------|-------|----------|
| tiny | Basic | ~32× real-time | Quick drafts, clear speech |
| base | Good | ~16× real-time | Daily use, decent quality |
| small | Better | ~6× real-time | Accents, technical content |
| medium | High | ~2× real-time | Meetings, lectures |
| large-v3 | Best | Real-time | Production, multiple speakers |

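The .en variants tend to be more accurate for English-only content, most noticeably for tiny.en and base.en. They are the same size as their multilingual counterparts, so expect similar speed.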


Project Structure

whisper-cli/
├── whisper_cli/
│   ├── cli.py          # Click CLI — transcribe, detect, models commands
│   └── transcribe.py   # Core Whisper logic, format converters, language detection
├── requirements.txt
└── setup.py

License

MIT © bhupendra05
