whisper(ml)x

Fast, accurate speech recognition on Apple Silicon — powered by MLX.

A fork of WhisperX with the inference backend replaced by mlx-whisper, running natively on Apple Silicon via MLX. Word-level timestamps, speaker diarization, and VAD are all retained.

⚡️ MLX inference — runs on Apple Silicon GPU via unified memory
🎯 Word-level timestamps via wav2vec2 forced alignment
👥 Speaker diarization via pyannote-audio
🗣️ VAD preprocessing via pyannote or silero

Installation

pip install whispermlx

Or with uv:

uv add whispermlx

Usage

CLI

# Auto-downloads mlx-community/whisper-large-v3-mlx on first run
whispermlx audio.mp3 --model large-v3

# With speaker diarization
whispermlx audio.mp3 --model large-v3 --diarize --hf_token YOUR_TOKEN

# Use any mlx-community model directly
whispermlx audio.mp3 --model mlx-community/whisper-large-v3-turbo

Python

import whispermlx

# Short name — auto-maps to mlx-community/whisper-large-v3-mlx
model = whispermlx.load_model("large-v3", device="cpu")
result = model.transcribe("audio.mp3")
print(result["segments"])

# With alignment
model_a, metadata = whispermlx.load_align_model(language_code=result["language"], device="cpu")
result = whispermlx.align(result["segments"], model_a, metadata, "audio.mp3", device="cpu")

# With diarization
from whispermlx.diarize import DiarizationPipeline
diarize_model = DiarizationPipeline(token="YOUR_HF_TOKEN", device="cpu")
diarize_segments = diarize_model("audio.mp3")
result = whispermlx.assign_word_speakers(diarize_segments, result)
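
Once speakers are assigned, the segments can be turned into a readable transcript. The following is a minimal sketch, assuming each segment is a dict with `start`, `end`, and `text` keys plus an optional `speaker` key, as in upstream WhisperX. The helpers `format_timestamp` and `format_segments` are hypothetical and not part of whispermlx, and the sample data is hand-written rather than real output.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments):
    """Return one transcript line per segment, prefixed with time range and speaker."""
    lines = []
    for seg in segments:
        # Assumption: segments without a diarization label simply lack the "speaker" key.
        speaker = seg.get("speaker", "UNKNOWN")
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {speaker}: {seg['text'].strip()}")
    return lines


# Hand-written sample in the assumed shape; replace with result["segments"].
sample = [
    {"start": 0.0, "end": 2.4, "text": " Hello there.", "speaker": "SPEAKER_00"},
    {"start": 2.4, "end": 65.0, "text": " Hi."},
]

for line in format_segments(sample):
    print(line)
```

With real output, pass `result["segments"]` in place of `sample`.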

Model Names

Short names are automatically mapped to their mlx-community equivalents. Full HF repo IDs also work.

| Short name | HF repo |
| --- | --- |
| `tiny`, `base`, `small`, `medium` | `mlx-community/whisper-{name}-mlx` |
| `large-v3` | `mlx-community/whisper-large-v3-mlx` |
| `large-v3-turbo` / `turbo` | `mlx-community/whisper-large-v3-turbo` |
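
The mapping above can be sketched as a small lookup. This is an illustration of the rule described in the table, not whispermlx's actual implementation. The names `SHORT_NAMES` and `resolve_model_name` are hypothetical, and treating any name that contains a `/` as a full repo ID is an assumption.

```python
# Hypothetical lookup mirroring the table above; not whispermlx's own code.
SHORT_NAMES = {
    "tiny": "mlx-community/whisper-tiny-mlx",
    "base": "mlx-community/whisper-base-mlx",
    "small": "mlx-community/whisper-small-mlx",
    "medium": "mlx-community/whisper-medium-mlx",
    "large-v3": "mlx-community/whisper-large-v3-mlx",
    "large-v3-turbo": "mlx-community/whisper-large-v3-turbo",
    "turbo": "mlx-community/whisper-large-v3-turbo",
}


def resolve_model_name(name: str) -> str:
    """Map a short model name to its Hugging Face repo ID."""
    # Assumption: anything containing "/" is already a full repo ID.
    if "/" in name:
        return name
    try:
        return SHORT_NAMES[name]
    except KeyError:
        known = ", ".join(sorted(SHORT_NAMES))
        raise ValueError(f"Unknown model {name!r}; expected one of: {known}") from None


print(resolve_model_name("turbo"))
print(resolve_model_name("mlx-community/whisper-large-v3-turbo"))
```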

Speaker Diarization

Requires a Hugging Face access token and acceptance of the pyannote speaker-diarization-community-1 model agreement.
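
To keep the token out of source code, it can be read from the environment. This is a minimal sketch: the `HF_TOKEN` variable name and the helpers `get_hf_token` and `build_diarizer` are assumptions for illustration, while the `DiarizationPipeline(token=..., device=...)` call is taken from the Python usage example above.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read the Hugging Face token from the environment, failing loudly if absent."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {env_var} to a Hugging Face access token before running diarization."
        )
    return token


def build_diarizer(device: str = "cpu"):
    """Create a diarization pipeline using the token from the environment."""
    # Imported lazily so get_hf_token() works even where whispermlx is not installed.
    from whispermlx.diarize import DiarizationPipeline

    return DiarizationPipeline(token=get_hf_token(), device=device)
```

Usage would then look like `diarize_model = build_diarizer()` after exporting the token in the shell.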

Acknowledgements

Built on top of WhisperX by Max Bain et al., mlx-whisper, pyannote-audio, and OpenAI Whisper.

@article{bain2022whisperx,
  title={WhisperX: Time-Accurate Speech Transcription of Long-Form Audio},
  author={Bain, Max and Huh, Jaesung and Han, Tengda and Zisserman, Andrew},
  journal={INTERSPEECH 2023},
  year={2023}
}

