Local speech-to-text API using FluidAudio's Parakeet CoreML models. Achieves ~100-150x realtime transcription on Apple Silicon via the Neural Engine.
## Requirements

- macOS 14.0+ (Sonoma)
- Apple Silicon (M1/M2/M3/M4)
- Xcode Command Line Tools (`xcode-select --install`)
- Python 3.11+ with uv
## Build

```bash
./scripts/build-transcribe.sh
```

This clones FluidAudio, adds the TranscribeCLI wrapper, and builds a release binary at `bin/transcribe`. The script is idempotent.
## Model Download

The first transcription downloads ~1-2 GB of CoreML models from HuggingFace. Do this once manually to avoid a slow first request:
```bash
./bin/transcribe data/test-2.wav
```

## Run

Install dependencies and start the server:

```bash
uv sync
uv run uvicorn server:app --host 0.0.0.0 --port 8765 --workers 2
```

Check health and transcribe a file:

```bash
curl http://localhost:8765/health
curl -X POST http://localhost:8765/transcribe -F "file=@data/test-1.wav"
curl -X POST http://localhost:8765/transcribe -F "file=@data/test-2.wav"
```

Response:
```json
{
  "text": "Full transcription text",
  "segments": [
    { "start": 0.0, "end": 2.5, "text": "First sentence." },
    { "start": 2.5, "end": 5.0, "text": "Second sentence." }
  ],
  "confidence": 0.98,
  "rtfx": 155.0,
  "processing_time": 0.08
}
```
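For programmatic access, here is a minimal Python client sketch (the `requests` package is an assumption; any HTTP client works, and the endpoint and response fields are the ones shown above):

```python
import requests

# Upload an audio file to the local transcription server.
with open("data/test-1.wav", "rb") as audio:
    resp = requests.post(
        "http://localhost:8765/transcribe",
        files={"file": audio},
    )
resp.raise_for_status()
result = resp.json()

print(result["text"])
# Per-segment timestamps, e.g. for building subtitles:
for seg in result["segments"]:
    print(f'{seg["start"]:.2f}-{seg["end"]:.2f}: {seg["text"]}')
```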
## Updating

After pulling changes:

```bash
# If pyproject.toml changed
uv sync

# If scripts/TranscribeCLI.swift changed
./scripts/build-transcribe.sh

# Restart to pick up changes
./scripts/service.sh restart
```

For most changes (e.g. edits to `server.py`), a restart alone is enough.
## Service Management

```bash
./scripts/service.sh install    # Generate plist, install, and start
./scripts/service.sh status     # Check if running + health check
./scripts/service.sh stop
./scripts/service.sh start
./scripts/service.sh restart
./scripts/service.sh uninstall  # Stop and remove
```

## API

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/transcribe` | POST | Transcribe uploaded audio (wav, mp3, m4a, flac) |
| `/docs` | GET | Swagger UI |
## Architecture

```
HTTP Request → FastAPI (server.py) → subprocess → bin/transcribe → CoreML Neural Engine
```

The server shells out to the Swift binary for each transcription. The binary loads FluidAudio's Parakeet TDT v3 model and runs inference on the Apple Neural Engine.
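A minimal sketch of this shell-out pattern (not the project's actual `server.py`; the temp-file handling and the assumption that the binary prints JSON matching the response schema to stdout are illustrative):

```python
import asyncio
import json
import os
import tempfile

from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/transcribe")
async def transcribe(file: UploadFile):
    # Persist the upload so the CLI can read it from disk.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(await file.read())
        path = tmp.name

    try:
        # Shell out to the Swift binary, one subprocess per request.
        proc = await asyncio.create_subprocess_exec(
            "./bin/transcribe", path,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        stdout, _ = await proc.communicate()
    finally:
        os.unlink(path)

    # Assumption: the binary emits JSON on stdout in the shape shown above.
    return json.loads(stdout)
```

Spawning a subprocess per request keeps the Python process free of CoreML state and lets `--workers 2` handle concurrent uploads independently.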