Local speech-to-text API using FluidAudio's Parakeet CoreML models. Achieves ~100-150x realtime transcription on Apple Silicon via the Neural Engine.
## Requirements

- macOS 14.0+ (Sonoma)
- Apple Silicon (M1/M2/M3/M4)
- Xcode Command Line Tools (`xcode-select --install`)
- Python 3.11+ with uv
## Build

```bash
./scripts/build-transcribe.sh
```

This clones FluidAudio, adds the TranscribeCLI wrapper, and builds a release binary at `bin/transcribe`. The script is idempotent.
## Model Download

The first transcription downloads ~1-2 GB of CoreML models from HuggingFace. Do this once manually to avoid a slow first request:
```bash
./bin/transcribe data/test-2.wav
```

## Run

Install dependencies and start the server:

```bash
uv sync
uv run uvicorn server:app --host 0.0.0.0 --port 8765 --workers 2
```

Check health and transcribe a file:

```bash
curl http://localhost:8765/health
curl -X POST http://localhost:8765/transcribe -F "file=@data/test-1.wav"
curl -X POST http://localhost:8765/transcribe -F "file=@data/test-2.wav"
```

Response:
```json
{
  "text": "Full transcription text",
  "segments": [
    { "start": 0.0, "end": 2.5, "text": "First sentence." },
    { "start": 2.5, "end": 5.0, "text": "Second sentence." }
  ],
  "confidence": 0.98,
  "rtfx": 155.0,
  "processing_time": 0.08
}
```
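For programmatic access, here is a minimal Python client sketch (the `requests` package is an assumption; any HTTP client works, and the endpoint and response fields are the ones shown above):

```python
import requests

# Upload an audio file to the local transcription server.
with open("data/test-1.wav", "rb") as audio:
    resp = requests.post(
        "http://localhost:8765/transcribe",
        files={"file": audio},
    )
resp.raise_for_status()
result = resp.json()

print(result["text"])
# Per-segment timestamps, e.g. for building subtitles:
for seg in result["segments"]:
    print(f'{seg["start"]:.2f}-{seg["end"]:.2f}: {seg["text"]}')
```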
## Updating

After pulling changes:

```bash
# If pyproject.toml changed
uv sync

# If scripts/TranscribeCLI.swift changed
./scripts/build-transcribe.sh

# Restart to pick up changes
./scripts/service.sh restart
```

For most changes (e.g. edits to `server.py`), a restart alone is enough.
## Service Management

```bash
./scripts/service.sh install    # Generate plist, install, and start
./scripts/service.sh status     # Check if running + health check
./scripts/service.sh stop
./scripts/service.sh start
./scripts/service.sh restart
./scripts/service.sh uninstall  # Stop and remove
```

## API

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/transcribe` | POST | Transcribe uploaded audio (wav, mp3, m4a, flac) |
| `/docs` | GET | Swagger UI |
## Architecture

```
HTTP Request → FastAPI (server.py) → subprocess → bin/transcribe → CoreML Neural Engine
```

The server shells out to the Swift binary for each transcription. The binary loads FluidAudio's Parakeet TDT v3 model and runs inference on the Apple Neural Engine.
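A minimal sketch of this shell-out pattern (not the project's actual `server.py`; the temp-file handling and the assumption that the binary prints JSON matching the response schema to stdout are illustrative):

```python
import asyncio
import json
import os
import tempfile

from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/transcribe")
async def transcribe(file: UploadFile):
    # Persist the upload so the CLI can read it from disk.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(await file.read())
        path = tmp.name

    try:
        # Shell out to the Swift binary, one subprocess per request.
        proc = await asyncio.create_subprocess_exec(
            "./bin/transcribe", path,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        stdout, _ = await proc.communicate()
    finally:
        os.unlink(path)

    # Assumption: the binary emits JSON on stdout in the shape shown above.
    return json.loads(stdout)
```

Spawning a subprocess per request keeps the Python process free of CoreML state and lets `--workers 2` handle concurrent uploads independently.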