Local inference server for RIFT Transcription. Serves streaming speech recognition over WebSocket, backed by local models with automatic download.
```bash
pip install rift-local
```
rift-local supports multiple ASR backends, each installed as an optional extra:
```bash
pip install rift-local[sherpa]            # sherpa-onnx (Nemotron, Kroko)
pip install rift-local[moonshine]         # Moonshine Gen 2 (via moonshine-voice)
pip install rift-local[sherpa,moonshine]  # both
```

On Apple Silicon, add MLX support for future GPU-accelerated batch transcription:

```bash
pip install rift-local[mlx]
```

For development (includes pytest):

```bash
pip install rift-local[dev]
```

List all available models and see which are installed:

```bash
rift-local list
rift-local list --installed
```
| Model | Params | Languages | Download | Notes |
|---|---|---|---|---|
| `nemotron-en` | 0.6B | EN | 447 MB | Best accuracy. |
| `zipformer-en-kroko` | ~30M | EN | 55 MB | Lightweight, fast. Only ~68 MB on disk. |

Requires: `pip install rift-local[sherpa]`
| Model | Params | Languages | Size | Notes |
|---|---|---|---|---|
| `moonshine-en-tiny` | 34M | EN | 26 MB | Fastest. Good for low-resource machines. |
| `moonshine-en-small` | 123M | EN | 95 MB | Balanced speed/accuracy. |
| `moonshine-en-medium` | 245M | EN | 190 MB | Default. Best Moonshine accuracy. |

Requires: `pip install rift-local[moonshine]`

Moonshine models are downloaded automatically by the moonshine-voice library on first use.
Start the WebSocket server with any model:

```bash
# Start server and open RIFT Transcription in your browser
rift-local serve --open

# Moonshine (default model)
rift-local serve

# sherpa-onnx
rift-local serve --model nemotron-en

# Custom host/port
rift-local serve --model moonshine-en-tiny --host 0.0.0.0 --port 8080
```

The `--open` flag launches RIFT Transcription in your browser, pre-configured to connect to the local server. The voice source is set to "Local" automatically — just click to start the mic.
For local development of the RIFT Transcription client:

```bash
rift-local serve --open dev       # opens http://localhost:5173
rift-local serve --open dev:3000  # custom port
```

The server auto-downloads the model on first run, then listens on:

- WebSocket: `ws://127.0.0.1:2177/ws` (streaming ASR)
- HTTP: `http://127.0.0.1:2177/info` (model metadata)
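As a quick sanity check that the server is up, the `/info` endpoint can be read with nothing but the standard library. This sketch stands up a stub handler in place of `rift-local serve` so it runs anywhere; the JSON fields returned by the stub (`model`, `sample_rate`) are assumptions based on the metadata the README describes, not the real schema.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def fetch_info(base_url):
    """Fetch and parse the model-metadata JSON from /info."""
    with urllib.request.urlopen(f"{base_url}/info") as resp:
        return json.load(resp)

class _StubInfoHandler(BaseHTTPRequestHandler):
    """Stand-in for `rift-local serve`; field names are illustrative."""
    def do_GET(self):
        body = json.dumps({"model": "moonshine-en-medium",
                           "sample_rate": 16000}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # silence per-request logging
        pass

# Bind to port 0 so the OS picks a free port, then query it.
server = HTTPServer(("127.0.0.1", 0), _StubInfoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
info = fetch_info(f"http://127.0.0.1:{server.server_address[1]}")
server.shutdown()
```

Against a real server, only `fetch_info("http://127.0.0.1:2177")` is needed.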
| Flag | Default | Description |
|---|---|---|
| `--model` | `moonshine-en-medium` | Model name from registry |
| `--host` | `127.0.0.1` | Bind address |
| `--port` | `2177` | Server port |
| `--threads` | `2` | Inference threads |
| `--open` | off | Open browser to RIFT Transcription client |
- Client connects to `/ws`
- Server sends `info` JSON (model name, features, sample rate)
- Client sends binary frames of Float32 PCM audio at 16 kHz
- Server sends `result` JSON messages with partial/final transcriptions
- Client sends text `"Done"` to end the session
```bash
# Install dev + backend dependencies
pip install -e ".[dev,sherpa,moonshine]"

# Run fast tests (mocked backends, no model download)
pytest

# Run all tests including slow integration tests (downloads models)
pytest --slow
```

Tests are in the `tests/` directory:

- `test_server.py` — WebSocket server tests using a mock backend
- `test_moonshine.py` — Moonshine adapter unit tests (mocked) + integration tests (slow)
- `conftest.py` — Shared `MockBackend` fixture and `--slow` flag
See `specs/rift-local.md` for the full design document.