TTS Audio

Cross can convert any research report into a spoken MP3 file. This is optional: report generation, fact-checking, and benchmark commands all work without it. TTS enables two commands, st-speak and st-voice, plus the --mp3 and --all exports of st-prep.

Quick start

# 1. Install Cross with audio support
pipx install "cross-st[tts]"

# 2. Start a local Piper TTS server (Docker)
docker run -d --name wyoming-piper -p 10200:10200 \
  -v ~/piper-voices:/data \
  rhasspy/wyoming-piper --voice en_US-lessac-medium

# 3. Save the default voice (stored persistently, so this is a one-time step)
st-admin --set-tts-voice en_US-lessac-medium
# TTS_HOST and TTS_PORT default to localhost:10200

# 4. Render a story to MP3
st-speak my_topic.json    # → my_topic.mp3

Python version support

The table below reflects live install and import tests on macOS ARM, 2026-03-31 (see tests/test_tts_stack.py to reproduce).

Python	Install without TTS	Install with TTS	numpy version resolved
3.9	❌	❌	none; install fails (numpy 2.2+ requires Python 3.10)
3.10	✅	✅	2.2.6 (numpy 2.3+ requires Python 3.11)
3.11	✅	✅	2.2.4 (pinned in requirements.txt)
3.12	✅	✅	2.4.4; 18/18 packages pass the import test
3.13	✅	✅	2.4.4; 18/18 packages pass the import test

The 3.10 minimum comes from numpy 2.2+, scipy 1.15.x, and the match/case syntax used in st-plot.py and st-voice.py. There is no upper version limit.

numpy note: numpy 2.3.x raised Requires-Python to >=3.11. On Python 3.10, pip automatically resolves to numpy 2.2.x — no manual pinning needed. All numpy 2.x branches work with Cross.
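Before running pipx, you can confirm that a given interpreter meets the 3.10 floor. The sketch below uses hypothetical helpers (tts_python_ok and tts_python_report are not part of Cross); they ask the interpreter itself for its version, so they work with python3, python3.12, or a full path.

```bash
# Hypothetical helpers, not shipped with Cross.

# Succeed if the given interpreter exists and is Python 3.10 or newer.
tts_python_ok() {
  local py="${1:-python3}"
  # Fail early if the interpreter is not on PATH or not executable.
  command -v "$py" >/dev/null 2>&1 || return 1
  # Let Python compare its own version tuple; exit status 0 means >= 3.10.
  "$py" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)'
}

# Print the interpreter's version together with a verdict.
tts_python_report() {
  local py="${1:-python3}" ver
  if ! ver="$("$py" -c 'import platform; print(platform.python_version())' 2>/dev/null)"; then
    echo "$py: not found"
  elif tts_python_ok "$py"; then
    echo "$py: $ver OK"
  else
    echo "$py: $ver is too old (need 3.10+)"
  fi
}
```

Usage: run `tts_python_report python3.12`; if it reports OK, pass the same name to `pipx install --python`.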

Install by platform

macOS (Apple Silicon or Intel)

pipx install "cross-st[tts]"

soundfile bundles its own libsndfile binary on macOS — no Homebrew package needed. Any Python 3.10–3.13 works; use --python to pick a specific version:

brew install python@3.12
pipx install --python python3.12 "cross-st[tts]"

Linux — Debian / Ubuntu

# System audio library (required on Linux — not bundled in the soundfile wheel)
sudo apt install libsndfile1 ffmpeg

# Install Cross with TTS
pipx install "cross-st[tts]"

st-voice plays audio through afplay, which is macOS-only. For in-terminal playback on Linux (the s key in st-voice), install another player and point AUDIO_PLAYER at it:

sudo apt install mpv
echo "AUDIO_PLAYER=mpv" >> ~/.crossenv
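Appending with >> adds a duplicate line every time it runs. A minimal sketch of an idempotent alternative, assuming ~/.crossenv is a plain KEY=VALUE file as shown under Configure Cross; crossenv_set is a hypothetical helper, not a Cross command (st-admin remains the supported way to change settings it covers).

```bash
# Hypothetical helper, not part of Cross.
# Set KEY=VALUE in a KEY=VALUE file (default ~/.crossenv), replacing any
# existing line for KEY instead of appending a duplicate.
crossenv_set() {
  local key="$1" value="$2" file="${3:-$HOME/.crossenv}" tmp
  touch "$file" || return 1
  tmp="$(mktemp)" || return 1
  # Copy every line except existing KEY= lines (grep exits 1 when nothing
  # is left to copy, which is fine here), then append the new value.
  grep -v "^${key}=" "$file" > "$tmp"
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Replace the original file with the rewritten copy.
  mv "$tmp" "$file"
}
```

Usage: `command -v mpv >/dev/null && crossenv_set AUDIO_PLAYER mpv` can be re-run safely; the file keeps a single AUDIO_PLAYER line.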

Linux — Fedora / RHEL

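# On Fedora, ffmpeg is packaged in RPM Fusion (enable it first); Fedora's own repos carry ffmpeg-free
sudo dnf install libsndfile ffmpeg
pipx install "cross-st[tts]"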

Linux — Arch

sudo pacman -S libsndfile ffmpeg
pipx install "cross-st[tts]"
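The three Linux package lists above differ only in package manager and package names. A sketch that picks the matching command from the ID and ID_LIKE fields of /etc/os-release; the function name and the ID matching are assumptions, so review the printed command before running it.

```bash
# Hypothetical helper: print the system-package command for TTS on Linux,
# based on the ID / ID_LIKE fields of an os-release file.
tts_deps_cmd() {
  local release="${1:-/etc/os-release}" id like
  [ -r "$release" ] || return 1
  # Source the file in subshells so its other variables don't leak.
  id="$(. "$release" && echo "${ID:-}")"
  like="$(. "$release" && echo "${ID_LIKE:-}")"
  # Pad with spaces so each ID matches as a whole word.
  case " $id $like " in
    *" debian "*|*" ubuntu "*) echo "sudo apt install libsndfile1 ffmpeg" ;;
    *" fedora "*|*" rhel "*)   echo "sudo dnf install libsndfile ffmpeg" ;;
    *" arch "*)                echo "sudo pacman -S libsndfile ffmpeg" ;;
    *) return 1 ;;             # unknown distribution: install manually
  esac
}
```

Derivatives are covered through ID_LIKE (for example Rocky Linux lists rhel and fedora, Manjaro lists arch).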

Windows

Native Windows is not supported for TTS (Cross uses POSIX APIs for keyboard input). Use WSL2 instead:

# PowerShell — installs WSL2 with Ubuntu
wsl --install

Then follow the Ubuntu instructions above inside the WSL2 terminal.

Piper TTS server

Cross does not bundle a speech engine. It connects over TCP to a Wyoming Piper server, normally running on the same machine. The server must be running whenever you use a TTS command; in the Docker example, --restart unless-stopped keeps it running across restarts.

Docker (recommended)

docker run -d \
  --name wyoming-piper \
  --restart unless-stopped \
  -p 10200:10200 \
  -v ~/piper-voices:/data \
  rhasspy/wyoming-piper \
  --voice en_US-lessac-medium

The ~/piper-voices directory is where ONNX model files are stored.
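docker run -d returns as soon as the container starts, which can be before the server accepts connections. A minimal sketch of a wait loop, assuming bash (it uses bash's /dev/tcp redirection instead of nc); wait_for_tts is a hypothetical helper, not a Cross command.

```bash
# Hypothetical helper: poll HOST:PORT once per second until a TCP
# connection succeeds or TIMEOUT seconds pass. Requires bash (/dev/tcp).
# Note: against a firewalled remote host a single attempt can block for
# the operating system's connect timeout.
wait_for_tts() {
  local host="${1:-localhost}" port="${2:-10200}" timeout="${3:-30}" i
  for ((i = 0; i < timeout; i++)); do
    # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connect;
    # the subshell closes the descriptor again when it exits.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      echo "TTS server up at $host:$port"
      return 0
    fi
    sleep 1
  done
  echo "TTS server not reachable at $host:$port after ${timeout}s" >&2
  return 1
}
```

Usage: `wait_for_tts localhost 10200 60 && st-speak my_topic.json`.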

Native (no Docker)

pipx install wyoming-piper
wyoming-piper --voice en_US-lessac-medium --uri tcp://0.0.0.0:10200

See wyoming-piper on GitHub for full server options.

Configure Cross

Add to ~/.crossenv:

TTS_HOST=localhost
TTS_PORT=10200
TTS_VOICE=en_US-lessac-medium

Or set via st-admin:

st-admin --set-tts-voice en_US-lessac-medium

TTS_HOST and TTS_PORT default to localhost and 10200 when not set.
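If a shell script needs the same endpoint, it can read ~/.crossenv and apply the defaults described above. A sketch, assuming simple KEY=VALUE lines with no quoting or export prefix; it mirrors the documented defaults rather than calling Cross's own configuration loader, and tts_endpoint is a hypothetical name.

```bash
# Hypothetical helper: print HOST:PORT from a KEY=VALUE file, falling back
# to localhost and 10200 when TTS_HOST / TTS_PORT are missing or empty.
tts_endpoint() {
  local file="${1:-$HOME/.crossenv}" host="" port="" key value
  if [ -r "$file" ]; then
    # "|| [ -n "$key" ]" also processes a last line without a newline.
    while IFS='=' read -r key value || [ -n "$key" ]; do
      case "$key" in
        TTS_HOST) host="$value" ;;
        TTS_PORT) port="$value" ;;
      esac
    done < "$file"
  fi
  echo "${host:-localhost}:${port:-10200}"
}
```

For example, a file containing only TTS_VOICE yields localhost:10200.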

Voice management

Browse voices

st-voice --voices          # list all available en_US / en_GB voice names

Download voice models

Voice ONNX files (~30–130 MB each) are fetched from Hugging Face:

st-voice --curl | bash     # download all voices
st-voice --curl | grep "en_US-lessac-medium" | bash   # download one specific voice

Store model files in the directory your Piper server watches (~/piper-voices in the Docker example above).
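To see where a single voice's files come from, the sketch below builds download URLs from a voice name. It assumes the Hugging Face repository rhasspy/piper-voices, its v1.0.0 tag, and a language/locale/speaker/quality path layout; these are assumptions about the upstream repository, not something Cross guarantees, and st-voice --curl remains the supported route. piper_voice_urls is a hypothetical helper.

```bash
# Hypothetical helper: print the .onnx and .onnx.json URLs for a Piper
# voice name such as en_US-lessac-medium. Assumes the layout
# <lang>/<locale>/<speaker>/<quality>/<voice> in rhasspy/piper-voices.
piper_voice_urls() {
  local voice="$1" locale speaker quality rest lang rel
  local base="https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0"
  # Split "en_US-lessac-medium" into locale, speaker and quality.
  IFS='-' read -r locale speaker quality rest <<< "$voice"
  if [ -z "$quality" ] || [ -n "$rest" ]; then
    echo "unexpected voice name: $voice" >&2
    return 1
  fi
  lang="${locale%%_*}"                  # en_US -> en
  rel="$lang/$locale/$speaker/$quality/$voice"
  echo "$base/$rel.onnx"                # the model
  echo "$base/$rel.onnx.json"           # its config, needed alongside it
}
```

Usage: `piper_voice_urls en_US-ryan-high | while read -r url; do (cd ~/piper-voices && curl -LO "$url"); done`.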

Audition voices

st-voice sample.txt        # interactive shell

Keys: v next voice · s speak · e edit · q quit

Recommended starting voices

Voice	Quality	Style
`en_US-lessac-medium`	Good	Neutral, clear
`en_US-lessac-high`	High	Same speaker, higher fidelity
`en_US-ryan-high`	High	Male, expressive
`en_US-libritts-high`	High	Natural prosody

TTS commands reference

Command	Output
`st-speak my_topic.json`	`my_topic.mp3` from story 1
`st-speak -s 3 my_topic.json`	MP3 from story 3
`st-speak --source fact my_topic.json`	Reads fact-check report aloud
`st-speak --voice en_US-ryan-high my_topic.json`	One-off voice override
`st-prep my_topic.json --mp3`	Process text and render MP3
`st-prep my_topic.json --all`	Export md + mp3 + txt + title
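Since st-speak writes my_topic.mp3 for my_topic.json, a shell loop can re-render only the reports whose MP3 is missing or older than the JSON. A sketch, assuming every *.json in the directory is a Cross report, that the MP3 lands next to its JSON, and that the default story 1 is what you want; stale_mp3s and render_stale are hypothetical helpers.

```bash
# Hypothetical helper: list *.json files in DIR whose matching .mp3 is
# missing or older than the JSON (for example after a report was revised).
stale_mp3s() {
  local dir="${1:-.}" json mp3
  for json in "$dir"/*.json; do
    [ -e "$json" ] || continue          # no matches: the glob stays literal
    mp3="${json%.json}.mp3"
    if [ ! -e "$mp3" ] || [ "$json" -nt "$mp3" ]; then
      printf '%s\n' "$json"
    fi
  done
}

# Render each stale report; st-speak derives the .mp3 name itself.
render_stale() {
  local json
  stale_mp3s "${1:-.}" | while IFS= read -r json; do
    # Detach stdin so st-speak cannot consume the list being read.
    st-speak "$json" < /dev/null
  done
}
```

Run `stale_mp3s` on its own first to preview what `render_stale` would regenerate.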

Without TTS

All commands except st-speak, st-voice, and st-prep --mp3/--all work without TTS packages.

pipx install cross-st          # no TTS extras

Running a TTS command without the packages prints a clear message and exits:

Error: st-speak requires TTS packages.
Run: pip install "cross-st[tts]"  or  pipx install "cross-st[tts]"
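To check ahead of time whether an environment has the audio stack, you can try importing soundfile with the interpreter Cross runs under. For pipx installs that is the python inside the cross-st virtual environment; `pipx list` prints the directory the venvs live in. py_has_modules is a hypothetical helper, and checking soundfile alone is an assumption (the full TTS extra installs more packages).

```bash
# Hypothetical helper: succeed if PYTHON can import every listed module.
# With no modules given, it checks soundfile, the audio module TTS needs.
py_has_modules() {
  local py="$1"; shift
  [ "$#" -gt 0 ] || set -- soundfile
  command -v "$py" >/dev/null 2>&1 || return 1
  local m
  for m in "$@"; do
    "$py" -c "import $m" >/dev/null 2>&1 || { echo "missing: $m" >&2; return 1; }
  done
}
```

Usage (the venv path shown is the usual pipx default on Linux; adjust it to what `pipx list` reports): `py_has_modules ~/.local/pipx/venvs/cross-st/bin/python || pipx install --force "cross-st[tts]"`.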

Troubleshooting

TTS host localhost:10200 is offline
The Piper server is not running. Start it (see above), then test:

nc -z localhost 10200 && echo "up" || echo "down"

ImportError / soundfile not found
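TTS packages are not installed. Reinstall with the extra: pipx install --force "cross-st[tts]" (or pip install "cross-st[tts]" inside a regular virtualenv).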

libsndfile error on Linux
Install the system library: sudo apt install libsndfile1 on Debian/Ubuntu (libsndfile on Fedora and Arch). The Linux soundfile wheel does not bundle libsndfile the way the macOS wheel does.

No audio playback on Linux
afplay is macOS-only. Install mpv and add AUDIO_PLAYER=mpv to ~/.crossenv.

Voice model not found
Download the ONNX file: st-voice --curl | grep "your-voice" | bash
