
buddyh/agent-radio


Agent Radio

Turn your AI agents into a radio station.

Agent Radio serves a continuous live audio stream from your machine — silent by default — into which agent output (TTS'd replies, notifications, alerts) gets injected on demand. Subscribe from a phone with the screen locked and agent speech just arrives, announced by a soft chime.

Why a radio? iOS forbids background microphone access for web apps but happily plays a background audio stream forever. Instead of fighting the platform to make agents listen, this flips it around: agent output rides a stream the phone is already happy to play in your pocket, through earbuds or glasses, hands-free.

https://github.com/buddyh/agent-radio

What it does

  • Live stream — one continuous MP3 stream, silent by default, that a phone keeps playing locked/backgrounded.
  • POST /say — any script, agent, hook, or cron job puts spoken text on the air. That's the whole integration contract.
  • Studio line — a web UI to chat with an agent and hear the reply on air, with capability-detected backends.
  • Talk-back — record from the browser; it transcribes locally and routes to the selected agent.
  • Per-channel voices — set a different TTS voice for each source (macOS say or ElevenLabs), editable in a Settings panel.
  • Walkie-talkie mode — a squelch chirp on key-up and a roger beep after each transmission, instead of the plain chime.
  • Air log — everything that went out over the air, replayable per-device.
  • Chime — a soft ding precedes speech, but only after real silence, so a burst of messages never stacks chimes.
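A minimal sketch of the chime rule. It assumes "quiet" is measured from when the previous message was queued (a real implementation might measure from the end of playback instead) and uses the 20-second threshold given for POST /say. `ChimeGate` is a hypothetical name, not the server's actual code.

```python
import time


class ChimeGate:
    """Decide whether the next clip should be preceded by a chime.

    ASSUMPTION: quiet time is counted from the moment the previous message
    was queued, not from the end of its playback.
    """

    def __init__(self, quiet_seconds: float = 20.0, clock=time.monotonic):
        self.quiet_seconds = quiet_seconds
        self.clock = clock
        self.last_on_air = None  # None means nothing has aired yet

    def should_chime(self) -> bool:
        now = self.clock()
        # Chime for the very first message, or once the air has been quiet long enough.
        chime = self.last_on_air is None or now - self.last_on_air >= self.quiet_seconds
        # Every message (chimed or not) resets the quiet timer,
        # so a burst of messages chimes only once.
        self.last_on_air = now
        return chime
```

An injectable clock keeps the rule testable without sleeping.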

Requirements

  • macOS — the default voice is the built-in say command.
  • liquidsoap — the streaming engine (brew install liquidsoap).
  • ffmpeg — audio encoding (brew install ffmpeg).
  • Python 3 — the server is stdlib-only, no pip installs.
  • Optional: WhisperKit (brew install whisperkit-cli) for talk-back transcription.
  • Optional: an agent CLI on your PATH for the studio line — any of Claude Code (claude), Codex (codex), tmux, or OpenClaw (openclaw).

Run

./start.sh   # liquidsoap (:8500) + the web/API server (:8501)
./stop.sh
./check.sh   # gates: ruff, liquidsoap --check, bash -n, HTML parse

start.sh checks for the required tools and tells you what to brew install if anything's missing. Then open http://localhost:8501 — the tuner UI. Tap the dial to tune in. Logs land in logs/, TTS clips in clips/ (auto-pruned after 24h), spoken history in logs/history.jsonl.
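A minimal sketch of the 24h clip pruning. It assumes age is judged by file modification time and that only `*.mp3` files in `clips/` are considered. `prune_clips` is a hypothetical name; the project may prune differently.

```python
from __future__ import annotations

import time
from pathlib import Path

MAX_AGE_SECONDS = 24 * 60 * 60  # clips are auto-pruned after 24h


def prune_clips(clips_dir: str, max_age: float = MAX_AGE_SECONDS,
                now: float | None = None) -> list[str]:
    """Delete *.mp3 files in clips_dir whose mtime is older than max_age.

    Returns the sorted names of the deleted files.
    """
    now = time.time() if now is None else now
    removed = []
    for clip in Path(clips_dir).glob("*.mp3"):
        if now - clip.stat().st_mtime > max_age:
            clip.unlink()
            removed.append(clip.name)
    return sorted(removed)
```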

Raw stream for VLC or another player: http://localhost:8500/stream.

Reach it from your phone

The server binds to loopback and is meant to be reached over a private network, never the public internet (there is no authentication — see Security). The simplest safe path:

  1. Put both devices on a Tailscale tailnet (or any VPN/LAN you trust).

  2. Point a reverse proxy on this machine at localhost:8501 and give it a hostname your phone can resolve. Caddy is a two-line config:

    radio.example.ts.net {
        reverse_proxy localhost:8501
    }
    
  3. Open that hostname on the phone. Use HTTPS if you want talk-back — the microphone API (getUserMedia) requires a secure context.

Playback continues with the phone locked. That's the point.

Put speech on the air

curl -X POST http://localhost:8501/say \
  -H 'Content-Type: application/json' \
  -d '{"text": "your build finished, and all tests passed"}'

Wire this into anything — a CI hook, a cron job, a notification script, an agent tool. A chime precedes speech only after 20+ seconds of quiet, so bursts don't stack.
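The same request from Python, using only the standard library to match the stdlib-only server. `build_say_request` and `say` are hypothetical helper names. The response body of /say isn't documented here, so the sketch only returns the HTTP status.

```python
import json
import urllib.request

RADIO_URL = "http://localhost:8501"  # the web/API server started by ./start.sh


def build_say_request(text: str, base_url: str = RADIO_URL) -> urllib.request.Request:
    """Build the same POST /say request as the curl example."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/say",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def say(text: str, base_url: str = RADIO_URL, timeout: float = 10.0) -> int:
    """Put text on the air and return the HTTP status code."""
    with urllib.request.urlopen(build_say_request(text, base_url), timeout=timeout) as resp:
        return resp.status
```

Calling `say("deploy finished")` from a hook or cron script is then a one-liner.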

The studio line (agent chat)

The UI chats with an agent and speaks the reply on air. Backends are capability-detected — a chip only appears if its CLI is on your PATH, so you see exactly what you have installed:

Backend    How
claude     headless claude -p --resume <sid> (default permissions)
codex      codex exec / codex exec resume <thread>
tmux       types into a running tmux session, captures the pane reply
openclaw   openclaw agent --agent <id> ... (default agent main; override with AGENT_RADIO_OPENCLAW_AGENT)

To add your own agent, drop an entry in the BACKENDS registry in radio_server.py: a label, a detect predicate, and a run(convo, target, text) -> reply function.
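A standalone sketch of such an entry, wrapping the plain `echo` command as a toy agent. The dict-of-dicts shape with `label`, `detect`, and `run` keys is an assumption based on the three parts listed above; `_detect_echo`, `_run_echo`, and `available_backends` are hypothetical names. Match whatever structure BACKENDS actually uses in radio_server.py.

```python
import shutil
import subprocess


def _detect_echo() -> bool:
    # Like the built-in backends: only offer the chip if the CLI is on PATH.
    return shutil.which("echo") is not None


def _run_echo(convo, target, text: str) -> str:
    # convo/target are unused by this toy backend; a real one would use them
    # to resume a session or pick a tmux pane or agent id.
    result = subprocess.run(["echo", text], capture_output=True, text=True, check=True)
    return result.stdout.strip()


# ASSUMED registry shape: name -> {label, detect, run}.
BACKENDS = {
    "echo": {"label": "Echo", "detect": _detect_echo, "run": _run_echo},
}


def available_backends() -> list:
    """Names of backends whose detect predicate passes."""
    return [name for name, backend in BACKENDS.items() if backend["detect"]()]
```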

POST /agent returns a job id; poll GET /jobs/<id>. Replies come back to the page and (toggle, default on) get spoken on the stream.
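A minimal polling loop for GET /jobs/<id>. The job's JSON fields are not documented here, so the `status` field and its terminal values `done` and `error` are assumptions, as are the helper names `http_json` and `wait_for_job`. The fetch and sleep functions are injectable so the loop can be exercised without a running server.

```python
from __future__ import annotations

import json
import time
import urllib.request

BASE = "http://localhost:8501"


def http_json(path: str, payload: dict | None = None) -> dict:
    """GET a JSON endpoint, or POST to it when a payload is given."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(BASE + path, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_job(job_id: str, fetch=http_json, interval: float = 1.0,
                 timeout: float = 300.0, sleep=time.sleep) -> dict:
    """Poll GET /jobs/<id> until it reports a terminal status."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch(f"/jobs/{job_id}")
        # ASSUMPTION: a "status" field that ends up as "done" or "error".
        if job.get("status") in ("done", "error"):
            return job
        sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {timeout}s")
```

To submit, POST your request body to /agent with `http_json("/agent", body)` and pass the returned job id to `wait_for_job`; check radio_server.py for the exact body and response field names.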

Talk-back

The mic button records while the stream keeps playing (the page must be open — iOS only blocks the mic for backgrounded tabs), transcribes locally with WhisperKit, and routes the transcript to the selected backend:

POST /talk {"audio_base64", "mime_type", "backend", "target"?, "convo"?, "speak"?}
  -> {"transcript": "...", "job": "..."}
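A sketch of building that request body in Python. The field names come from the signature above. The types of the optional fields (strings for `target` and `convo`, a boolean for `speak`) and the helper name `build_talk_payload` are assumptions.

```python
from __future__ import annotations

import base64


def build_talk_payload(audio: bytes, mime_type: str, backend: str,
                       target: str | None = None, convo: str | None = None,
                       speak: bool | None = None) -> dict:
    """Assemble a POST /talk body; optional fields are omitted when unset."""
    payload = {
        "audio_base64": base64.b64encode(audio).decode("ascii"),
        "mime_type": mime_type,  # e.g. what the browser's recorder reports
        "backend": backend,
    }
    # ASSUMED types: target/convo are strings, speak is a boolean toggle.
    for key, value in (("target", target), ("convo", convo), ("speak", speak)):
        if value is not None:
            payload[key] = value
    return payload
```

Serialize the result with `json.dumps` and POST it with `Content-Type: application/json`, the same way as a /say request.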

For a hands-free push-to-talk from an iPhone, see WALKIE-SHORTCUT.md — an Action Button shortcut that dictates, POSTs to /agent, and lets the reply come back over the radio.

Voices and custom TTS providers

The default voice is macOS say (offline, free, instant). Voices are set per channel — a channel is a speech source: an agent backend name (claude/codex/tmux/openclaw) or say for the /say endpoint. Use the gear icon in the UI to set each channel's provider and voice live, or copy config.example.json to config.local.json (gitignored) and edit the tts block:

{
  "tts": {
    "default": { "provider": "say" },
    "providers": {
      "elevenlabs": {
        "voice_id": "your-voice-id",
        "model_id": "eleven_turbo_v2_5",
        "api_key_env": "ELEVENLABS_API_KEY"
      }
    },
    "channels": {
      "openclaw": { "provider": "elevenlabs" }
    }
  }
}
  • Per channel: a source uses its channels entry if present, otherwise default. The example gives openclaw an ElevenLabs voice and leaves everything else on say. A channel can override provider settings (e.g. a different voice_id) under channels.<source>.<provider>.
  • API keys never live in the config. A provider reads its key from the environment variable named by api_key_env. Put the export in .env.local (gitignored) — start.sh sources it — or set it in your shell.
  • Any provider failure falls back to say, so the radio never goes mute; the error is logged.
  • The Settings panel saves via POST /config, which persists to config.local.json and hot-reloads — no restart. API keys are never written from the UI.
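The lookup rules above can be sketched as a small resolver. `resolve_voice` is a hypothetical name, and two details are assumptions: a channel entry without its own `provider` inherits the default's, and the key read from the `api_key_env` variable is handed to the provider in `opts` as `api_key`.

```python
from __future__ import annotations

import os


def resolve_voice(tts: dict, source: str, environ=os.environ) -> tuple[str, dict]:
    """Return (provider_name, opts) for a speech source such as "openclaw" or "say"."""
    default = tts.get("default", {"provider": "say"})
    # A source uses its channels entry if present, otherwise the default.
    channel = tts.get("channels", {}).get(source, default)
    # ASSUMPTION: a channel without "provider" inherits the default provider.
    provider = channel.get("provider") or default.get("provider", "say")
    # Shared provider settings first, then overrides from channels.<source>.<provider>.
    opts = dict(tts.get("providers", {}).get(provider, {}))
    opts.update(channel.get(provider, {}))
    # Keys come from the environment variable named by api_key_env, never the config.
    key_env = opts.get("api_key_env")
    if key_env:
        opts["api_key"] = environ.get(key_env)
    return provider, opts
```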

Add your own provider in three steps, no framework: write a your_provider(text, opts) -> mp3_path function in radio_server.py, register it in the TTS_PROVIDERS dict, and reference it by name from a channel. opts is the merged provider + per-channel settings.
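A minimal sketch of those steps plus the fall-back-to-say rule. The provider shown shells out to macOS `say` (writing AIFF) and then to `ffmpeg` to produce an MP3. The `voice` option key and the `synthesize` wrapper are hypothetical, and the bundled say provider in radio_server.py may work differently.

```python
from __future__ import annotations

import logging
import subprocess
import tempfile
from pathlib import Path

log = logging.getLogger("agent-radio")


def say_provider(text: str, opts: dict) -> str:
    """macOS `say` -> AIFF, then ffmpeg -> MP3. Returns the MP3 path."""
    out_dir = Path(tempfile.mkdtemp(prefix="clip-"))
    aiff, mp3 = out_dir / "clip.aiff", out_dir / "clip.mp3"
    cmd = ["say", "-o", str(aiff)]
    if opts.get("voice"):  # HYPOTHETICAL option name for the say voice
        cmd += ["-v", opts["voice"]]
    subprocess.run(cmd + [text], check=True)
    subprocess.run(["ffmpeg", "-y", "-loglevel", "error", "-i", str(aiff), str(mp3)], check=True)
    return str(mp3)


# Register providers by the name channels refer to.
TTS_PROVIDERS = {"say": say_provider}


def synthesize(text: str, provider: str, opts: dict, providers: dict = TTS_PROVIDERS) -> str:
    """Run the named provider; on any failure (including an unknown name), log and fall back to say."""
    try:
        return providers[provider](text, opts)
    except Exception:
        log.exception("TTS provider %r failed; falling back to say", provider)
        return providers["say"](text, {})
```

Catching every exception mirrors the stated goal that the radio never goes mute because one provider misbehaves.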

Security

  • No authentication on any endpoint. Agent Radio is built for a private network you control. The server binds 127.0.0.1; a reverse proxy on a tailnet/VPN/trusted-LAN is the intended and only exposure. Do not put it on the public internet.
  • Anything you POST /say is spoken out loud wherever the listener is. Don't inject secrets.
  • The studio-line agents run with their own default permissions; the bundled claude backend does not skip permission prompts.

How it fits together

web UI + API                     (radio_server.py, 127.0.0.1:8501)
  GET  /  /stream  /backends  /sessions  /recent  /clips/:name  /mode  /config  /jobs/:id
  POST /say  /agent  /talk  /mode  /config
      |  text -> TTS provider -> ffmpeg -> clips/*.mp3   (chime after 20s+ of quiet)
      v
liquidsoap request.queue  ------ telnet 127.0.0.1:8502 "say.push <clip>"
      |  fallback: silence
      v
output.harbor MP3 stream         127.0.0.1:8500/stream (proxied at :8501/stream)
      |
      v
your reverse proxy               reachable on your private network
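The hand-off from the web server to liquidsoap is a single telnet command, `say.push <clip>`. A minimal sketch using the stdlib `socket` module follows. It assumes liquidsoap terminates each command's reply with a line reading `END`, which is its telnet server's usual behavior; verify that against your liquidsoap version. `push_command` and `push_clip` are hypothetical names.

```python
import socket


def push_command(clip_path: str) -> bytes:
    """The telnet line that queues a clip on the `say` request.queue."""
    if "\n" in clip_path:
        # A newline would end the command early and inject a second one.
        raise ValueError("clip path must not contain newlines")
    return f"say.push {clip_path}\n".encode("utf-8")


def push_clip(clip_path: str, host: str = "127.0.0.1", port: int = 8502,
              timeout: float = 5.0) -> str:
    """Queue a clip via liquidsoap's telnet server and return its reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(push_command(clip_path))
        reply = b""
        # ASSUMPTION: the reply ends with a line reading END.
        while b"END" not in reply:
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
        sock.sendall(b"quit\n")
    return reply.decode("utf-8", "replace").partition("END")[0].strip()
```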

License

MIT — see LICENSE.
