Halo

Voice front-end for agentic coding tools. Say a wake word, talk to Claude Code or Codex CLI in plain English, and hear the result back. Halo is the audio layer; the agents do the work.

Status: v1.2.1 — hybrid streaming + one-sentence reply summaries. Long agent replies no longer monologue at you — Halo speaks the first 2 sentences live so you know the agent is working, then summarizes the rest into one sentence via the local brain. Full text still prints to the terminal. Builds on v1.2 (multi-session mode), v1.1.1 (follow-up gate), v1.1 (custom "halo" wake word, persistent Claude sessions, separate-console live feed, noise-suppression pipeline). Live web dashboard at http://127.0.0.1:7070.

See CHANGELOG.md for the full history. Licensed MIT (LICENSE).

The dashboard shows the entire pipeline live: which stage is firing (wake / record / transcribe / route / agent / voice), the most recent transcript, every agent session by Roman name with state and elapsed time, and a color-coded event log. Auto-launches in your default browser when you run Halo (set HALO_NO_BROWSER=1 to suppress).

HALO •  LOCAL  •  listening                            live  ·  v1.1
─────────────────────────────────────────────────────────────────────
[wake] → [record] → [transcribe] → [route] → [agent] → [voice]
─────────────────────────────────────────────────────────────────────
  TRANSCRIPT                            │  AGENTS         1 running
                                        │  Mercury · claude  RUNNING 23s
   "Claude, build me a one-line         │    "build me a hello..."
    python script that prints hello"    │
                                        │  Neptune · codex   IDLE
   ● Mercury — I wrote it to hello.py.  │
─────────────────────────────────────────────────────────────────────
  EVENT LOG                                              ● LIVE
  14:23:01  wake.fired       score 0.83
  14:23:03  record.opened
  14:23:04  stt.done         620ms  "Claude, build me a..."
  14:23:04  route.matched    vocative -> claude_code
  14:23:04  agent.dispatched Mercury starting
  14:23:12  agent.streaming  Mercury: I wrote it to hello.py.
  14:23:13  tts.spoke        Halo: I wrote it to hello.py.
  14:23:13  agent.done       Mercury  (8s)

You: "halo. Claude, write a one-line python script that prints hello."
Halo: "On it. I'm calling this session Mercury."
        ... a separate PowerShell window pops up live-tailing Claude
            (text deltas, tool calls, result events) so you can watch
            Claude work without scrolling Halo's own terminal
        ... Mercury speaks: "I wrote it to hello dot py."

You: "now make it print goodbye instead."
        ... same Mercury session (persistent process, no spawn cost)
        ... gate: continuation marker matched, sent
Halo: "Done."

[phone rings]
You: "hey John, want to grab lunch?"
        ... gate: side_conversation pattern matched, dropped silently
        ... dashboard logs it; Mercury never sees it

You: "ask Codex to refactor it into a function."
        ... verbal-dispatch regex catches this, skips Stage 2 LLM
Halo: "On it. I'm calling this session Neptune."

You: "back to halo. open hello.py."
Halo: "Opened hello.py."

You: "end session."
Halo: "Goodbye."

100% local out of the box (Ollama + faster-whisper + Kokoro). No API keys required to run Halo — the agents themselves (Claude Code / Codex) authenticate against their own services.

Stack

Layer	Tech	Notes
Wake word	openWakeWord + custom `halo.onnx` (trained via bbarrick/wakeword_trainer)	Single-word "halo" wake (188 positives × 22 voices + 524 negatives via ElevenLabs TTS, F1 = 0.80). silero-VAD gate on top via `vad_threshold=0.7`. Falls back to builtin `hey_jarvis` if `halo.onnx` is missing.
Mic	sounddevice	16 kHz mono int16
VAD	silero-vad v5	600 ms base silence, mode-adaptive extensions
STT	faster-whisper + distil-large-v3	int8_float16 on CUDA, ~500 ms per utterance
Router	Ollama + `qwen2.5:1.5b-instruct`	Stage 2 LLM, fires only when no local handler matches
TTS	Kokoro-82M ONNX (fp16)	`af_heart` voice, sanitized for Markdown
Agents	Claude Code + Codex CLI subprocesses	Persistent sessions; voice-mode flags (`--permission-mode bypassPermissions` / `approval_policy="never"`) so no manual approvals

Hardware target: Windows + NVIDIA GPU (RTX 3060 is enough). Mac / Linux paths exist for tools and TTS; CUDA-specific bits gracefully fall back to CPU.

v1.2 highlights

Hybrid streaming + one-sentence reply summaries (v1.2.1) — Streaming agents (Claude) speak the first 2 sentences live as they're generated, so you immediately know the agent is alive and working. After that, Halo stops speaking and silently buffers the rest. When the agent finishes, if the reply ran past the live budget Halo sends the remainder through the brain (qwen2.5:1.5b) for a one-sentence cap: "And, I added bcrypt password verification to auth dot py and wired it into login user." For batch agents (Codex), same idea — speak short replies as-is, summarize when long (> 400 chars). Tunable via LIVE_STREAM_MAX_SENTENCES and REPLY_SUMMARIZE_THRESHOLD_CHARS in halo/config.py. The full reply still prints to your terminal so nothing's lost — TTS is just the summary.
Multi-session mode (v1.2.0) — Halo runs a process discovery scanner that finds every claude / codex session on your machine and feeds the list to the routing brain. You no longer talk to one Claude bound to Halo's launch dir — you talk to whichever session you name. Voice commands:
- "switch to website" / "work on the AIP one" → set active session
- "what sessions do I have?" → spoken list
- "where am I?" / "what project am I on?" → speak the active one
- "in website, ask Claude to add dark mode" → one-shot dispatch to another session without changing the active one
- "tell all of them to run their tests" → fanout to every session
Each project gets its own persistent Claude subprocess (LRU-bounded), so the same agent can run parallel jobs across N projects at once. Brain decisions live in the Stage 2 LLM prompt, with no new regex. See Multi-session mode below.
Follow-up gate (v1.1.1) — once you're in direct dialogue with an agent, the mic stays hot for follow-ups but a 4-rule keyword filter decides whether each utterance is actually for the agent. Phone calls, side conversations to colleagues, and your own muttering get silently dropped instead of being dispatched to Claude. See Follow-up gate below for the rules. Toggle with FOLLOWUP_GATE_ENABLED in halo/config.py.
Custom "halo" wake word — single-word wake ("halo" alone fires it, no "hey" prefix needed). Trained on RTX 3060 in ~30 min via bbarrick's wakeword_trainer + ElevenLabs free-tier TTS.
Persistent Claude sessions — one long-lived claude -p --input-format stream-json process per session, kept alive across every voice turn. Kills the per-turn-subprocess race that caused exit None failures and "starting session" duplicates.
Separate-console live feed — when you dispatch to Claude, a PowerShell window pops up tailing the session log. You watch Claude's text deltas, tool calls, and result events in real time, while Halo's own terminal stays clean. Toggle with AGENT_CLI_VISIBLE = False in halo/config.py.
Noise suppression pipeline — three layers: mic noise gate (MIC_NOISE_GATE_RMS = 0.005), wake silero-VAD gate (vad_threshold = 0.7), Whisper internal VAD filter (vad_filter=True). Plus a residual STT hallucination filter for Whisper's "Thank you" / "you" / "hello" artifacts on silence.
Fuzzy-fallback dispatcher — when Ollama's down AND you said an agent name (even Whisper-mangled as "Cloud" / "Codec"), dispatches to that agent directly instead of giving up.
halo doctor — diagnose Ollama, agent CLIs, Kokoro models.
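The hybrid-streaming rule above (speak the first sentences live, summarize the rest) can be sketched roughly as follows. This is an illustration only, not Halo's actual code: the constant names come from this README, but the sentence splitter and both helper functions are assumptions, and the summarization call to the brain is left out.

```python
import re

# Values taken from the README text; the real defaults live in halo/config.py.
LIVE_STREAM_MAX_SENTENCES = 2
REPLY_SUMMARIZE_THRESHOLD_CHARS = 400

# Naive splitter (assumption): break after ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_live_and_rest(reply: str, max_sentences: int = LIVE_STREAM_MAX_SENTENCES):
    """Streaming agents: return (text spoken live, remainder to summarize)."""
    sentences = [s for s in _SENTENCE_END.split(reply.strip()) if s]
    live = " ".join(sentences[:max_sentences])
    rest = " ".join(sentences[max_sentences:])
    return live, rest


def batch_needs_summary(reply: str, threshold: int = REPLY_SUMMARIZE_THRESHOLD_CHARS) -> bool:
    """Batch agents: short replies are spoken as-is, long ones get summarized."""
    return len(reply) > threshold
```

An empty remainder means the reply fit inside the live budget, so nothing would be sent to the brain for summarizing.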

Install

1. Halo itself

python -m venv .venv
.venv\Scripts\Activate.ps1
pip install git+https://github.com/VW3st/halo.git

(PyPI release coming — pip install halo-voice will work once published.)

On a Windows machine with an NVIDIA GPU, also grab the CUDA wheels for faster-whisper:

pip install "halo-voice[gpu-windows] @ git+https://github.com/VW3st/halo.git"

silero-vad pulls PyTorch (~800 MB). To save bandwidth on a CPU-only laptop, install Torch CPU-only first:

pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install git+https://github.com/VW3st/halo.git

2. Kokoro TTS model files (~200 MB)

halo download-models

Drops kokoro-v1.0.fp16.onnx + voices-v1.0.bin into ~/.halo/models/ (or ./models/ if you're running from a git checkout). Without these, Halo runs in silent (text-only) mode.

The other models download themselves on first use:

- openWakeWord pretrained ONNX (~few MB)
- silero-vad model (~2 MB)
- faster-whisper distil-large-v3 (~1.5 GB, from HuggingFace)

3. Ollama + routing model

Install Ollama from https://ollama.com/download, then:

ollama pull qwen2.5:1.5b-instruct

Ollama auto-starts a background service on localhost:11434. Halo talks to it over HTTP.
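If you want to confirm by hand that the service is up and the routing model is pulled, a read-only probe along these lines should work. It is a sketch, not what halo doctor actually runs: it uses Ollama's /api/tags endpoint (which lists local models), and the function names are made up for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"      # address given in this README
ROUTING_MODEL = "qwen2.5:1.5b-instruct"    # model pulled in the step above


def model_is_pulled(tags_payload: dict, model: str = ROUTING_MODEL) -> bool:
    """True if `model` appears in the body returned by GET /api/tags."""
    models = tags_payload.get("models") or []
    return any(entry.get("name") == model for entry in models)


def check_ollama(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> str:
    """Return 'ok', 'model missing', or 'unreachable'. Never modifies anything."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):  # connection refused, timeout, or bad JSON
        return "unreachable"
    return "ok" if model_is_pulled(payload) else "model missing"
```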

Why qwen2.5:1.5b-instruct? Non-reasoning, ~1.5 GB, hits sub-100 ms on a warm KV cache. Reasoning models (qwen3, deepseek-r1) emit <think> tokens and blow the per-turn budget to 20 s+. Optional alternatives via OLLAMA_MODEL in halo/config.py: qwen2.5:3b-instruct (better accuracy, ~3× slower), qwen2.5:7b-instruct (best accuracy, needs more VRAM).

4. Coding agents (Claude Code and/or Codex CLI)

Install whichever you want Halo to dispatch to. At least one is needed unless you only plan to use local tools.

# Claude Code (requires Anthropic auth — `claude login` once)
npm install -g @anthropic-ai/claude-code

# Codex CLI (requires OpenAI auth — `codex login` once)
npm install -g @openai/codex

Both must be on PATH. Verify with claude --version and codex --version.

Run

halo

(or python -m halo if you cloned the repo directly.)

You'll see preload messages, then "Halo ready. Say the wake word..." and hear "Halo online." Speak the wake word ("halo", or "Hey Jarvis" if Halo fell back to the builtin model because halo.onnx is missing), then your command.

halo always uses the current working directory as Claude/Codex's project root, so cd into your project before launching. Override the model location with HALO_MODELS_DIR=/path/to/models.
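The model-location lookup can be pictured like this. The precedence order (environment override first, then ./models in a checkout, then ~/.halo/models) is an assumption pieced together from this README, and resolve_models_dir is a hypothetical helper rather than a function in the package.

```python
import os
from pathlib import Path


def resolve_models_dir(env=None, cwd=None, home=None) -> Path:
    """Pick the models directory (assumed order: env override, dev dir, home dir)."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)

    override = env.get("HALO_MODELS_DIR")
    if override:
        return Path(override)

    dev_dir = cwd / "models"  # present when running from a git checkout
    if dev_dir.is_dir():
        return dev_dir

    return home / ".halo" / "models"  # installed-mode default
```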

CLI

halo                  start the voice loop (default)
halo run              same as above
halo download-models  fetch Kokoro TTS model files
halo doctor           check Ollama, agents (claude/codex), models — read-only
halo sessions         list running coding-agent sessions on this machine (v1.2)
halo version          print installed version
halo --help           show help

halo doctor is the first thing to run after install. It probes every external dependency (Ollama running + routing model pulled, Claude CLI on PATH and responsive, Codex CLI on PATH and responsive, Kokoro model files present) and prints the exact install command for anything that's missing. It never installs or modifies anything itself.

Dev install

Clone the repo and install editable:

git clone https://github.com/VW3st/halo.git
cd halo
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e ".[gpu-windows,dev]"
python -m halo

How conversations work

Once the wake word fires, you enter a conversation, so you don't have to repeat it between turns. Each thing you say is matched against this priority list, and the first match wins:

1.  End phrase             "over and out" / "goodbye" / "go to sleep"      -> exit conversation
2.  New session            "new task" / "start over" / "forget that"       -> reset agent sessions
3.  Vocative dispatch      "Claude, build X" / "Codex, run Y"              -> direct to that agent (no LLM)
3b. Verbal dispatch        "ask Codex to ..." / "tell Claude to ..."       -> direct to that agent (no LLM)
4.  Pure mode switch       "switch to codex" / "talk to claude"            -> set direct-dialogue agent
5.  Back to Halo           "back to halo" / "talk to me"                   -> exit direct-dialogue mode
6.  Status query           "what's happening" / "are you done"             -> read job registry
7.  Replay last result     "what did Claude say" / "repeat what Codex..."  -> re-speak last response
8.  Local tool             "open chrome" / "launch calculator"             -> system app handlers
9.  Direct-dialogue active sends utterance straight to current agent       -> --continue
10. Stage 2 LLM (fallback) only fires when nothing above matched           -> qwen2.5 routing

After 5 s of true idleness (no jobs running, no speech), Halo goes back to wake-listening.
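The "first match wins" ordering above is easiest to see as an ordered list of checks. The sketch below covers only a handful of the rows (vocative dispatch, replay, and local tools are omitted), and the rule names and helper functions are illustrative rather than taken from halo/router.py.

```python
import re
from typing import Callable, List, Optional, Tuple

Rule = Tuple[str, Callable[[str], bool]]


def _phrase(*phrases: str) -> Callable[[str], bool]:
    """Build a case-insensitive matcher for any of the given phrases."""
    pattern = re.compile("|".join(re.escape(p) for p in phrases), re.IGNORECASE)
    return lambda text: bool(pattern.search(text))


# Order matters: earlier entries shadow later ones.
RULES: List[Rule] = [
    ("end_phrase", _phrase("over and out", "goodbye", "go to sleep")),
    ("new_session", _phrase("new task", "start over", "forget that")),
    ("mode_switch", _phrase("switch to codex", "talk to claude")),
    ("back_to_halo", _phrase("back to halo", "talk to me")),
    ("status_query", _phrase("what's happening", "are you done")),
]


def route(text: str, direct_agent: Optional[str] = None) -> str:
    """Return the name of the first matching rule, else the fallback path."""
    for name, matches in RULES:
        if matches(text):
            return name
    if direct_agent:                 # direct-dialogue mode is active
        return f"direct:{direct_agent}"
    return "stage2_llm"              # nothing matched, ask the router LLM
```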

Vocative & verbal dispatch — the killer features

Naming the agent up front skips the router LLM entirely. Two forms work:

Vocative (comma required — faster-whisper adds it reliably):

"Claude, build a login page with Supabase"
"Codex, run the tests and fix anything red"
"Claude code, summarize the README"
"Hey Codex, list the open issues"

Verbal (verb + agent + instruction):

"ask Codex to build a landing page"
"tell Claude to refactor auth.py"
"have Codex deploy the latest build"
"get Claude to write tests for this"
"use Codex for the bug in routes.py"
"let Claude handle the migration"

Both bypass the Stage 2 LLM, which means no agent-name swap or hallucinated extras (a real failure mode on the 1.5B router model).
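For a feel of how little machinery the two forms need, here is a rough regex sketch that handles the examples listed above. The patterns, the AGENTS mapping, and parse_dispatch are all assumptions made for illustration; Halo's real matcher also deals with Whisper-mangled names, which this does not.

```python
import re
from typing import Optional, Tuple

# Spoken name -> agent key (keys as they appear elsewhere in this README).
AGENTS = {"claude code": "claude_code", "claude": "claude_code", "codex": "codex"}
_NAME = r"(claude code|claude|codex)"  # longest alternative first

VOCATIVE = re.compile(rf"^\s*(?:hey\s+)?{_NAME}\s*,\s*(.+)$", re.IGNORECASE | re.DOTALL)
VERBAL_TO = re.compile(
    rf"^\s*(?:ask|tell|have|get|let)\s+{_NAME}\s+(?:to\s+)?(.+)$",
    re.IGNORECASE | re.DOTALL,
)
VERBAL_FOR = re.compile(rf"^\s*use\s+{_NAME}\s+for\s+(.+)$", re.IGNORECASE | re.DOTALL)


def parse_dispatch(text: str) -> Optional[Tuple[str, str]]:
    """Return (agent_key, instruction) when the utterance names an agent up front."""
    for pattern in (VOCATIVE, VERBAL_TO, VERBAL_FOR):
        match = pattern.match(text)
        if match:
            return AGENTS[match.group(1).lower()], match.group(2).strip()
    return None  # falls through to the rest of the priority list
```

Because the agent key comes straight from the matched name, there is no opportunity for the agent-name swap described above.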

After the first dispatch, you're automatically in direct-dialogue mode with that agent — every follow-up goes straight to its --continue, no LLM round-trip:

You:   "Claude, build a hello world script."
Halo:  "On it. I'm calling this session Mercury."
       ... Mercury responds
You:   "now add a docstring."     (direct -> Mercury)
You:   "also make it take a name." (direct -> Mercury)

To switch agents mid-flow:

"switch to codex" / "talk to codex"        -> direct dialogue with Codex
"transfer me to codex" / "transfer to ..."  -> same as switch
"Codex, ..."                                -> dispatches to Codex AND switches
"back to halo" / "talk to me"               -> exit direct dialogue
"transfer me back to halo"                  -> same

Once you're in direct dialogue with an agent, Halo never auto-sleeps — the conversation stays open until you explicitly say "goodbye", "go to sleep", "over and out", or any of the end phrases. The 5-second idle sleep only applies when you're in plain Halo mode (no direct dialogue and no running jobs).

Multi-session mode

By default, Halo scans your machine every 2 seconds for running coding-agent CLIs (claude and codex, including their npm shims), records each one's cwd and parent terminal, and exposes the list to the Stage 2 LLM brain as discovered sessions. The brain then routes voice commands accordingly — no manual halo project add step, no focus-binding to set up, no keystroke-injection trickery.

What you can say

Spoken	What happens
"halo, what sessions do I have?"	Brain emits `session_action: list_sessions` → Halo speaks: "three sessions: halo, website, AIP. You're on halo."
"halo, switch to website" / "work on the AIP one"	Brain emits `session_action: switch` → active session changes; next dispatch goes there
"halo, where am I?"	Brain emits `session_action: where_am_i` → speaks active session label + cwd
"halo, Claude, refactor the auth module"	Vocative dispatch (existing) into the active session's cwd
"halo, in website, ask Claude to add dark mode"	One-shot dispatch into the website session without changing active
"halo, tell all of them to run their tests"	Fanout — same prompt to every discovered session

How collisions are resolved

Two cwds with the same basename (D:\client-a\halo and D:\client-b\halo) get parent-disambiguated labels: client-a/halo and client-b/halo. If labels still collide after that, a pid suffix is appended (client-a/halo#1234). You can always say "what sessions do I have" to hear the current labels.
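One way to picture the labeling rule is the sketch below. It is a guess at the logic, not the code in halo/registry.py: label_sessions is a hypothetical helper, and the assumption is that the pid suffix is only added when the parent-qualified label is still ambiguous.

```python
from collections import Counter
from pathlib import PureWindowsPath
from typing import Dict, List, Tuple


def label_sessions(sessions: List[Tuple[int, str]]) -> Dict[int, str]:
    """Map pid -> spoken label for a list of (pid, cwd) pairs."""
    paths = {pid: PureWindowsPath(cwd) for pid, cwd in sessions}
    basenames = {pid: path.name for pid, path in paths.items()}
    base_counts = Counter(basenames.values())

    labels: Dict[int, str] = {}
    for pid, path in paths.items():
        if base_counts[basenames[pid]] == 1:
            labels[pid] = basenames[pid]                       # unique basename
        else:
            labels[pid] = f"{path.parent.name}/{path.name}"    # parent-disambiguated

    # Still ambiguous (for example two agents in the same cwd): append the pid.
    label_counts = Counter(labels.values())
    for pid, label in list(labels.items()):
        if label_counts[label] > 1:
            labels[pid] = f"{label}#{pid}"
    return labels
```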

Per-project persistent sessions

When you dispatch to a discovered session, Halo lazily spawns its own persistent Claude subprocess in that cwd. This is keyed by (agent_key, cwd), so:

- the same agent can have parallel jobs across N projects
- each project's --continue thread stays alive for the Halo process lifetime
- reset_session accepts an optional cwd to scope the reset
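The (agent_key, cwd) keying and the LRU bound can be sketched with an OrderedDict. Everything here is illustrative: SessionPool, the spawn callback, and the max_sessions value are assumptions (this README does not state the actual bound), and a real pool would also terminate the evicted subprocess.

```python
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple

Key = Tuple[str, str]  # (agent_key, cwd)


class SessionPool:
    """Lazily spawn one persistent session per (agent_key, cwd), LRU-bounded."""

    def __init__(self, spawn: Callable[[str, str], Any], max_sessions: int = 4):
        self._spawn = spawn
        self._max = max_sessions
        self._sessions: "OrderedDict[Key, Any]" = OrderedDict()

    def get(self, agent_key: str, cwd: str) -> Any:
        key = (agent_key, cwd)
        if key in self._sessions:
            self._sessions.move_to_end(key)      # mark as most recently used
            return self._sessions[key]
        session = self._spawn(agent_key, cwd)    # lazy spawn on first dispatch
        self._sessions[key] = session
        if len(self._sessions) > self._max:
            self._sessions.popitem(last=False)   # drop the least recently used
        return session

    def reset_session(self, agent_key: str, cwd: Optional[str] = None) -> int:
        """Drop sessions for an agent, scoped to one cwd when given."""
        doomed = [k for k in self._sessions
                  if k[0] == agent_key and (cwd is None or k[1] == cwd)]
        for key in doomed:
            del self._sessions[key]
        return len(doomed)
```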

Verifying discovery

halo sessions

Read-only one-shot scan. Prints a table of label / agent / pid / cwd for every running coding-agent CLI. Exit 0 when something is found, 1 when nothing is. Use this before booting the full voice loop to confirm discovery works on your machine.

Single-session fallback

When psutil isn't installed OR no sessions are discovered, Halo runs in v1.1 single-session mode — spawning its own Claude in the launch cwd, exactly as before. The brain prompt only carries CURRENT CONTEXT when there's something to report, so single-session callers pay zero token overhead.

Brain-side routing (not regex)

The Stage 2 LLM (qwen2.5:1.5b by default) receives a CURRENT CONTEXT block prepended to every transcript when sessions are discovered:

# CURRENT CONTEXT (you decide target_session from this)
active_session: halo
discovered_sessions:
  - label: halo       agent: claude_code  cwd: D:/Halo
  - label: website    agent: claude_code  cwd: D:/website-redesign
  - label: aip        agent: claude_code  cwd: D:/AIP-Claude

The JSON schema gained two fields the brain populates:

target_session — label / "active" / "focused" / "all" / ""
session_action — "" / "switch" / "list_sessions" / "where_am_i"

This means new routing patterns ("work on the third one", "the AI thing I was looking at", "the website one with the dark theme") are all handled by upgrading the prompt — not by adding regex.
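Since a 1.5B model can emit values outside the schema, the two new fields are worth validating before acting on them. The sketch below shows one defensive way to do that; parse_session_fields is hypothetical, and falling back to empty strings on anything unexpected is an assumption about the desired behavior, not a description of halo/router.py.

```python
import json
from typing import Collection, Tuple

# Allowed values as listed above.
SESSION_ACTIONS = {"", "switch", "list_sessions", "where_am_i"}
SPECIAL_TARGETS = {"", "active", "focused", "all"}


def parse_session_fields(raw_json: str, known_labels: Collection[str]) -> Tuple[str, str]:
    """Extract (target_session, session_action), blanking anything invalid."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return "", ""
    if not isinstance(data, dict):
        return "", ""

    target = str(data.get("target_session") or "")
    action = str(data.get("session_action") or "")

    if action not in SESSION_ACTIONS:
        action = ""                      # unknown action: ignore it
    if target not in SPECIAL_TARGETS and target not in known_labels:
        target = ""                      # label the scanner never reported
    return target, action
```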

Follow-up gate

Once you've dispatched to an agent (vocative, verbal, or pure switch), Halo enters direct-dialogue mode — the mic stays hot so follow-ups ("now also add tests") don't need to re-state the agent name.

Without a filter, that hot mic would dispatch everything it captures to the agent: the phone call you take 10 seconds after the dispatch, the colleague who walks over, even you muttering at your own screen. All of that audio would get transcribed and sent to Claude as if it were a command.

The follow-up gate (halo/followup_gate.py) runs every direct-mode transcript through 4 rules before dispatch. First match wins → SEND. If none match → drop silently and log to the dashboard.

Rule	Fires when…	Example that fires it
1. Agent name	The transcript mentions `claude`, `codex`, or any per-agent `fuzzy_triggers` (e.g. `cloud`, `clawed`, `kodex`)	"Claude, undo that"
2. Continuation marker on short utterance	Opens with `now` / `also` / `then` / `instead` / `and now` / etc., and is ≤ 12 words	"now also add tests"
3. Coding imperative + technical signal	Verb like `write` / `add` / `refactor` / `fix` + at least one coding noun (`file`, `function`, `bug`, `endpoint`, …) or named tech (`Python`, `React`, …) or a continuation marker	"refactor the auth function"
4. (default: drop)	Side-conversation pattern fires explicitly (phone openers, lunch chatter, greetings to a third party) OR nothing above matched	"hey John, want to grab lunch?" — dropped

Examples:

You: "Claude, build a fizzbuzz in Python."
   → vocative dispatch → Mercury starts
   → direct-dialogue mode active

[phone rings]
You: "hey John, want to grab lunch?"
   → gate: side_conversation → DROPPED  (logged to dashboard)
   → Mercury never sees it. Direct mode stays on.

You: "now also add tests."
   → gate: continuation → SENT to Mercury

The dashboard event log marks dropped utterances with the side_convo.ignored event (greyed italic, (filtered) suffix) plus the rule that rejected it (side_conversation / no_signal), so you can see what got filtered and tweak the keyword sets in halo/followup_gate.py if needed.

The gate is regex-only — zero LLM round-trip cost, ~0 ms latency. It deliberately errs on the side of dropping when ambiguous (a vague "make it bigger" with no clear noun probably drops; just say "Claude, make it bigger" or rephrase with a noun). The safety-vs-friction tradeoff favors safety: a missed legitimate command costs you one retry; a phone-call sentence dispatched to Claude can do real damage.
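A stripped-down version of the gate fits in a few lines. The keyword sets below are tiny illustrative subsets, rule 3 is simplified to "verb plus technical word", and the returned rule names other than continuation, side_conversation, and no_signal are made up, so treat this as a sketch of the idea rather than the contents of halo/followup_gate.py.

```python
import re
from typing import Tuple

# Illustrative subsets only; the real keyword lists are longer.
AGENT_NAME = re.compile(r"\b(claude|codex|cloud|clawed|kodex)\b", re.IGNORECASE)
CONTINUATION = re.compile(r"^\s*(and now|now|also|then|instead)\b", re.IGNORECASE)
IMPERATIVE = re.compile(r"\b(write|add|refactor|fix)\b", re.IGNORECASE)
TECH_SIGNAL = re.compile(r"\b(file|function|bug|endpoint|python|react)\b", re.IGNORECASE)
SIDE_CONVERSATION = re.compile(r"^\s*(hey|hi|hello)\b|\blunch\b", re.IGNORECASE)
MAX_CONTINUATION_WORDS = 12


def gate(transcript: str) -> Tuple[bool, str]:
    """Return (send?, rule). Rules 1-3 send; everything else is dropped."""
    if AGENT_NAME.search(transcript):
        return True, "agent_name"
    if CONTINUATION.search(transcript) and len(transcript.split()) <= MAX_CONTINUATION_WORDS:
        return True, "continuation"
    if IMPERATIVE.search(transcript) and TECH_SIGNAL.search(transcript):
        return True, "coding_imperative"
    if SIDE_CONVERSATION.search(transcript):
        return False, "side_conversation"
    return False, "no_signal"  # ambiguous utterances are dropped on purpose
```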

Disable entirely with FOLLOWUP_GATE_ENABLED = False in halo/config.py if you want the old v1.1.0 behaviour (every direct-mode utterance reaches the agent).

Mythology names

When a new session starts for an agent, Halo assigns it a random Roman-mythology name (Mars, Mercury, Juno, Vesta, Apollo, ...). The name persists across follow-ups and resets on "new task". This makes it clear which agent is talking when both are working: "Mercury says done. Neptune had a problem with the auth tests."

Persistent sessions

Each agent's --continue thread stays alive for the entire Halo process. Walk away, come back, say a wake word, give Claude a follow-up — it picks up where it left off. Explicit reset with "new task" / "fresh session" / "forget that" rotates the name and drops the continuation flag.

Async + concurrent

Claude and Codex can run jobs at the same time. While one is working, the other is free to take a new task, you can fire local tools ("open chrome"), or ask for status ("what's happening"). Job results land between turns so they don't talk over you.

Custom wake word

Halo ships with a custom-trained halo.onnx wake-word model so you wake it with just "halo" (no "hey" prefix). If models/halo.onnx is missing, Halo falls back to openWakeWord's builtin hey_jarvis.

Training your own

If you want a different wake phrase (or to re-train on your own voice samples), use bbarrick/wakeword_trainer. Halo's training scratch dir is training/wakeword_trainer/ (gitignored). The wake DNN trains on top of OpenWakeWord's frozen feature extractor, so the output is a plain ONNX classifier that Halo's existing wake loader picks up automatically.

Cost: free (ElevenLabs free tier gives 10k chars/month — enough for one training run of ~120 voice samples). Training time: ~5 min on an RTX 3060. Best results: 8 phrase variants × 20+ voices for positives, 40+ confusables × 5 voices for hard negatives, 50+ common phrases for soft negatives.

Tune in halo/wake.py:

Constant	Default	What
`WAKE_WORD`	`"halo"`	Phrase name. Halo loads `<MODELS_DIR>/<WAKE_WORD>.onnx`.
`THRESHOLD`	`0.75`	Wake DNN activation score (0-1). Raise to filter false fires, lower if real wakes are missed.
`WAKE_VAD_THRESHOLD`	`0.7`	silero-VAD must also score above this. 0.0 disables the gate.
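Read together, the two thresholds amount to a simple AND condition. The function below only restates the table in code; in practice openWakeWord applies the VAD gate internally via its vad_threshold argument, and the exact comparison operators here are assumptions.

```python
THRESHOLD = 0.75            # wake DNN activation score, default from the table
WAKE_VAD_THRESHOLD = 0.7    # silero-VAD gate, default from the table


def wake_fires(wake_score: float, vad_score: float,
               threshold: float = THRESHOLD,
               vad_threshold: float = WAKE_VAD_THRESHOLD) -> bool:
    """True when the wake model is confident AND the VAD gate agrees."""
    if wake_score < threshold:
        return False
    if vad_threshold <= 0.0:     # 0.0 disables the VAD gate
        return True
    return vad_score > vad_threshold
```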

Voice-mode permissions

Voice users have no way to click "approve" on agent permission prompts, so Halo configures both agents to never ask:

Claude Code: --permission-mode bypassPermissions (documented in claude --help as "Bypass all permission checks"). Without this, Claude would hang on the first Bash call waiting for TUI confirmation.
Codex CLI: -c approval_policy="never" (OpenAI's documented setting for non-interactive runs). --sandbox workspace-write is kept, so Codex's file writes stay constrained to the project root.

This is the right tradeoff for voice — you ARE the supervisor, just via spoken commands instead of keystrokes. If you'd rather have Claude prompt you for risky operations (and you're OK with Halo hanging silently when it does), change --permission-mode bypassPermissions back to acceptEdits in halo/agents.py:AGENTS["claude_code"].

Voice quality

Voice: af_heart — the only A/A-graded voice in the Kokoro lineup. Change in halo/voice.py:DEFAULT_VOICE.
Sanitizer: every spoken string passes through _clean_for_speech(), which strips Markdown (**bold**, `code`, em-dashes, bullets, headings, link syntax), collapses long file paths to basenames, drops mojibake (â€"), and turns newlines into sentence breaks.
Agent voice prompt (VOICE_SYSTEM_PROMPT in halo/agents.py): Claude and Codex are told they're on a voice channel and follow eight rules — no Markdown, 2-sentence max, write code to files, file basenames only, connecting words instead of bullets, one short clarifying question, announce time estimates for slow tasks, and state the filename clearly when they finish something openable.
Open what was built: when an agent says "I wrote it to landing.html", you can immediately say "open landing.html" and Halo opens it in your OS default app — works for any extension (.html, .png, .md, .pdf, ...).
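To make the sanitizer step concrete, here is a much-reduced sketch of the kind of cleanup described above. It is not the real _clean_for_speech: the function name is different on purpose, mojibake handling is left out, and the path rule is naive enough that it would also mangle bare URLs.

```python
import re


def clean_for_speech_sketch(text: str) -> str:
    """Strip common Markdown so a TTS engine reads plain sentences."""
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)            # [label](url) -> label
    text = re.sub(r"^\s{0,3}#{1,6}\s+", "", text, flags=re.MULTILINE)  # heading markers
    text = re.sub(r"^\s*[-*•]\s+", "", text, flags=re.MULTILINE)       # bullet markers
    text = re.sub(r"(\*\*|__|`)", "", text)                            # bold / code markers
    text = text.replace("\u2014", ", ")                                # em-dash -> short pause
    # Collapse paths with two or more separators to their last component.
    text = re.sub(r"(?:[A-Za-z]:)?(?:[\\/][\w.\-]+)+[\\/]([\w.\-]+)", r"\1", text)
    # Turn each non-empty line into its own sentence.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return " ".join(line if line[-1] in ".!?:" else line + "." for line in lines)
```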

File layout

halo/
  __init__.py     package marker + __version__
  __main__.py     main loop: wake -> conversation -> routing priority
  cli.py          `halo` shell entry point + subcommands
  config.py       paths, sample rate, model name, timing constants
  download_models.py  `halo download-models` — fetches Kokoro
  wake.py         openWakeWord listener + pre-wake audio ring buffer
  record.py       silero-vad RecorderState, chime, backchannel tone
  stt.py          faster-whisper BatchTranscriber (CUDA + DLL fixup)
  router.py       Stage 1 rules + Stage 2 LLM (Ollama qwen2.5 + JSON schema)
  turn.py         per-turn orchestration (record/transcribe; routing in __main__)
  tools.py        cross-platform local tools (browser, calc, notepad, ...)
  followup_gate.py 4-rule keyword filter for direct-dialogue mode (v1.1.1)
  discovery.py    psutil-based scanner — finds running claude/codex (v1.2)
  registry.py     session registry + spoken-target fuzzy matching (v1.2)
  agents.py       agent registry, dispatch, background jobs, session names
  voice.py        Kokoro TTS + Markdown sanitizer
  bus.py          thread-safe event bus (ring buffer of {kind, ts, ...})
  web.py          Flask server: serves dashboard + /api/events polling
  web_static/
    index.html    dashboard (single file, no build step)

pyproject.toml    package metadata (hatchling backend, `halo` console script)
CHANGELOG.md      keep-a-changelog format, per-version history
LICENSE           MIT

models/                          dev-mode location (this dir, project root)
~/.halo/models/                  installed-mode default
  kokoro-v1.0.fp16.onnx          Kokoro 82M voice model
  voices-v1.0.bin                Kokoro voice pack
                                 # override either with HALO_MODELS_DIR

scripts/
  bench_router.py             Stage 1 + Stage 2 latency benchmark
  fw_smoke.py                 faster-whisper accuracy smoke test
  moonshine_smoke.py          legacy Moonshine STT smoke (kept for comparison)
  test_detect_mode.py         adaptive turn-taking unit tests
  test_streaming.py           sentence buffer + Claude stream-json extractor
  test_vocative.py            vocative dispatch unit tests
  test_voice_mode.py          TTS sanitizer + mode-switch tests
  test_fixes_round.py         regression suite for recent bug fixes

Adding a new agent

Drop one entry into AGENTS in halo/agents.py. No other code changes needed:

AGENTS["aider"] = AgentConfig(
    key="aider",
    spoken_name="Aider",
    voice_triggers=("aider",),
    first_call=("aider", "--message", "{PROMPT}", "--yes-always"),
    continue_call=("aider", "--message", "{PROMPT}", "--yes-always"),
    parses_json=False,
)

Tokens {PROMPT} and {CWD} get substituted at call time. Then teach the Stage 2 router prompt about the new agent if you want voice routing ("aider, fix this") or just rely on the vocative dispatch in __main__.py:_vocative_dispatch (you'll need to extend its regex).
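The token substitution can be thought of as a single pass over each argument. The helper below is a hypothetical illustration, not the function Halo uses; doing the replacement in one regex pass means a prompt that happens to contain the text {CWD} is left alone.

```python
import re
from typing import List, Sequence

_TOKEN = re.compile(r"\{(PROMPT|CWD)\}")


def render_call(template: Sequence[str], prompt: str, cwd: str) -> List[str]:
    """Fill {PROMPT} and {CWD} in an argv template, one argument at a time."""
    values = {"PROMPT": prompt, "CWD": cwd}
    return [_TOKEN.sub(lambda m: values[m.group(1)], arg) for arg in template]
```

Keeping the command as a list of arguments (rather than one shell string) means the spoken prompt never needs shell quoting when it is handed to a subprocess.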

Wake word note

openWakeWord ships alexa, hey_jarvis, hey_mycroft, and hey_rhasspy out of the box. There is no built-in hey_halo, so hey_jarvis serves as the fallback whenever the custom halo.onnx is missing.

To train a real hey_halo model:

1. Record ~100 samples of yourself saying "Halo" using openWakeWord's training notebook
2. Drop the resulting .onnx into your MODELS_DIR (defaults to ./models/ in dev, ~/.halo/models/ when pip-installed; override with HALO_MODELS_DIR)
3. Update halo/wake.py:WAKE_WORD

Threshold lives in halo/wake.py:THRESHOLD (default 0.75, per the tuning table under Custom wake word). Raise it if you get false positives, lower it if waking takes too many tries.

Latency budget (RTX 3060)

Path	Cold	Warm
Wake word detect	instant	instant
Speech → silero silence (600 ms base)	~0.6 s	~0.6 s
faster-whisper STT (1-5 s utterance)	~5 s	~0.5-1 s
Stage 2 LLM (qwen2.5:1.5b JSON output)	~5-7 s	~3-4 s
Tool fast-path (open app)	<50 ms	<50 ms
Vocative agent spawn	<100 ms	<100 ms
Agent task (Claude/Codex code work)	varies	varies

Typical wake-to-action timings:

"Hey Jarvis. Open chrome." → ~1.5 s
"Hey Jarvis. Claude, write hello.py." → ~2 s (then agent works)
"build me a website" (unnamed, fallback)→ ~5 s (Stage 2 LLM fires)

Roadmap

✅ Wake word listener
✅ Record + faster-whisper STT (replaced whisper.cpp and Moonshine)
✅ Two-stage router (rules + Ollama)
✅ Local tool dispatch (browser, calc, notepad, explorer, terminal)
✅ Kokoro TTS + Markdown sanitizer
✅ Claude Code subprocess + persistent session
✅ Codex CLI subprocess + persistent session
✅ Async background jobs, status queries, replay
✅ Conversation mode, end phrases, idle sleep
✅ Direct-dialogue mode, mode switches, mythology names
✅ Vocative dispatch — bypass router for explicit agent calls
✅ Streaming Claude output → live TTS sentence-by-sentence
✅ Local web dashboard with live pipeline / transcript / jobs / log
✅ Packaged for pip install (v1.0)
⬜ Publish to PyPI (pip install halo-voice direct, no git+)
⬜ Custom hey_halo wake model (needs voice samples)
⬜ Streaming Codex (Codex CLI doesn't expose stream-json yet; tracking)
⬜ Premium TTS provider abstraction (ElevenLabs etc., opt-in)
⬜ Agent registry from external TOML, with halo init to scaffold new agents
⬜ Project registry: halo project add/use <name> for multi-project flows

Troubleshooting

Symptom	Fix
`cublas64_12.dll not found`	`pip install nvidia-cublas-cu12 nvidia-cudnn-cu12` — `halo/stt.py` adds them to PATH on import
Wake takes many tries	Lower `THRESHOLD` in `halo/wake.py` (try 0.4) or train a custom wake model
Voice sounds robotic	Try `af_bella` or `bf_emma` in `halo/voice.py:DEFAULT_VOICE`
`Claude Code CLI not found`	`npm install -g @anthropic-ai/claude-code` and re-open terminal
Codex auth prompts mid-run	Run `codex login` once in a normal terminal first
Halo speaks Claude's Markdown literally	Should be sanitized — check `_clean_for_speech` in `halo/voice.py`
`Input must be provided` from Claude	Wake-strip left an empty prompt; current code guards this — file an issue if it recurs
Ctrl-C doesn't stop Halo	Fixed — wake stream now polls with 250 ms timeout so SIGINT propagates
