Voice front-end for agentic coding tools. Say a wake word, talk to Claude Code or Codex CLI in plain English, and hear the result back. Halo is the audio layer; the agents do the work.
Status: v1.2.1 — hybrid streaming + one-sentence reply summaries. Long agent replies no longer monologue at you — Halo speaks the first 2 sentences live so you know the agent is working, then summarizes the rest into one sentence via the local brain. Full text still prints to the terminal. Builds on v1.2 (multi-session mode), v1.1.1 (follow-up gate), v1.1 (custom "halo" wake word, persistent Claude sessions, separate-console live feed, noise-suppression pipeline). Live web dashboard at http://127.0.0.1:7070.
See CHANGELOG.md for the full history. Licensed MIT (LICENSE).
The dashboard shows the entire pipeline live: which stage is firing
(wake / record / transcribe / route / agent / voice), the most recent
transcript, every agent session by Roman name with state and elapsed
time, and a color-coded event log. Auto-launches in your default
browser when you run Halo (set HALO_NO_BROWSER=1 to suppress).
HALO • LOCAL • listening live · v1.1
─────────────────────────────────────────────────────────────────────
[wake] → [record] → [transcribe] → [route] → [agent] → [voice]
─────────────────────────────────────────────────────────────────────
TRANSCRIPT │ AGENTS 1 running
│ Mercury · claude RUNNING 23s
"Claude, build me a one-line │ "build me a hello..."
python script that prints hello" │
│ Neptune · codex IDLE
● Mercury — I wrote it to hello.py. │
─────────────────────────────────────────────────────────────────────
EVENT LOG ● LIVE
14:23:01 wake.fired score 0.83
14:23:03 record.opened
14:23:04 stt.done 620ms "Claude, build me a..."
14:23:04 route.matched vocative -> claude_code
14:23:04 agent.dispatched Mercury starting
14:23:12 agent.streaming Mercury: I wrote it to hello.py.
14:23:13 tts.spoke Halo: I wrote it to hello.py.
14:23:13 agent.done Mercury (8s)
You: "halo. Claude, write a one-line python script that prints hello."
Halo: "On it. I'm calling this session Mercury."
... a separate PowerShell window pops up live-tailing Claude
(text deltas, tool calls, result events) so you can watch
Claude work without scrolling Halo's own terminal
... Mercury speaks: "I wrote it to hello dot py."
You: "now make it print goodbye instead."
... same Mercury session (persistent process, no spawn cost)
... gate: continuation marker matched, sent
Halo: "Done."
[phone rings]
You: "hey John, want to grab lunch?"
... gate: side_conversation pattern matched, dropped silently
... dashboard logs it; Mercury never sees it
You: "ask Codex to refactor it into a function."
... verbal-dispatch regex catches this, skips Stage 2 LLM
Halo: "On it. I'm calling this session Neptune."
You: "back to halo. open hello.py."
Halo: "Opened hello.py."
You: "end session."
Halo: "Goodbye."
100% local out of the box (Ollama + faster-whisper + Kokoro). No API keys required to run Halo — the agents themselves (Claude Code / Codex) authenticate against their own services.
| Layer | Tech | Notes |
|---|---|---|
| Wake word | openWakeWord + custom halo.onnx (trained via bbarrick/wakeword_trainer) | Single-word "halo" wake (188 positives × 22 voices + 524 negatives via ElevenLabs TTS, F1 = 0.80). silero-VAD gate on top via vad_threshold=0.7. Falls back to builtin hey_jarvis if halo.onnx is missing. |
| Mic | sounddevice | 16 kHz mono int16 |
| VAD | silero-vad v5 | 600 ms base silence, mode-adaptive extensions |
| STT | faster-whisper + distil-large-v3 | int8_float16 on CUDA, ~500 ms per utterance |
| Router | Ollama + qwen2.5:1.5b-instruct | Stage 2 LLM, fires only when no local handler matches |
| TTS | Kokoro-82M ONNX (fp16) | af_heart voice, sanitized for Markdown |
| Agents | Claude Code + Codex CLI subprocesses | Persistent sessions; voice-mode flags (--permission-mode bypassPermissions / approval_policy="never") so no manual approvals |
Hardware target: Windows + NVIDIA GPU (RTX 3060 is enough). Mac / Linux paths exist for tools and TTS; CUDA-specific bits gracefully fall back to CPU.
- Hybrid streaming + one-sentence reply summaries (v1.2.1): streaming agents (Claude) speak the first 2 sentences live as they're generated, so you immediately know the agent is alive and working. After that, Halo stops speaking and silently buffers the rest. When the agent finishes, if the reply ran past the live budget, Halo sends the remainder through the brain (qwen2.5:1.5b) for a one-sentence cap: "And, I added bcrypt password verification to auth dot py and wired it into login user." Batch agents (Codex) follow the same idea: short replies are spoken as-is, long ones (> 400 chars) are summarized. Tunable via LIVE_STREAM_MAX_SENTENCES and REPLY_SUMMARIZE_THRESHOLD_CHARS in halo/config.py. The full reply still prints to your terminal so nothing's lost; TTS is just the summary.
- Multi-session mode (v1.2.0): Halo runs a process discovery scanner that finds every claude / codex session on your machine and feeds the list to the routing brain. You no longer talk to one Claude bound to Halo's launch dir; you talk to whichever session you name. Voice commands:
  - "switch to website" / "work on the AIP one" → set active session
  - "what sessions do I have?" → spoken list
  - "where am I?" / "what project am I on?" → speak the active one
  - "in website, ask Claude to add dark mode" → one-shot dispatch to another session without changing the active one
  - "tell all of them to run their tests" → fanout to every session

  Each project gets its own persistent Claude subprocess (LRU-bounded), so the same agent can have parallel jobs across N projects at once. Brain decisions live in the Stage 2 LLM prompt, with no new regex. See Multi-session mode below.
- Follow-up gate (v1.1.1): once you're in direct dialogue with an agent, the mic stays hot for follow-ups but a 4-rule keyword filter decides whether each utterance is actually for the agent. Phone calls, side conversations to colleagues, and your own muttering get silently dropped instead of being dispatched to Claude. See Follow-up gate below for the rules. Toggle with FOLLOWUP_GATE_ENABLED in halo/config.py.
- Custom "halo" wake word: single-word wake ("halo" alone fires it, no "hey" prefix needed). Trained on RTX 3060 in ~30 min via bbarrick's wakeword_trainer + ElevenLabs free-tier TTS.
- Persistent Claude sessions: one long-lived claude -p --input-format stream-json process per session, kept alive across every voice turn. Kills the per-turn-subprocess race that caused exit None failures and "starting session" duplicates.
- Separate-console live feed: when you dispatch to Claude, a PowerShell window pops up tailing the session log. You watch Claude's text deltas, tool calls, and result events in real time, while Halo's own terminal stays clean. Toggle with AGENT_CLI_VISIBLE = False in halo/config.py.
- Noise suppression pipeline: three layers: mic noise gate (MIC_NOISE_GATE_RMS = 0.005), wake silero-VAD gate (vad_threshold = 0.7), Whisper internal VAD filter (vad_filter=True). Plus a residual STT hallucination filter for Whisper's "Thank you" / "you" / "hello" artifacts on silence.
- Fuzzy-fallback dispatcher: when Ollama's down AND you said an agent name (even Whisper-mangled as "Cloud" / "Codec"), dispatches to that agent directly instead of giving up.
- halo doctor: diagnose Ollama, agent CLIs, Kokoro models.
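The hybrid streaming behaviour described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not Halo's actual code: the two constant names and their values come from the text above, while `split_sentences`, `plan_streaming_reply`, `plan_batch_reply`, and the `summarize` callable (standing in for the local brain) are hypothetical names invented for the example.

```python
import re
from typing import Callable, List, Tuple

# Values taken from the feature description; the real ones live in halo/config.py.
LIVE_STREAM_MAX_SENTENCES = 2
REPLY_SUMMARIZE_THRESHOLD_CHARS = 400

# Naive splitter (assumption): break after ., ! or ? when whitespace follows.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    return [s.strip() for s in _SENTENCE_END.split(text.strip()) if s.strip()]


def plan_streaming_reply(reply: str, summarize: Callable[[str], str]) -> Tuple[List[str], str]:
    """Return (sentences spoken live, one-sentence summary of the rest or "")."""
    sentences = split_sentences(reply)
    live = sentences[:LIVE_STREAM_MAX_SENTENCES]
    rest = " ".join(sentences[LIVE_STREAM_MAX_SENTENCES:])
    return live, (summarize(rest) if rest else "")


def plan_batch_reply(reply: str, summarize: Callable[[str], str]) -> str:
    """Batch agents: speak short replies as-is, summarize long ones."""
    if len(reply) > REPLY_SUMMARIZE_THRESHOLD_CHARS:
        return summarize(reply)
    return reply
```

Passing the summarizer in as a callable keeps the sketch runnable without Ollama; in a real pipeline it would presumably wrap a call to the routing model.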
python -m venv .venv
.venv\Scripts\Activate.ps1
pip install git+https://github.com/VW3st/halo.git
(PyPI release coming; pip install halo-voice will work once published.)
On a Windows machine with an NVIDIA GPU, also grab the CUDA wheels for faster-whisper:
pip install "halo-voice[gpu-windows] @ git+https://github.com/VW3st/halo.git"
silero-vad pulls PyTorch (~800 MB). To save bandwidth on a CPU-only
laptop, install Torch CPU-only first:
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install git+https://github.com/VW3st/halo.git
halo download-models
Drops kokoro-v1.0.fp16.onnx + voices-v1.0.bin into ~/.halo/models/
(or ./models/ if you're running from a git checkout). Without these,
Halo runs in silent (text-only) mode.
The other models download themselves on first use:
- openWakeWord pretrained ONNX (~few MB)
- silero-vad model (~2 MB)
- faster-whisper distil-large-v3 (~1.5 GB, from HuggingFace)
Install Ollama from https://ollama.com/download, then:
ollama pull qwen2.5:1.5b-instruct
Ollama auto-starts a background service on localhost:11434. Halo
talks to it over HTTP.
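To check the Ollama service by hand, a request like the one below should work. It is a sketch using only the standard library and Ollama's /api/generate endpoint with streaming turned off; the helper names `build_generate_request` and `ask_ollama` are made up for this example, and Halo's own client code may look quite different.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
OLLAMA_MODEL = "qwen2.5:1.5b-instruct"


def build_generate_request(prompt: str, model: str = OLLAMA_MODEL) -> urllib.request.Request:
    """Build a non-streaming /api/generate request that asks for JSON output."""
    payload = {"model": model, "prompt": prompt, "stream": False, "format": "json"}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(prompt: str, timeout: float = 10.0) -> str:
    """Send the request and return the generated text (needs a running Ollama)."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Splitting request construction from the network call makes the payload easy to inspect without the service running.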
Why qwen2.5:1.5b-instruct? Non-reasoning, ~1.5 GB, hits sub-100 ms
on a warm KV cache. Reasoning models (qwen3, deepseek-r1) emit <think>
tokens and blow the per-turn budget to 20 s+. Optional alternatives via
OLLAMA_MODEL in halo/config.py: qwen2.5:3b-instruct (better
accuracy, ~3× slower), qwen2.5:7b-instruct (best accuracy, needs more
VRAM).
Install whichever you want Halo to dispatch to. At least one is needed unless you only plan to use local tools.
# Claude Code (requires Anthropic auth — `claude login` once)
npm install -g @anthropic-ai/claude-code
# Codex CLI (requires OpenAI auth — `codex login` once)
npm install -g @openai/codex
Both must be on PATH. Verify with claude --version and codex --version.
halo
(or python -m halo if you cloned the repo directly.)
You'll see preload messages, then "Halo ready. Say the wake word..."
in the terminal, and hear "Halo online." Speak the wake word ("halo" by
default, or "Hey Jarvis" when Halo has fallen back to the builtin
model), then your command.
halo always uses the current working directory as Claude/Codex's
project root, so cd into your project before launching. Override
the model location with HALO_MODELS_DIR=/path/to/models.
halo start the voice loop (default)
halo run same as above
halo download-models fetch Kokoro TTS model files
halo doctor check Ollama, agents (claude/codex), models — read-only
halo sessions list running coding-agent sessions on this machine (v1.2)
halo version print installed version
halo --help show help
halo doctor is the first thing to run after install. It probes every
external dependency (Ollama running + routing model pulled, Claude CLI
on PATH and responsive, Codex CLI on PATH and responsive, Kokoro model
files present) and prints the exact install command for anything that's
missing. It never installs or modifies anything itself.
Clone the repo and install editable:
git clone https://github.com/VW3st/halo.git
cd halo
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e .[gpu-windows,dev]
python -m halo
Once a wake word fires, you enter a conversation, so you don't have to re-say the wake word between turns. Each thing you say is matched against this priority list; the first match wins:
1. End phrase "over and out" / "goodbye" / "go to sleep" -> exit conversation
2. New session "new task" / "start over" / "forget that" -> reset agent sessions
3. Vocative dispatch "Claude, build X" / "Codex, run Y" -> direct to that agent (no LLM)
3b. Verbal dispatch "ask Codex to ..." / "tell Claude to ..." -> direct to that agent (no LLM)
4. Pure mode switch "switch to codex" / "talk to claude" -> set direct-dialogue agent
5. Back to Halo "back to halo" / "talk to me" -> exit direct-dialogue mode
6. Status query "what's happening" / "are you done" -> read job registry
7. Replay last result "what did Claude say" / "repeat what Codex..." -> re-speak last response
8. Local tool "open chrome" / "launch calculator" -> system app handlers
9. Direct-dialogue active sends utterance straight to current agent -> --continue
10. Stage 2 LLM (fallback) only fires when nothing above matched -> qwen2.5 routing
After 5 s of true idleness (no jobs running, no speech), Halo goes back to wake-listening.
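The first-match-wins ordering above can be pictured as an ordered list of rules checked top to bottom. The sketch below is illustrative only: the rule names and regexes are simplified stand-ins for a few of the priorities (they are not Halo's real phrase lists), and `route` is a hypothetical function name.

```python
import re
from typing import List, Optional, Tuple

# (name, pattern) pairs; list order encodes priority. Patterns are assumptions.
RULES: List[Tuple[str, re.Pattern]] = [
    ("end_phrase", re.compile(r"\b(over and out|goodbye|go to sleep)\b", re.I)),
    ("new_session", re.compile(r"\b(new task|start over|forget that)\b", re.I)),
    ("mode_switch", re.compile(r"^(switch to|talk to) (claude|codex)\b", re.I)),
    ("back_to_halo", re.compile(r"\b(back to halo|talk to me)\b", re.I)),
    ("status_query", re.compile(r"\b(what's happening|are you done)\b", re.I)),
    ("local_tool", re.compile(r"^(open|launch) \w+", re.I)),
]


def route(utterance: str, direct_agent: Optional[str] = None) -> str:
    """Return the name of the first matching rule, else the fallback path."""
    text = utterance.strip()
    for name, pattern in RULES:
        if pattern.search(text):
            return name
    if direct_agent:
        return f"direct:{direct_agent}"  # direct-dialogue mode is active
    return "stage2_llm"  # nothing matched: hand off to the router LLM
```

Because cheap regex rules run first, the slower LLM path is only reached for utterances that none of them recognise.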
Naming the agent up front skips the router LLM entirely. Two forms work:
Vocative (comma required — faster-whisper adds it reliably):
"Claude, build a login page with Supabase"
"Codex, run the tests and fix anything red"
"Claude code, summarize the README"
"Hey Codex, list the open issues"
Verbal (verb + agent + instruction):
"ask Codex to build a landing page"
"tell Claude to refactor auth.py"
"have Codex deploy the latest build"
"get Claude to write tests for this"
"use Codex for the bug in routes.py"
"let Claude handle the migration"
Both bypass the Stage 2 LLM, which means no agent-name swap or hallucinated extras (a real failure mode on the 1.5B router model).
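A rough idea of how both forms could be matched with plain regexes is shown below. The patterns, the `parse_dispatch` helper, and the `"codex"` agent key are assumptions made for the example (only `claude_code` appears elsewhere in this README); the real logic lives in __main__.py and will differ in detail.

```python
import re
from typing import Optional, Tuple

AGENT = r"(?P<agent>claude(?: code)?|codex)"

# Vocative: optional "hey", agent name, a required comma, then the instruction.
VOCATIVE = re.compile(rf"^(?:hey\s+)?{AGENT}\s*,\s*(?P<prompt>.+)$", re.I)

# Verbal: verb + agent + instruction, e.g. "ask Codex to ..." / "use Codex for ...".
VERBAL = re.compile(
    rf"^(?:ask|tell|have|get|use|let)\s+{AGENT}\s+(?:to\s+|for\s+)?(?P<prompt>.+)$",
    re.I,
)

# Spoken name -> agent key (the "codex" key is hypothetical).
AGENT_KEYS = {"claude": "claude_code", "claude code": "claude_code", "codex": "codex"}


def parse_dispatch(utterance: str) -> Optional[Tuple[str, str]]:
    """Return (agent_key, prompt) when the agent is named up front, else None."""
    text = utterance.strip()
    for pattern in (VOCATIVE, VERBAL):
        match = pattern.match(text)
        if match:
            agent = AGENT_KEYS[match.group("agent").lower()]
            return agent, match.group("prompt").strip()
    return None
```

Returning None for everything else leaves the remaining priorities, and finally the Stage 2 LLM, to handle unnamed requests.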
After the first dispatch, you're automatically in direct-dialogue
mode with that agent — every follow-up goes straight to its
--continue, no LLM round-trip:
You: "Claude, build a hello world script."
Halo: "On it. I'm calling this session Mercury."
... Mercury responds
You: "now add a docstring." (direct -> Mercury)
You: "also make it take a name." (direct -> Mercury)
To switch agents mid-flow:
"switch to codex" / "talk to codex" -> direct dialogue with Codex
"transfer me to codex" / "transfer to ..." -> same as switch
"Codex, ..." -> dispatches to Codex AND switches
"back to halo" / "talk to me" -> exit direct dialogue
"transfer me back to halo" -> same
Once you're in direct dialogue with an agent, Halo never auto-sleeps —
the conversation stays open until you explicitly say "goodbye",
"go to sleep", "over and out", or any of the end phrases. The 5-second
idle sleep only applies when you're in plain Halo mode (no direct
dialogue and no running jobs).
By default, Halo scans your machine every 2 seconds for running
coding-agent CLIs (claude and codex, including their npm shims),
records each one's cwd and parent terminal, and exposes the list to the
Stage 2 LLM brain as discovered sessions. The brain then routes
voice commands accordingly — no manual halo project add step, no
focus-binding to set up, no keystroke-injection trickery.
| Spoken | What happens |
|---|---|
| "halo, what sessions do I have?" | Brain emits session_action: list_sessions → Halo speaks: "three sessions: halo, website, AIP. You're on halo." |
| "halo, switch to website" / "work on the AIP one" | Brain emits session_action: switch → active session changes; next dispatch goes there |
| "halo, where am I?" | Brain emits session_action: where_am_i → speaks active session label + cwd |
| "halo, Claude, refactor the auth module" | Vocative dispatch (existing) into the active session's cwd |
| "halo, in website, ask Claude to add dark mode" | One-shot dispatch into the website session without changing active |
| "halo, tell all of them to run their tests" | Fanout — same prompt to every discovered session |
Two cwds with the same basename (D:\client-a\halo and D:\client-b\halo)
get parent-disambiguated labels: client-a/halo and client-b/halo.
Beyond that, pid suffix is appended (client-a/halo#1234). You can
always say "what sessions do I have" to hear the current labels.
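The labelling rule can be sketched as a two-pass function: use the basename when it is unique, fall back to parent/basename on a clash, and append the pid if that is still ambiguous. `label_sessions` is a hypothetical helper written for this example, and it assumes Windows-style paths as in the text above.

```python
from collections import Counter
from pathlib import PureWindowsPath
from typing import Dict, List, Tuple


def label_sessions(sessions: List[Tuple[int, str]]) -> Dict[int, str]:
    """Map pid -> spoken label for a list of (pid, cwd) pairs."""
    paths = {pid: PureWindowsPath(cwd) for pid, cwd in sessions}
    base = {pid: p.name for pid, p in paths.items()}
    base_counts = Counter(base.values())

    labels: Dict[int, str] = {}
    for pid, p in paths.items():
        if base_counts[base[pid]] == 1:
            labels[pid] = base[pid]  # unique basename is enough
        else:
            labels[pid] = f"{p.parent.name}/{p.name}"  # disambiguate by parent

    # Same parent and basename (e.g. two sessions in one cwd): append the pid.
    label_counts = Counter(labels.values())
    for pid, label in list(labels.items()):
        if label_counts[label] > 1:
            labels[pid] = f"{label}#{pid}"
    return labels
```

For example, sessions in D:\client-a\halo, D:\client-b\halo and D:\website would be labelled client-a/halo, client-b/halo and website.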
When you dispatch to a discovered session, Halo lazily spawns its own
persistent Claude subprocess in that cwd. This is keyed by
(agent_key, cwd), so:
- the same agent can have parallel jobs across N projects
- each project's --continue thread stays alive for the Halo process lifetime
- reset_session accepts an optional cwd to scope the reset
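One way to picture the (agent_key, cwd) keying together with the LRU bound is a small pool built on OrderedDict. Everything here is a sketch: `SessionPool`, its `spawn` / `close` callables, and the default `max_size` of 4 are assumptions, not Halo's real classes or limits.

```python
from collections import OrderedDict
from typing import Callable, Generic, Optional, Tuple, TypeVar

S = TypeVar("S")
Key = Tuple[str, str]  # (agent_key, cwd)


class SessionPool(Generic[S]):
    def __init__(self, spawn: Callable[[str, str], S], close: Callable[[S], None],
                 max_size: int = 4):
        self._spawn, self._close, self._max = spawn, close, max_size
        self._live: "OrderedDict[Key, S]" = OrderedDict()

    def get(self, agent_key: str, cwd: str) -> S:
        key = (agent_key, cwd)
        if key in self._live:
            self._live.move_to_end(key)  # mark as most recently used
            return self._live[key]
        session = self._spawn(agent_key, cwd)  # lazy spawn on first dispatch
        self._live[key] = session
        if len(self._live) > self._max:
            _, evicted = self._live.popitem(last=False)  # least recently used
            self._close(evicted)
        return session

    def reset(self, agent_key: str, cwd: Optional[str] = None) -> None:
        """Close an agent's sessions, optionally scoped to a single cwd."""
        for key in [k for k in self._live if k[0] == agent_key and cwd in (None, k[1])]:
            self._close(self._live.pop(key))
```

In a real setup `spawn` would start the persistent agent subprocess in `cwd` and `close` would terminate it.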
halo sessions
Read-only one-shot scan. Prints a table of label / agent / pid / cwd
for every running coding-agent CLI. Exit 0 when something is found, 1
when nothing is. Use this before booting the full voice loop to confirm
discovery works on your machine.
When psutil isn't installed OR no sessions are discovered, Halo runs
in v1.1 single-session mode — spawning its own Claude in the launch
cwd, exactly as before. The brain prompt only carries CURRENT CONTEXT
when there's something to report, so single-session callers pay zero
token overhead.
The Stage 2 LLM (qwen2.5:1.5b by default) receives a CURRENT CONTEXT
block prepended to every transcript when sessions are discovered:
# CURRENT CONTEXT (you decide target_session from this)
active_session: halo
discovered_sessions:
- label: halo agent: claude_code cwd: D:/Halo
- label: website agent: claude_code cwd: D:/website-redesign
- label: aip agent: claude_code cwd: D:/AIP-Claude
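A block like the one above could be assembled along these lines. The `Session` tuple and the `build_context` / `build_prompt` helpers are hypothetical; the point of the sketch is the early return, which is what keeps single-session callers at zero extra tokens.

```python
from typing import List, NamedTuple


class Session(NamedTuple):
    label: str
    agent: str
    cwd: str


def build_context(active: str, sessions: List[Session]) -> str:
    """Render the CURRENT CONTEXT block, or "" when nothing was discovered."""
    if not sessions:
        return ""
    lines = [
        "# CURRENT CONTEXT (you decide target_session from this)",
        f"active_session: {active}",
        "discovered_sessions:",
    ]
    lines += [f"- label: {s.label} agent: {s.agent} cwd: {s.cwd}" for s in sessions]
    return "\n".join(lines) + "\n\n"


def build_prompt(transcript: str, active: str, sessions: List[Session]) -> str:
    """Prepend the context block (if any) to the user's transcript."""
    return build_context(active, sessions) + transcript
```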
The JSON schema gained two fields the brain populates:
- target_session: label / "active" / "focused" / "all" / ""
- session_action: "" / "switch" / "list_sessions" / "where_am_i"
This means new routing patterns ("work on the third one", "the AI thing I was looking at", "the website one with the dark theme") are all handled by upgrading the prompt — not by adding regex.
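Since a 1.5B model can emit unexpected values, it may be worth validating the two session fields before acting on them. The sketch below assumes the brain returns a JSON object containing these fields; `parse_brain_output` and its fall-back-to-empty policy are illustrative choices, not Halo's documented behaviour.

```python
import json
from typing import Dict, List

SESSION_ACTIONS = {"", "switch", "list_sessions", "where_am_i"}
SPECIAL_TARGETS = {"", "active", "focused", "all"}


def parse_brain_output(raw: str, known_labels: List[str]) -> Dict[str, str]:
    """Validate the two session fields; fall back to "" on anything unexpected."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    target = str(data.get("target_session", "")).strip()
    action = str(data.get("session_action", "")).strip()
    if target not in SPECIAL_TARGETS and target not in known_labels:
        target = ""  # unknown label: ignore rather than guess
    if action not in SESSION_ACTIONS:
        action = ""
    return {"target_session": target, "session_action": action}
```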
Once you've dispatched to an agent (vocative, verbal, or pure switch), Halo enters direct-dialogue mode — the mic stays hot so follow-ups ("now also add tests") don't need to re-state the agent name.
Without a filter, that hot mic would dispatch everything it captures to the agent: the phone call you take 10 seconds after the dispatch, the colleague who walks over, even you muttering at your own screen. All of that audio would get transcribed and sent to Claude as if it were a command.
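The follow-up gate (halo/followup_gate.py) runs every direct-mode
transcript through 4 rules before dispatch. Rules 1-3 are send rules:
the first one that matches sends the utterance to the agent. Rule 4 is
the default: if none of them match, the utterance is dropped silently
and logged to the dashboard.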
| Rule | Fires when… | Example that fires it |
|---|---|---|
| 1. Agent name | The transcript mentions claude, codex, or any per-agent fuzzy_triggers (e.g. cloud, clawed, kodex) | "Claude, undo that" |
| 2. Continuation marker on short utterance | Opens with now / also / then / instead / and now / etc., and is ≤ 12 words | "now also add tests" |
| 3. Coding imperative + technical signal | Verb like write / add / refactor / fix + at least one coding noun (file, function, bug, endpoint, …) or named tech (Python, React, …) or a continuation marker | "refactor the auth function" |
| 4. (default: drop) | Side-conversation pattern fires explicitly (phone openers, lunch chatter, greetings to a third party) OR nothing above matched | "hey John, want to grab lunch?" (dropped) |
Examples:
You: "Claude, build a fizzbuzz in Python."
→ vocative dispatch → Mercury starts
→ direct-dialogue mode active
[phone rings]
You: "hey John, want to grab lunch?"
→ gate: side_conversation → DROPPED (logged to dashboard)
→ Mercury never sees it. Direct mode stays on.
You: "now also add tests."
→ gate: continuation → SENT to Mercury
The dashboard event log marks dropped utterances with the
side_convo.ignored event (greyed italic, (filtered) suffix) plus
the rule that rejected it (side_conversation / no_signal), so you
can see what got filtered and tweak the keyword sets in
halo/followup_gate.py if needed.
The gate is regex-only — zero LLM round-trip cost, ~0 ms latency. It deliberately errs on the side of dropping when ambiguous (a vague "make it bigger" with no clear noun probably drops; just say "Claude, make it bigger" or rephrase with a noun). The safety-vs-friction tradeoff favors safety: a missed legitimate command costs you one retry; a phone-call sentence dispatched to Claude can do real damage.
Disable entirely with FOLLOWUP_GATE_ENABLED = False in
halo/config.py if you want the old v1.1.0 behaviour (every direct-mode
utterance reaches the agent).
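The four rules can be approximated with a handful of regexes. The keyword sets below are tiny illustrative subsets taken from the table, the check order is an assumption, and the rule labels `agent_name` and `coding_imperative` are invented for the example (`continuation`, `side_conversation` and `no_signal` are the names used in this README). The real filter in halo/followup_gate.py is more thorough.

```python
import re
from typing import Tuple

_MARKERS = r"(?:and now|now|also|then|instead)"
AGENT_NAMES = re.compile(r"\b(?:claude|codex|cloud|clawed|kodex)\b", re.I)
OPENS_WITH_MARKER = re.compile(rf"^{_MARKERS}\b", re.I)
HAS_MARKER = re.compile(rf"\b{_MARKERS}\b", re.I)
IMPERATIVE = re.compile(r"\b(?:write|add|refactor|fix)\b", re.I)
TECH_SIGNAL = re.compile(r"\b(?:file|function|bug|endpoint|python|react)\b", re.I)
SIDE_CONVERSATION = re.compile(r"^(?:hey|hi|hello)\s+\w+,|\blunch\b", re.I)
MAX_CONTINUATION_WORDS = 12


def gate(transcript: str) -> Tuple[bool, str]:
    """Return (send, rule). Rules 1-3 send; everything else is dropped."""
    text = transcript.strip()
    if AGENT_NAMES.search(text):
        return True, "agent_name"
    if OPENS_WITH_MARKER.match(text) and len(text.split()) <= MAX_CONTINUATION_WORDS:
        return True, "continuation"
    if IMPERATIVE.search(text) and (TECH_SIGNAL.search(text) or HAS_MARKER.search(text)):
        return True, "coding_imperative"
    if SIDE_CONVERSATION.search(text):
        return False, "side_conversation"
    return False, "no_signal"
```

Returning the rule name alongside the decision is what would let a dashboard show why an utterance was filtered.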
When a new session starts for an agent, Halo assigns it a random Roman-mythology name (Mars, Mercury, Juno, Vesta, Apollo, ...). The name persists across follow-ups and resets on "new task". This makes it clear which agent is talking when both are working: "Mercury says done. Neptune had a problem with the auth tests."
Each agent's --continue thread stays alive for the entire Halo
process. Walk away, come back, say a wake word, give Claude a
follow-up — it picks up where it left off. Explicit reset with
"new task" / "fresh session" / "forget that" rotates the name and
drops the continuation flag.
Claude and Codex can run jobs at the same time. While one is
working, the other is free to take a new task, you can fire local
tools ("open chrome"), or ask for status ("what's happening").
Job results land between turns so they don't talk over you.
Halo ships with a custom-trained halo.onnx wake-word model so you
wake it with just "halo" (no "hey" prefix). If models/halo.onnx
is missing, Halo falls back to openWakeWord's builtin hey_jarvis.
If you want a different wake phrase (or to re-train on your own voice
samples), use bbarrick/wakeword_trainer.
Halo's training scratch dir is training/wakeword_trainer/
(gitignored). The wake DNN trains on top of OpenWakeWord's frozen
feature extractor, so the output is a plain ONNX classifier that
Halo's existing wake loader picks up automatically.
Cost: free (ElevenLabs free tier gives 10k chars/month — enough for one training run of ~120 voice samples). Training time: ~5 min on an RTX 3060. Best results: 8 phrase variants × 20+ voices for positives, 40+ confusables × 5 voices for hard negatives, 50+ common phrases for soft negatives.
Tune in halo/wake.py:
| Constant | Default | What |
|---|---|---|
| WAKE_WORD | "halo" | Phrase name. Halo loads <MODELS_DIR>/<WAKE_WORD>.onnx. |
| THRESHOLD | 0.75 | Wake DNN activation score (0-1). Raise to filter false fires, lower if real wakes are missed. |
| WAKE_VAD_THRESHOLD | 0.7 | silero-VAD must also score above this. 0.0 disables the gate. |
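How the two thresholds interact can be sketched as a simple double gate. The constant names and defaults come from the table; the `should_wake` function and the inclusive comparisons are assumptions made for this illustration.

```python
THRESHOLD = 0.75           # wake DNN activation score (0-1)
WAKE_VAD_THRESHOLD = 0.7   # silero-VAD speech score; 0.0 disables the gate


def should_wake(wake_score: float, vad_score: float,
                threshold: float = THRESHOLD,
                vad_threshold: float = WAKE_VAD_THRESHOLD) -> bool:
    """Fire only when the wake model and the VAD gate both agree."""
    if wake_score < threshold:
        return False          # wake model not confident enough
    if vad_threshold <= 0.0:
        return True           # VAD gate disabled
    return vad_score >= vad_threshold
```

Raising THRESHOLD trades missed wakes for fewer false fires, while the VAD gate mainly rejects non-speech sounds that happen to score high on the wake model.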
Voice users have no way to click "approve" on agent permission prompts, so Halo configures both agents to never ask:
- Claude Code: --permission-mode bypassPermissions (documented in claude --help as "Bypass all permission checks"). Without this, Claude would hang on the first Bash call waiting for TUI confirmation.
- Codex CLI: -c approval_policy="never" (OpenAI's documented setting for non-interactive runs). --sandbox workspace-write is kept, so Codex's file writes stay constrained to the project root.
This is the right tradeoff for voice — you ARE the supervisor, just
via spoken commands instead of keystrokes. If you'd rather have Claude
prompt you for risky operations (and you're OK with Halo hanging
silently when it does), change --permission-mode bypassPermissions
back to acceptEdits in halo/agents.py:AGENTS["claude_code"].
- Voice: af_heart, the only A/A-graded voice in the Kokoro lineup. Change in halo/voice.py:DEFAULT_VOICE.
- Sanitizer: every spoken string passes through _clean_for_speech(), which strips Markdown (**bold**, `code`, em-dashes, bullets, headings, link syntax), collapses long file paths to basenames, drops mojibake (â€"), and turns newlines into sentence breaks.
- Agent voice prompt (VOICE_SYSTEM_PROMPT in halo/agents.py): Claude and Codex are told they're on a voice channel and follow eight rules: no Markdown, 2-sentence max, write code to files, file basenames only, connecting words instead of bullets, one short clarifying question, announce time estimates for slow tasks, and state the filename clearly when they finish something openable.
- Open what was built: when an agent says "I wrote it to landing.html", you can immediately say "open landing.html" and Halo opens it in your OS default app. This works for any extension (.html, .png, .md, .pdf, ...).
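To give a feel for what the sanitizer does, here is a much-reduced sketch covering link syntax, inline code, bold, headings, bullets, absolute paths, em-dashes and newlines. `clean_for_speech_sketch` is a hypothetical stand-in, not the real _clean_for_speech(); it skips mojibake handling and would mangle bare URLs, so treat it as an illustration of the approach only.

```python
import re


def clean_for_speech_sketch(text: str) -> str:
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)      # [label](url) -> label
    text = re.sub(r"`([^`]*)`", r"\1", text)                  # drop inline-code ticks
    text = re.sub(r"(\*\*|__)(.+?)\1", r"\2", text)           # drop bold markers
    text = re.sub(r"^\s*#{1,6}\s+", "", text, flags=re.M)     # drop heading hashes
    text = re.sub(r"^\s*[-*•]\s+", "", text, flags=re.M)      # drop bullet markers
    # Collapse absolute paths (Windows or POSIX) to their basename.
    text = re.sub(r"(?:[A-Za-z]:)?(?:[\\/][\w.-]+)+[\\/]([\w.-]+)", r"\1", text)
    text = re.sub(r"\s*\u2014\s*", ", ", text)                # em-dash -> short pause
    # Turn each remaining line into its own sentence.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    lines = [ln if ln[-1] in ".!?:" else ln + "." for ln in lines]
    return " ".join(lines)
```

Running the substitutions in this order matters: link and code syntax are removed first so that the path rule only ever sees bare paths.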
halo/
__init__.py package marker + __version__
__main__.py main loop: wake -> conversation -> routing priority
cli.py `halo` shell entry point + subcommands
config.py paths, sample rate, model name, timing constants
download_models.py `halo download-models` — fetches Kokoro
wake.py openWakeWord listener + pre-wake audio ring buffer
record.py silero-vad RecorderState, chime, backchannel tone
stt.py faster-whisper BatchTranscriber (CUDA + DLL fixup)
router.py Stage 1 rules + Stage 2 LLM (Ollama qwen2.5 + JSON schema)
turn.py per-turn orchestration (record/transcribe; routing in __main__)
tools.py cross-platform local tools (browser, calc, notepad, ...)
followup_gate.py 4-rule keyword filter for direct-dialogue mode (v1.1.1)
discovery.py psutil-based scanner — finds running claude/codex (v1.2)
registry.py session registry + spoken-target fuzzy matching (v1.2)
agents.py agent registry, dispatch, background jobs, session names
voice.py Kokoro TTS + Markdown sanitizer
bus.py thread-safe event bus (ring buffer of {kind, ts, ...})
web.py Flask server: serves dashboard + /api/events polling
web_static/
index.html dashboard (single file, no build step)
pyproject.toml package metadata (hatchling backend, `halo` console script)
CHANGELOG.md keep-a-changelog format, per-version history
LICENSE MIT
models/ dev-mode location (this dir, project root)
~/.halo/models/ installed-mode default
kokoro-v1.0.fp16.onnx Kokoro 82M voice model
voices-v1.0.bin Kokoro voice pack
# override either with HALO_MODELS_DIR
scripts/
bench_router.py Stage 1 + Stage 2 latency benchmark
fw_smoke.py faster-whisper accuracy smoke test
moonshine_smoke.py legacy Moonshine STT smoke (kept for comparison)
test_detect_mode.py adaptive turn-taking unit tests
test_streaming.py sentence buffer + Claude stream-json extractor
test_vocative.py vocative dispatch unit tests
test_voice_mode.py TTS sanitizer + mode-switch tests
test_fixes_round.py regression suite for recent bug fixes
Drop one entry into AGENTS in halo/agents.py. No other code
changes needed:
AGENTS["aider"] = AgentConfig(
key="aider",
spoken_name="Aider",
voice_triggers=("aider",),
first_call=("aider", "--message", "{PROMPT}", "--yes-always"),
continue_call=("aider", "--message", "{PROMPT}", "--yes-always"),
parses_json=False,
)
Tokens {PROMPT} and {CWD} get substituted at call time. Then teach
the Stage 2 router prompt about the new agent if you want voice routing
("aider, fix this") or just rely on the vocative dispatch in
__main__.py:_vocative_dispatch (you'll need to extend its regex).
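The token substitution can be pictured as a single-pass replacement over each element of the argv template. `render_call` is a hypothetical helper for illustration; how Halo actually performs the substitution is not shown in this README.

```python
import re
from typing import List, Sequence

_TOKEN = re.compile(r"\{(PROMPT|CWD)\}")


def render_call(template: Sequence[str], prompt: str, cwd: str) -> List[str]:
    """Fill {PROMPT} and {CWD} in an argv template, one argument at a time."""
    values = {"PROMPT": prompt, "CWD": cwd}
    # A single regex pass means text inside the prompt is never re-substituted.
    return [_TOKEN.sub(lambda m: values[m.group(1)], arg) for arg in template]
```

Keeping the command as a list of arguments, rather than one shell string, means a spoken prompt containing quotes or semicolons stays a single argument when passed to a subprocess.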
openWakeWord ships alexa, hey_jarvis, hey_mycroft, hey_rhasspy
out of the box. There is no built-in hey_halo. Step 1 uses
hey_jarvis as a placeholder.
To train a real hey_halo model:
- Record ~100 samples of yourself saying "Halo" using openWakeWord's training notebook
- Drop the resulting .onnx into your MODELS_DIR (defaults to ./models/ in dev, ~/.halo/models/ when pip-installed; override with HALO_MODELS_DIR)
- Update halo/wake.py:WAKE_WORD
Threshold lives in halo/wake.py:THRESHOLD (default 0.75). Raise it if
you get false positives; lower it if waking takes too many tries.
| Path | Cold | Warm |
|---|---|---|
| Wake word detect | instant | instant |
| Speech → silero silence (600 ms base) | ~0.6 s | ~0.6 s |
| faster-whisper STT (1-5 s utterance) | ~5 s | ~0.5-1 s |
| Stage 2 LLM (qwen2.5:1.5b JSON output) | ~5-7 s | ~3-4 s |
| Tool fast-path (open app) | <50 ms | <50 ms |
| Vocative agent spawn | <100 ms | <100 ms |
| Agent task (Claude/Codex code work) | varies | varies |
Typical wake-to-action timings:
- "Hey Jarvis. Open chrome." → ~1.5 s
- "Hey Jarvis. Claude, write hello.py." → ~2 s (then agent works)
- "build me a website" (unnamed, fallback) → ~5 s (Stage 2 LLM fires)
- ✅ Wake word listener
- ✅ Record + faster-whisper STT (replaced whisper.cpp and Moonshine)
- ✅ Two-stage router (rules + Ollama)
- ✅ Local tool dispatch (browser, calc, notepad, explorer, terminal)
- ✅ Kokoro TTS + Markdown sanitizer
- ✅ Claude Code subprocess + persistent session
- ✅ Codex CLI subprocess + persistent session
- ✅ Async background jobs, status queries, replay
- ✅ Conversation mode, end phrases, idle sleep
- ✅ Direct-dialogue mode, mode switches, mythology names
- ✅ Vocative dispatch — bypass router for explicit agent calls
- ✅ Streaming Claude output → live TTS sentence-by-sentence
- ✅ Local web dashboard with live pipeline / transcript / jobs / log
- ✅ Packaged for pip install (v1.0)
- Publish to PyPI (pip install halo-voice direct, no git+)
- Custom hey_halo wake model (needs voice samples)
- Streaming Codex (Codex CLI doesn't expose stream-json yet; tracking)
- Premium TTS provider abstraction (ElevenLabs etc., opt-in)
- Agent registry from external TOML: halo init to scaffold new agents
- Project registry: halo project add/use <name> for multi-project flows
| Symptom | Fix |
|---|---|
| cublas64_12.dll not found | pip install nvidia-cublas-cu12 nvidia-cudnn-cu12; halo/stt.py adds them to PATH on import |
| Wake takes many tries | Lower THRESHOLD in halo/wake.py (try 0.4) or train a custom wake model |
| Voice sounds robotic | Try af_bella or bf_emma in halo/voice.py:DEFAULT_VOICE |
| Claude Code CLI not found | npm install -g @anthropic-ai/claude-code and re-open terminal |
| Codex auth prompts mid-run | Run codex login once in a normal terminal first |
| Halo speaks Claude's Markdown literally | Should be sanitized; check _clean_for_speech in halo/voice.py |
| Input must be provided from Claude | Wake-strip left an empty prompt; current code guards this, so file an issue if it recurs |
| Ctrl-C doesn't stop Halo | Fixed — wake stream now polls with 250 ms timeout so SIGINT propagates |