Hey Vox - local-first voice agent daemon for macOS
voxagent is a local-first voice agent runtime for macOS that gives you a fast, interruptible "talk to your computer" workflow. It combines streaming audio capture, speech-to-text, LLM orchestration, MCP tool execution, and text-to-speech into one coherent daemon architecture. It is designed for low-latency voice loops, strong privacy defaults, and extensibility across providers.
brew install tiagogranelli/voxagent/voxagent

Build from source
Requires macOS 15+ and Xcode 16+ (Swift 6.2).
git clone https://github.com/TiagoGranelli/voxagent.git
cd voxagent
swift build -c release
sudo cp .build/release/voxagent /usr/local/bin/

Sign in with your existing LLM subscription; no API keys are needed:
voxagent auth login anthropic # Claude Pro/Max
voxagent auth login openai # ChatGPT Plus (browser or device flow)
voxagent auth login google # Gemini Advanced
voxagent auth login github # GitHub Copilot (device flow)

Use --device for headless environments (OpenAI and GitHub support this):
voxagent auth login openai --device
voxagent auth login github --device

Check status across all providers:
voxagent auth status

Alternatively, configure an API key directly:
voxagent config set api.anthropic.key "$ANTHROPIC_API_KEY"

voxagent start --foreground

Then speak your wake phrase (default: "Hey Vox") or send text:
voxagent ask "Summarize my open tasks"
voxagent say "Audio pipeline online"

Detailed architecture, module contracts, latency budgets, and roadmap are in docs/ARCHITECTURE.md.
- Wake phrase detection from streaming partial transcripts
- Streaming STT pipeline (WhisperKit first-class)
- Multi-provider LLM orchestration with tool calling
- MCP tool integration (stdio/HTTP servers)
- Streaming TTS playback (including ElevenLabs)
- Persona system with runtime switching and tool policy
- Barge-in interruption during speech playback
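
To make the first feature concrete, here is a minimal sketch of wake phrase matching over a stream of partial transcripts. It is not voxagent's implementation, which is described in docs/ARCHITECTURE.md. The function names (normalize_transcript, has_wake_phrase, first_wake_partial) are hypothetical, and the sketch assumes that partial transcripts arrive one per line on standard input.

```shell
# Hypothetical helpers for illustration; not part of voxagent.

# Lowercase the text, replace punctuation with spaces, collapse runs of
# spaces, and trim the ends, so "Hey, Vox!" and "hey vox" compare equal.
normalize_transcript() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -c '[:alnum:]' ' ' \
    | tr -s ' ' \
    | sed -e 's/^ //' -e 's/ $//'
}

# Succeed if the transcript ($2) contains the wake phrase ($1) as whole words.
has_wake_phrase() {
  phrase=$(normalize_transcript "$1")
  text=$(normalize_transcript "$2")
  [ -n "$phrase" ] || return 1
  case " $text " in
    *" $phrase "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Read partial transcripts from stdin, one per line, and print the first
# one that contains the wake phrase ($1). Returns 1 if none matches.
first_wake_partial() {
  while IFS= read -r partial; do
    if has_wake_phrase "$1" "$partial"; then
      printf '%s\n' "$partial"
      return 0
    fi
  done
  return 1
}
```

Matching on whole words keeps a partial such as "hey vo" from triggering early, at the cost of waiting for the next partial. A real detector would likely also need to tolerate STT misrecognitions, which this sketch does not attempt.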
voxagent is built on these projects from steipete and related ecosystem tooling:
voxagent start [--foreground] [--local-only]
voxagent stop
voxagent status [--json]
voxagent ask "<text>"
voxagent say "<text>"
voxagent auth login <provider> [--device]
voxagent auth logout <provider|all>
voxagent auth status
voxagent persona list
voxagent persona set <id>
voxagent mcp list
voxagent mcp reload
voxagent logs tail
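
As an illustration of the auth login <provider> [--device] form, the following sketch builds the login command for a provider and adds --device only for the two providers this README says support it (OpenAI and GitHub). The helper name vox_login_cmd and its "headless" argument are hypothetical, and the provider list is an assumption taken from the sign-in examples; voxagent may accept others.

```shell
# Hypothetical helper for illustration; not part of voxagent.
# Prints the login command instead of running it, so it can be reviewed first.
# Usage: vox_login_cmd <provider> [headless]
vox_login_cmd() {
  provider=$1
  headless=${2:-}

  # Providers shown in the sign-in examples (assumed, not exhaustive).
  case "$provider" in
    anthropic|openai|google|github) ;;
    *) echo "unknown provider: $provider" >&2; return 2 ;;
  esac

  cmd="voxagent auth login $provider"
  if [ "$headless" = "headless" ]; then
    case "$provider" in
      # Only these two are documented as supporting the device flow.
      openai|github) cmd="$cmd --device" ;;
      *) echo "note: no documented device flow for $provider" >&2 ;;
    esac
  fi
  printf '%s\n' "$cmd"
}
```

For example, vox_login_cmd github headless prints voxagent auth login github --device, while vox_login_cmd anthropic headless prints the plain login command and a note on standard error.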
- Root config: ~/.config/voxagent/config.yaml
- MCP servers: ~/.config/voxagent/mcp-servers.json
- Personas: ~/.config/voxagent/personas/
- Runtime socket and pid: ~/.local/share/voxagent/run/
- Conversation logs: ~/.local/share/voxagent/conversations/
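
When troubleshooting, it can help to see which of these locations exist on a given machine. The sketch below only checks for the five paths listed above; the helper names vox_paths and vox_check_paths are hypothetical, and nothing here is assumed about the contents of the files.

```shell
# Hypothetical helpers for illustration; not part of voxagent.

# Print the five documented locations, one per line, under the current $HOME.
vox_paths() {
  printf '%s\n' \
    "$HOME/.config/voxagent/config.yaml" \
    "$HOME/.config/voxagent/mcp-servers.json" \
    "$HOME/.config/voxagent/personas/" \
    "$HOME/.local/share/voxagent/run/" \
    "$HOME/.local/share/voxagent/conversations/"
}

# Print "ok" or "missing" next to each documented path.
vox_check_paths() {
  vox_paths | while IFS= read -r path; do
    if [ -e "$path" ]; then
      printf 'ok       %s\n' "$path"
    else
      printf 'missing  %s\n' "$path"
    fi
  done
}
```

Running vox_check_paths before the first start will probably report most entries as missing; whether voxagent creates them on first run is not stated here.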
voxagent is local-first by default and supports fully offline operation when paired with local STT/LLM/TTS providers. Network egress is policy-controlled and auditable, and --local-only hard-blocks remote providers and tools.
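
For sessions that should never reach a remote provider, the two documented start flags can be combined in a small wrapper. This is a sketch: the function name vox_start_offline is hypothetical, and it relies only on the --foreground and --local-only flags from the command reference.

```shell
# Hypothetical wrapper for illustration; not part of voxagent.
# Starts the daemon in the foreground with remote providers and tools blocked.
# Any extra arguments are passed through unchanged.
vox_start_offline() {
  voxagent start --foreground --local-only "$@"
}
```

Local STT, LLM, and TTS providers still need to be configured for the voice loop to work in this mode.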
See docs/ARCHITECTURE.md for versioned milestones from v0.1 through v1.0.
Contributions are welcome. Please open an issue for design discussion before large changes, and keep pull requests focused on a single subsystem when possible.
MIT. See LICENSE.