
TiagoGranelli/voxagent


voxagent

Hey Vox - local-first voice agent daemon for macOS


What is voxagent

voxagent is a local-first voice agent runtime for macOS that gives you a fast, interruptible "talk to your computer" workflow. It combines streaming audio capture, speech-to-text, LLM orchestration, MCP tool execution, and text-to-speech into one coherent daemon architecture. It is designed for low-latency voice loops, strong privacy defaults, and extensibility across providers.
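As a rough mental model, one interruptible voice turn can be sketched as a small state machine: wake, listen, think, speak, and drop back to listening if the user talks over playback. This is a minimal Swift sketch under assumptions. `VoiceLoop`, `VoiceLoopState`, and `VoiceLoopEvent` are hypothetical names that do not come from voxagent's sources. The real module contracts are in docs/ARCHITECTURE.md.

```swift
// Hypothetical model of one interruptible voice turn. voxagent's actual
// types and state names are not documented in this README.
enum VoiceLoopState: Equatable {
    case idle        // waiting for the wake phrase
    case listening   // capturing audio and streaming it to STT
    case thinking    // LLM orchestration and tool calls in flight
    case speaking    // TTS playback in progress
}

enum VoiceLoopEvent {
    case wakePhraseDetected
    case utteranceFinished(String)  // final transcript from STT
    case responseReady(String)      // LLM text handed to TTS
    case playbackFinished
    case userSpeechDetected         // user talks over playback (barge-in)
}

struct VoiceLoop {
    private(set) var state: VoiceLoopState = .idle

    /// Applies one event. Events that make no sense in the current state are ignored.
    mutating func handle(_ event: VoiceLoopEvent) {
        switch (state, event) {
        case (.idle, .wakePhraseDetected):
            state = .listening
        case (.listening, .utteranceFinished):
            state = .thinking
        case (.thinking, .responseReady):
            state = .speaking
        case (.speaking, .playbackFinished):
            state = .idle
        case (.speaking, .userSpeechDetected):
            // Barge-in: abandon playback and go straight back to listening.
            state = .listening
        default:
            break
        }
    }
}
```

Sending `.userSpeechDetected` while in `.speaking` models barge-in: the loop returns to `.listening` without waiting for `.playbackFinished`, which is what keeps the conversation interruptible.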

Quick start

1) Install

brew install tiagogranelli/voxagent/voxagent
Build from source

Requires macOS 15+ and Xcode 26+ (Swift 6.2).

git clone https://github.com/TiagoGranelli/voxagent.git
cd voxagent
swift build -c release
sudo cp .build/release/voxagent /usr/local/bin/

2) Authenticate

Sign in with your existing LLM subscription — no API keys needed:

voxagent auth login anthropic   # Claude Pro/Max
voxagent auth login openai      # ChatGPT Plus (browser or device flow)
voxagent auth login google      # Gemini Advanced
voxagent auth login github      # GitHub Copilot (device flow)

Use --device for headless environments (OpenAI and GitHub support this):

voxagent auth login openai --device
voxagent auth login github --device
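Device flows of this kind usually follow the OAuth 2.0 Device Authorization Grant (RFC 8628). The CLI prints a short code, you approve it in a browser on any machine, and the CLI polls the provider's token endpoint until approval arrives. Assuming the providers follow RFC 8628, the polling rules from section 3.5 reduce to the sketch below. `DevicePollResult`, `DevicePollAction`, and `nextAction(for:interval:)` are hypothetical names, not voxagent's auth code.

```swift
// Polling rules of the OAuth 2.0 Device Authorization Grant (RFC 8628, section 3.5).
// Assumes the provider follows the RFC; this is not voxagent's actual auth code.
enum DevicePollResult {
    case success(String)  // access token returned by the token endpoint
    case error(String)    // RFC 8628 / OAuth error code
}

enum DevicePollAction: Equatable {
    case done(accessToken: String)
    case retry(afterSeconds: Int)
    case fail(reason: String)
}

/// Decides what to do after one token-endpoint poll.
/// - Parameter interval: current polling interval in seconds (the RFC default is 5).
func nextAction(for result: DevicePollResult, interval: Int) -> DevicePollAction {
    switch result {
    case .success(let token):
        return .done(accessToken: token)
    case .error("authorization_pending"):
        // The user has not approved the code yet; keep polling at the same pace.
        return .retry(afterSeconds: interval)
    case .error("slow_down"):
        // RFC 8628: add 5 seconds to the interval for this and all later polls.
        return .retry(afterSeconds: interval + 5)
    case .error(let code):
        // access_denied, expired_token, or any other error ends the flow.
        return .fail(reason: code)
    }
}
```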

Check status across all providers:

voxagent auth status

Alternatively, configure an API key directly:

voxagent config set api.anthropic.key "$ANTHROPIC_API_KEY"
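The README shows a single dotted key, `api.anthropic.key`. Assuming dotted keys address nested mappings in ~/.config/voxagent/config.yaml (the file's schema is not documented here), `config set` would behave like the sketch below. `ConfigValue` and `setting(_:at:in:)` are hypothetical names used only for illustration.

```swift
// Hypothetical in-memory view of config.yaml: scalars and nested tables.
indirect enum ConfigValue: Equatable {
    case string(String)
    case table([String: ConfigValue])
}

/// Returns a copy of `root` with `value` stored at a dotted key such as
/// "api.anthropic.key", creating intermediate tables as needed.
func setting(_ value: String, at dottedKey: String,
             in root: [String: ConfigValue]) -> [String: ConfigValue] {
    let parts = dottedKey.split(separator: ".").map(String.init)
    guard let head = parts.first else { return root }
    var result = root
    if parts.count == 1 {
        result[head] = .string(value)
    } else {
        // Descend into an existing table; a scalar at this position is replaced by a table.
        var child: [String: ConfigValue] = [:]
        if case .table(let existing)? = root[head] { child = existing }
        let rest = parts.dropFirst().joined(separator: ".")
        result[head] = .table(setting(value, at: rest, in: child))
    }
    return result
}
```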

3) Run

voxagent start --foreground

Then speak your wake phrase (default: "Hey Vox"), or drive the daemon from the CLI with ask (send a prompt) or say (speak a line of text):

voxagent ask "Summarize my open tasks"
voxagent say "Audio pipeline online"

Architecture overview

Detailed architecture, module contracts, latency budgets, and roadmap are in docs/ARCHITECTURE.md.

Features

  • Wake phrase detection from streaming partial transcripts
  • Streaming STT pipeline (WhisperKit first-class)
  • Multi-provider LLM orchestration with tool calling
  • MCP tool integration (stdio/HTTP servers)
  • Streaming TTS playback (including ElevenLabs)
  • Persona system with runtime switching and tool policy
  • Barge-in interruption during speech playback
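Wake-phrase detection on streaming partials has one subtlety: STT revises each partial transcript as more audio arrives, so the same phrase shows up in many successive partials. A minimal Swift sketch, assuming simple token matching and a fire-once-per-utterance rule (`WakePhraseDetector` and its methods are hypothetical, not voxagent's detector):

```swift
// Hypothetical wake-phrase matcher over streaming STT partials. Each partial is
// normalized to lowercase word tokens, and the detector fires at most once per utterance.
struct WakePhraseDetector {
    let phraseTokens: [String]
    private var firedThisUtterance = false

    init(phrase: String = "Hey Vox") {
        phraseTokens = WakePhraseDetector.normalize(phrase)
    }

    /// Lowercases and splits on anything that is not a letter or digit,
    /// so "Hey, Vox!" becomes ["hey", "vox"].
    static func normalize(_ text: String) -> [String] {
        text.lowercased()
            .split(whereSeparator: { !$0.isLetter && !$0.isNumber })
            .map(String.init)
    }

    /// Feed every partial transcript; returns true only the first time the phrase appears.
    mutating func process(partial: String) -> Bool {
        guard !firedThisUtterance, !phraseTokens.isEmpty else { return false }
        let tokens = WakePhraseDetector.normalize(partial)
        guard tokens.count >= phraseTokens.count else { return false }
        for start in 0...(tokens.count - phraseTokens.count)
        where Array(tokens[start..<start + phraseTokens.count]) == phraseTokens {
            firedThisUtterance = true
            return true
        }
        return false
    }

    /// Call when STT finalizes an utterance so the next one can trigger again.
    mutating func resetForNextUtterance() {
        firedThisUtterance = false
    }
}
```

Matching on partials rather than final transcripts is what lets the wake phrase trigger before the speaker has finished the utterance.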

Built with

voxagent is built on these projects from steipete and related ecosystem tooling:

CLI reference

voxagent start [--foreground] [--local-only]
voxagent stop
voxagent status [--json]
voxagent ask "<text>"
voxagent say "<text>"
voxagent auth login <provider> [--device]
voxagent auth logout <provider|all>
voxagent auth status
voxagent persona list
voxagent persona set <id>
voxagent mcp list
voxagent mcp reload
voxagent logs tail
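To make the grammar above concrete, here is a hypothetical command model for part of it, written as a Swift sketch. It is not voxagent's parser: `Command`, `ParseError`, and `parse(_:)` are illustrative names, only flags listed in the reference are recognized, and unknown flags are simply ignored.

```swift
// Hypothetical command model for part of the documented CLI grammar.
enum Command: Equatable {
    case start(foreground: Bool, localOnly: Bool)
    case stop
    case status(json: Bool)
    case ask(String)
    case say(String)
    case authLogin(provider: String, device: Bool)
    case authLogout(target: String)  // a provider name or "all"
    case authStatus
}

enum ParseError: Error, Equatable {
    case usage(String)
}

/// Parses arguments (without the leading "voxagent"). Flags are any "--" words;
/// everything else is treated as a positional word.
func parse(_ args: [String]) throws -> Command {
    let flags = Set(args.filter { $0.hasPrefix("--") })
    let words = args.filter { !$0.hasPrefix("--") }
    switch words.first {
    case "start"?:
        return .start(foreground: flags.contains("--foreground"),
                      localOnly: flags.contains("--local-only"))
    case "stop"?:
        return .stop
    case "status"?:
        return .status(json: flags.contains("--json"))
    case "ask"? where words.count == 2:
        return .ask(words[1])
    case "say"? where words.count == 2:
        return .say(words[1])
    case "auth"?:
        switch (words.dropFirst().first, words.count) {
        case ("login"?, 3):
            return .authLogin(provider: words[2], device: flags.contains("--device"))
        case ("logout"?, 3):
            return .authLogout(target: words[2])
        case ("status"?, 2):
            return .authStatus
        default:
            throw ParseError.usage("voxagent auth login <provider> [--device] | logout <provider|all> | status")
        }
    default:
        throw ParseError.usage("unknown or incomplete command: \(args.joined(separator: " "))")
    }
}
```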

Configuration

  • Root config: ~/.config/voxagent/config.yaml
  • MCP servers: ~/.config/voxagent/mcp-servers.json
  • Personas: ~/.config/voxagent/personas/
  • Runtime socket and pid: ~/.local/share/voxagent/run/
  • Conversation logs: ~/.local/share/voxagent/conversations/
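The README does not document the format of mcp-servers.json. The sketch below assumes the `mcpServers` layout used by several other MCP clients: a stdio server gives a `command` plus optional `args` and `env`, and an HTTP server gives a `url`. The schema and the names `MCPServerEntry`, `MCPServersFile`, and `loadMCPServers(from:)` are assumptions for illustration only.

```swift
import Foundation

// Hypothetical schema for ~/.config/voxagent/mcp-servers.json (assumed, not documented).
struct MCPServerEntry: Decodable {
    var command: String?         // stdio transport: executable to launch
    var args: [String]?
    var env: [String: String]?
    var url: String?             // HTTP transport: server endpoint

    enum Transport: Equatable {
        case stdio(command: String, args: [String])
        case http(url: String)
    }

    /// Valid only when exactly one of `command` or `url` is present.
    var transport: Transport? {
        switch (command, url) {
        case (let cmd?, nil): return .stdio(command: cmd, args: args ?? [])
        case (nil, let endpoint?): return .http(url: endpoint)
        default: return nil
        }
    }
}

struct MCPServersFile: Decodable {
    var mcpServers: [String: MCPServerEntry]
}

/// Decodes the file and keeps only entries with a valid transport;
/// malformed entries are silently dropped in this sketch.
func loadMCPServers(from data: Data) throws -> [String: MCPServerEntry.Transport] {
    let file = try JSONDecoder().decode(MCPServersFile.self, from: data)
    return file.mcpServers.compactMapValues { $0.transport }
}
```

A real loader would likely report dropped entries instead of discarding them, so that `voxagent mcp list` can show why a server is missing.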

Privacy

voxagent is local-first by default and supports fully offline operation when paired with local STT/LLM/TTS providers. Network egress is policy-controlled and auditable, and --local-only hard-blocks remote providers and tools.
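The policy described above can be pictured as a single gate that every provider or tool call passes through. This is a minimal Swift sketch, assuming a host allowlist and an in-memory audit log. `Endpoint`, `EgressDecision`, and `EgressPolicy` are hypothetical names, and the allowlist mechanism is an assumption rather than documented behavior. Its key property matches the README: with `--local-only`, remote endpoints are refused no matter what else is configured.

```swift
// Hypothetical egress gate: local endpoints always pass, remote ones need an
// allowlisted host, and local-only mode refuses every remote endpoint.
enum Endpoint: Equatable {
    case local(String)         // e.g. an on-device STT, LLM, or TTS engine
    case remote(host: String)
}

struct EgressDecision: Equatable {
    let endpoint: Endpoint
    let allowed: Bool
}

final class EgressPolicy {
    let localOnly: Bool
    let allowedHosts: Set<String>
    private(set) var auditLog: [EgressDecision] = []

    init(localOnly: Bool, allowedHosts: Set<String>) {
        self.localOnly = localOnly
        self.allowedHosts = allowedHosts
    }

    /// Decides whether a call may proceed and records the decision for auditing.
    func permit(_ endpoint: Endpoint) -> Bool {
        let allowed: Bool
        switch endpoint {
        case .local:
            allowed = true
        case .remote(let host):
            // Local-only mode overrides any allowlist entry.
            allowed = !localOnly && allowedHosts.contains(host)
        }
        auditLog.append(EgressDecision(endpoint: endpoint, allowed: allowed))
        return allowed
    }
}
```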

Roadmap

See docs/ARCHITECTURE.md for versioned milestones from v0.1 through v1.0.

Contributing

Contributions are welcome. Please open an issue for design discussion before large changes, and keep pull requests focused on a single subsystem when possible.

License

MIT. See LICENSE.
