v0.3.2 — Voice/Assistant Infrastructure
What's New
Voice/assistant infrastructure layer — everything needed to build voice apps like Mutter on top of Hive.
STT Providers
- WhisperLocal — mlx-whisper (Apple Silicon) or faster-whisper (Linux/CPU), auto-detected
- GroqSTT — Groq Whisper API with connection pooling and exponential backoff retry
- DeepgramSTT — Deepgram Nova-2 API with Pydantic response validation
- Factory with auto-detection: create_stt_provider() picks the best available provider
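The auto-detection idea can be illustrated with a minimal sketch that checks the platform and which packages are importable without importing them. The function name `pick_stt_backend`, the preference order, the returned labels, and the `GROQ_API_KEY` / `DEEPGRAM_API_KEY` variable names are all hypothetical. This is not the actual `create_stt_provider()` logic.

```python
import importlib.util
import os
import platform
from typing import Callable, Mapping, Optional


def module_available(name: str) -> bool:
    """True if top-level module `name` can be imported (it is not imported here)."""
    return importlib.util.find_spec(name) is not None


def pick_stt_backend(
    system: Optional[str] = None,
    machine: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    has_module: Callable[[str], bool] = module_available,
) -> str:
    """Return a label for the preferred STT backend (hypothetical preference order)."""
    system = system or platform.system()
    machine = machine or platform.machine()
    env = os.environ if env is None else env

    # Local Whisper first. mlx-whisper only applies on Apple Silicon (Darwin/arm64).
    if system == "Darwin" and machine == "arm64" and has_module("mlx_whisper"):
        return "whisper-local:mlx"
    if has_module("faster_whisper"):
        return "whisper-local:faster-whisper"
    # Otherwise use a hosted API if a key is configured (env var names are assumptions).
    if env.get("GROQ_API_KEY"):
        return "groq"
    if env.get("DEEPGRAM_API_KEY"):
        return "deepgram"
    raise RuntimeError("no STT backend available")
```

Injecting `has_module` and `env` keeps the selection logic testable on any machine, whatever is actually installed.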
AudioRecorder
- Microphone capture via sounddevice with thread-safe callbacks
- WAV output using stdlib struct (no scipy dependency)
- Device channel validation, proper memory cleanup
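To write WAV output without scipy, you prefix the PCM samples with a 44-byte RIFF header. The following minimal sketch uses only `struct` and assumes 16-bit little-endian PCM. The name `pcm16_to_wav` is hypothetical. `transcribe_bytes()` adds the same kind of header to raw PCM.

```python
import struct


def pcm16_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM in a canonical 44-byte WAV header."""
    bits_per_sample = 16
    block_align = channels * bits_per_sample // 8   # bytes per frame
    byte_rate = sample_rate * block_align           # bytes per second
    if len(pcm) % block_align:
        raise ValueError("PCM length is not a whole number of frames")

    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",   # little-endian, no padding: 44 bytes total
        b"RIFF",
        36 + len(pcm),          # RIFF chunk size: everything after this field
        b"WAVE",
        b"fmt ",
        16,                     # fmt chunk size for plain PCM
        1,                      # audio format 1 = uncompressed PCM
        channels,
        sample_rate,
        byte_rate,
        block_align,
        bits_per_sample,
        b"data",
        len(pcm),               # data chunk size in bytes
    )
    return header + pcm
```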
IntentRouter
- LLM-based text classification using generate_structured() with Pydantic models
- Falls back to text parsing if the provider doesn't support structured output
- User-defined intents, configurable fallback
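If the provider has no structured output, the reply has to be mapped onto an intent from plain text. The following minimal sketch of that fallback assumes the model was asked to reply with just an intent name. `parse_intent` and its matching rules (exact match, then whole-word match, then the configured fallback) are hypothetical and are not Hive's actual parser.

```python
import re
from typing import Sequence


def parse_intent(reply: str, intents: Sequence[str], fallback: str) -> str:
    """Map a free-text LLM reply onto one of the user-defined intents.

    Hypothetical rules: exact (case-insensitive) match first, then the first
    intent in `intents` order that appears as a whole word, else `fallback`.
    """
    cleaned = reply.strip().strip("\"'`.").lower()
    by_lower = {name.lower(): name for name in intents}
    if cleaned in by_lower:
        return by_lower[cleaned]
    for name in intents:
        # \b word boundaries keep "search" from matching inside "researching".
        if re.search(rf"\b{re.escape(name.lower())}\b", cleaned):
            return name
    return fallback
```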
Trigger Systems
- HotkeyTrigger — global hotkeys via pynput with per-trigger in-flight guards
- WebhookTrigger — lightweight HTTP server using stdlib asyncio (no extra deps)
- Content-Length validation and a 10MB request body cap to guard against oversized payloads
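The server can check Content-Length before reading the body, so it rejects an oversized request without buffering it. The following minimal sketch uses asyncio streams. `read_body`, the returned status codes, and the lack of chunked-encoding support are assumptions for illustration. This is not `WebhookTrigger`'s actual code.

```python
import asyncio

MAX_BODY = 10 * 1024 * 1024  # 10MB cap, matching the limit in these notes


async def read_body(reader: asyncio.StreamReader, max_body: int = MAX_BODY) -> tuple[int, bytes]:
    """Read one HTTP/1.1 request and return (status, body). Hypothetical helper."""
    try:
        head = await reader.readuntil(b"\r\n\r\n")
    except (asyncio.IncompleteReadError, asyncio.LimitOverrunError):
        return 400, b""  # headers truncated or too large
    headers = {}
    for line in head.decode("latin-1").split("\r\n")[1:]:  # skip the request line
        if ":" in line:
            key, _, value = line.partition(":")
            headers[key.strip().lower()] = value.strip()

    raw = headers.get("content-length")
    if raw is None:
        return 411, b""  # Length Required (chunked bodies not handled in this sketch)
    if not (raw.isascii() and raw.isdigit()):
        return 400, b""  # negative or non-numeric length
    length = int(raw)
    if length > max_body:
        return 413, b""  # over the cap: reject without reading the body
    try:
        body = await reader.readexactly(length)
    except asyncio.IncompleteReadError:
        return 400, b""  # client sent fewer bytes than it declared
    return 200, body
```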
LinkToolkit
- Save, search, list, and scrape web links using SemanticMemory
- Async HTTP throughout (no event loop blocking)
- Wired into the daemon's _build_toolkits()
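To keep scraping from blocking the event loop, the network fetch is awaited directly, for example through a shared `httpx.AsyncClient`. Synchronous HTML parsing can then run in a worker thread. The following minimal stdlib sketch covers only the parsing step and omits the fetch. `extract_title_and_text` and `scrape_html` are hypothetical helpers, not LinkToolkit's API.

```python
import asyncio
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the <title> text and visible body text, skipping <script>/<style>."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def extract_title_and_text(html: str) -> tuple[str, str]:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title, " ".join(parser.chunks)


async def scrape_html(html: str) -> tuple[str, str]:
    # Parsing is synchronous work, so run it off the event loop thread.
    return await asyncio.to_thread(extract_title_and_text, html)
```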
Hardening
- All data types are Pydantic BaseModel with validation
- Connection pooling with shared httpx.AsyncClient
- Exponential backoff retry on 429/500/502/503
- transcribe_bytes() properly wraps raw PCM in WAV headers
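Retrying only on 429/500/502/503 with exponentially growing delays can be sketched as follows. This minimal illustration assumes the request is an async callable that returns a status code and a payload. `with_backoff`, `max_attempts`, `base_delay`, and the small random jitter are hypothetical choices, not Hive's actual settings. In real use, the callable would wrap a request on the shared `httpx.AsyncClient` and return `response.status_code`.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, TypeVar

T = TypeVar("T")
RETRYABLE = {429, 500, 502, 503}


async def with_backoff(
    request: Callable[[], Awaitable[Tuple[int, T]]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
) -> Tuple[int, T]:
    """Call `request` until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, payload = await request()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, payload
        # Delays double each attempt (0.5s, 1s, 2s, ...), plus up to 10% jitter
        # so that many clients don't retry in lockstep.
        delay = base_delay * (2 ** attempt)
        await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be >= 1")
```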
New Optional Dependencies
pip install "hive-agent[audio]"    # sounddevice, scipy, numpy
pip install "hive-agent[hotkeys]"  # pynput
pip install "hive-agent[voice]"    # all of the above
Whisper backends (mlx-whisper, faster-whisper) must be installed manually.
Stats
- 787 tests passing (127 new)
- 31 files changed, 3200+ lines added
- Stress-tested with 50 concurrent webhooks, 5 concurrent STT calls, real mic recording, and a real hotkey listener
Full Changelog: v0.3.1...v0.3.2