v0.3.2 — Voice/Assistant Infrastructure

@chiruu12 chiruu12 released this 23 May 18:00
· 101 commits to main since this release
b263935

What's New

Voice/assistant infrastructure layer — everything needed to build voice apps like Mutter on top of Hive.

STT Providers

  • WhisperLocal — mlx-whisper (Apple Silicon) or faster-whisper (Linux/CPU), auto-detected
  • GroqSTT — Groq Whisper API with connection pooling and exponential backoff retry
  • DeepgramSTT — Deepgram Nova-2 API with Pydantic response validation
  • Factory with auto-detection: create_stt_provider() picks the best available
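
The auto-detection idea can be sketched as a priority check over installed packages and configured API keys. This is a minimal illustration, not the code behind create_stt_provider(): the helper name pick_stt_backend, the priority order, the returned labels, and the environment variable names GROQ_API_KEY and DEEPGRAM_API_KEY are all assumptions.

```python
import importlib.util
import os


def pick_stt_backend(env=None, has_module=None):
    """Return a label for the first usable STT backend, or None.

    Hypothetical helper: the order (local Whisper first, then hosted APIs)
    and the environment variable names are assumptions for illustration.
    """
    env = os.environ if env is None else env
    if has_module is None:
        # find_spec returns None when a top-level package is not installed.
        has_module = lambda name: importlib.util.find_spec(name) is not None

    if has_module("mlx_whisper"):
        return "whisper-local:mlx"
    if has_module("faster_whisper"):
        return "whisper-local:faster"
    if env.get("GROQ_API_KEY"):
        return "groq"
    if env.get("DEEPGRAM_API_KEY"):
        return "deepgram"
    return None
```

Passing env and has_module explicitly keeps the selection logic testable without installing any backend.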

AudioRecorder

  • Microphone capture via sounddevice with thread-safe callbacks
  • WAV output using stdlib struct (no scipy dependency)
  • Device channel validation, proper memory cleanup
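
Writing WAV with only the stdlib struct module comes down to packing a 44-byte RIFF header in front of the raw PCM samples. The sketch below assumes 16-bit little-endian PCM, and the function name pcm_to_wav and its defaults are hypothetical rather than taken from Hive.

```python
import struct


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Prefix raw PCM bytes with a canonical 44-byte WAV (RIFF) header."""
    byte_rate = sample_rate * channels * sample_width
    block_align = channels * sample_width
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF",
        36 + len(pcm),        # size of everything after this field
        b"WAVE",
        b"fmt ",
        16,                   # size of the PCM fmt chunk
        1,                    # audio format 1 = uncompressed PCM
        channels,
        sample_rate,
        byte_rate,
        block_align,
        sample_width * 8,     # bits per sample
        b"data",
        len(pcm),             # size of the sample data
    )
    return header + pcm
```

The same wrapping applies whenever an API expects a WAV file but the caller only holds raw PCM bytes.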

IntentRouter

  • LLM-based text classification using generate_structured() with Pydantic models
  • Falls back to text parsing if provider doesn't support structured output
  • User-defined intents, configurable fallback
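
The text-parsing fallback could look roughly like the following: try to read an "intent" field from JSON embedded in the model's reply, then scan for a known intent name, and finally return the configured fallback. The function name parse_intent, the "intent" key, and the example intent names are hypothetical; the sketch uses only the stdlib and does not reproduce Hive's Pydantic models.

```python
import json
import re


def parse_intent(reply: str, intents: list[str], fallback: str = "unknown") -> str:
    """Extract one of `intents` (lowercase names) from a free-form LLM reply."""
    # First try: the reply is, or contains, a JSON object with an "intent" key.
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match:
        try:
            value = json.loads(match.group(0)).get("intent")
        except json.JSONDecodeError:
            value = None
        if isinstance(value, str) and value.strip().lower() in intents:
            return value.strip().lower()

    # Second try: look for an intent name appearing as a whole word.
    lowered = reply.lower()
    for name in intents:
        if re.search(rf"\b{re.escape(name)}\b", lowered):
            return name

    # Nothing recognizable: use the configured fallback intent.
    return fallback
```

For example, with intents ["save_link", "take_note"], a reply of '{"intent": "save_link"}' resolves through the JSON path, while "I think this is take_note." resolves through the whole-word scan.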

Trigger Systems

  • HotkeyTrigger — global hotkeys via pynput with per-trigger in-flight guards
  • WebhookTrigger — lightweight HTTP server using stdlib asyncio (no extra deps)
  • WebhookTrigger validates Content-Length and caps request bodies at 10MB for security
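
A Content-Length check of this kind can be expressed as a small pure function that runs before any body bytes are read. The sketch is illustrative only: the function name, the HTTP status codes chosen for each rejection, and the reading of "10MB" as 10 * 1024 * 1024 bytes are assumptions.

```python
# Assumption: "10MB" is interpreted here as 10 * 1024 * 1024 bytes.
MAX_BODY_BYTES = 10 * 1024 * 1024


def check_content_length(headers: dict[str, str],
                         limit: int = MAX_BODY_BYTES) -> tuple[int, int]:
    """Return (http_status, body_length); body_length is 0 on rejection."""
    # Header names are case-insensitive, so compare in lowercase.
    raw = next((v for k, v in headers.items() if k.lower() == "content-length"), None)
    if raw is None:
        return 411, 0  # Length Required
    raw = raw.strip()
    if not (raw.isascii() and raw.isdigit()):
        return 400, 0  # Bad Request: negative, empty, or non-numeric value
    length = int(raw)
    if length > limit:
        return 413, 0  # Content Too Large
    return 200, length
```

Rejecting on the header alone means an oversized request is refused without buffering its body.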

LinkToolkit

  • Save, search, list, and scrape web links using SemanticMemory
  • Async HTTP throughout (no event loop blocking)
  • Wired into daemon's _build_toolkits()

Hardening

  • All data types are Pydantic BaseModel with validation
  • Connection pooling with shared httpx.AsyncClient
  • Exponential backoff retry on HTTP 429/500/502/503 responses
  • transcribe_bytes() properly wraps raw PCM in WAV headers
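
Retrying with exponential backoff on the listed status codes might be sketched as below. The function name send_with_retry, the attempt count, the base delay, and the small random jitter are assumptions and not Hive's actual settings; the sketch uses only the stdlib and accepts any coroutine function whose result has a status_code attribute.

```python
import asyncio
import random

RETRYABLE = {429, 500, 502, 503}  # status codes listed in the release notes


async def send_with_retry(send, max_attempts=4, base_delay=0.5, sleep=asyncio.sleep):
    """Await `send()` until it returns a non-retryable status or attempts run out.

    `send` is a zero-argument callable returning an awaitable whose result
    exposes `status_code`. The last response is returned even if it is still
    a retryable error, so the caller decides how to handle it.
    """
    for attempt in range(max_attempts):
        response = await send()
        if response.status_code not in RETRYABLE or attempt == max_attempts - 1:
            return response
        # Delay doubles each attempt: base, 2*base, 4*base, ...
        delay = base_delay * (2 ** attempt)
        # Up to 10% jitter (an assumption) so concurrent clients do not retry in lockstep.
        await sleep(delay + random.uniform(0, delay / 10))
```

With a shared httpx.AsyncClient, send could be a lambda that calls client.post(...) with fixed arguments, so every retry reuses the same connection pool.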

New Optional Dependencies

pip install hive-agent[audio]    # sounddevice, scipy, numpy
pip install hive-agent[hotkeys]  # pynput
pip install hive-agent[voice]    # all of the above

Whisper backends (mlx-whisper, faster-whisper) are not included in these extras and must be installed manually.

Stats

  • 787 tests passing (127 new)
  • 31 files changed, 3200+ lines added
  • Stress tested: 50 concurrent webhooks, 5 concurrent STT calls, real mic recording, real hotkey listener

Full Changelog: v0.3.1...v0.3.2