Extend TalkSDK and AudioClient with Lemonade ASR+TTS auto-detection #386

@kovtcharov

Summary

Wire the new LemonadeASR (#372) and LemonadeTTS (#373) backends into the existing TalkSDK and AudioClient so that gaia talk seamlessly uses Lemonade server-side audio when available, with graceful fallback to local Whisper/Kokoro.

Current State

  • AudioClient (src/gaia/audio/audio_client.py) hardcodes local WhisperAsr and KokoroTTS
  • TalkSDK (src/gaia/talk/sdk.py) has no backend selection mechanism
  • TalkConfig has whisper_model_size but no asr_backend / tts_backend fields

Changes Required

| File | Change | Effort |
| --- | --- | --- |
| src/gaia/talk/sdk.py | Add asr_backend: str = "auto" and tts_backend: str = "auto" to TalkConfig. "auto" checks Lemonade first and falls back to local. Also accepts "lemonade" or "local". | Medium |
| src/gaia/audio/audio_client.py | Add auto-detection logic: check Lemonade /health for websocket_port → if present, use LemonadeASR; check loaded models for kokoro-v1 → if present, use LemonadeTTS. Log which backend is active. | Medium |
| src/gaia/cli.py | Add --asr-backend and --tts-backend flags to the gaia talk command. | Low |
| tests/unit/test_audio_backend_selection.py | Test auto-detection logic with mocked health responses. | Medium |
| tests/integration/test_talk_lemonade.py | End-to-end test with a real Lemonade server. | Medium |
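
The TalkConfig change in the first row could look roughly like the sketch below. It is only an illustration: the real TalkConfig has other fields, the default shown for whisper_model_size is a placeholder, and the VALID_BACKENDS constant and the validation in __post_init__ are assumptions rather than part of the proposal.

```python
from dataclasses import dataclass

# Hypothetical constant: the three values named in the table above.
VALID_BACKENDS = ("auto", "lemonade", "local")


@dataclass
class TalkConfig:
    # Existing field; the default here is a placeholder, not the real one.
    whisper_model_size: str = "base"
    # New fields proposed in this issue.
    asr_backend: str = "auto"
    tts_backend: str = "auto"

    def __post_init__(self) -> None:
        # Reject unknown backend names early (assumed behavior).
        for name in ("asr_backend", "tts_backend"):
            value = getattr(self, name)
            if value not in VALID_BACKENDS:
                raise ValueError(
                    f"{name} must be one of {VALID_BACKENDS}, got {value!r}"
                )


config = TalkConfig(asr_backend="local")
```

Validating at construction time means a misspelled backend name fails before any audio device or server connection is opened.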

Auto-Detection Logic

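def _select_asr_backend(self):
    # An explicit "local" setting skips Lemonade detection entirely
    if self.config.asr_backend == "local":
        return WhisperAsr(...)
    # "auto" or "lemonade": try Lemonade first
    try:
        ws_port = self.lemonade_client.get_websocket_port()
        if ws_port:
            logger.info("Using Lemonade server-side ASR (Whisper.cpp)")
            return LemonadeASR(ws_port=ws_port, ...)
    except Exception as exc:
        # Record why detection failed instead of swallowing the error silently
        logger.debug("Lemonade ASR detection failed: %s", exc)
    # Fallback
    logger.info("Using local Whisper ASR")
    return WhisperAsr(...)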
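
The same decision can be factored into pure functions that cover both ASR and TTS, which also makes it easy to test with mocked health responses. This is a sketch under assumptions: detect_lemonade and resolve_backend are hypothetical helper names, the health payload is assumed to be a dict with a websocket_port key, and the loaded models are assumed to arrive as a list of names. As in the snippet above, "lemonade" and "auto" both fall back to local when the server is unavailable.

```python
from typing import Dict, Iterable, Mapping, Optional


def detect_lemonade(
    health: Optional[Mapping], loaded_models: Iterable[str]
) -> Dict[str, bool]:
    """Report which Lemonade audio features appear to be available.

    `health` is the parsed /health response (assumed shape), or None when
    the server could not be reached.
    """
    health = health or {}
    return {
        # ASR needs the websocket port advertised by the server.
        "asr": bool(health.get("websocket_port")),
        # TTS needs the kokoro-v1 model to be loaded.
        "tts": "kokoro-v1" in set(loaded_models),
    }


def resolve_backend(requested: str, lemonade_available: bool) -> str:
    """Map "auto" / "lemonade" / "local" to the backend actually used."""
    if requested != "local" and lemonade_available:
        return "lemonade"
    return "local"


# Example: server is up with a websocket port, but kokoro-v1 is not loaded.
available = detect_lemonade({"websocket_port": 9000}, ["some-llm"])
asr = resolve_backend("auto", available["asr"])
tts = resolve_backend("auto", available["tts"])
```

With these inputs, ASR resolves to "lemonade" and TTS to "local", so the two backends are chosen independently.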

Dependencies

Acceptance Criteria

  • gaia talk auto-detects and uses Lemonade ASR/TTS when server is running
  • Falls back to local Whisper/Kokoro when Lemonade is unavailable
  • --asr-backend and --tts-backend CLI flags override auto-detection
  • Backend selection logged clearly at startup
  • Existing gaia talk behavior unchanged when Lemonade is not running
  • Latency improvement measurable when using Lemonade backend
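
The CLI override criterion could be met with flags along these lines. This is a minimal sketch that assumes an argparse-based parser; build_talk_parser is a hypothetical name, and how src/gaia/cli.py actually registers the gaia talk subcommand is not shown in this issue.

```python
import argparse

BACKEND_CHOICES = ("auto", "lemonade", "local")


def build_talk_parser() -> argparse.ArgumentParser:
    """Build a stand-alone parser carrying only the two proposed flags."""
    parser = argparse.ArgumentParser(prog="gaia talk")
    parser.add_argument(
        "--asr-backend",
        choices=BACKEND_CHOICES,
        default="auto",
        help="Speech-to-text backend (default: auto-detect Lemonade)",
    )
    parser.add_argument(
        "--tts-backend",
        choices=BACKEND_CHOICES,
        default="auto",
        help="Text-to-speech backend (default: auto-detect Lemonade)",
    )
    return parser


# Forcing local ASR while leaving TTS on auto-detection.
args = build_talk_parser().parse_args(["--asr-backend", "local"])
```

argparse exposes the values as args.asr_backend and args.tts_backend, which can be passed straight into the TalkConfig fields of the same names.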

Metadata

Assignees

No one assigned

    Labels

      • agent
      • audio: Audio (ASR/TTS) changes
      • domain:multimodal: Voice (ASR/TTS), Vision (VLM), Image gen (SD), CUA
      • enhancement: New feature or request
      • lemonade 🍋
      • p1: medium priority
      • performance: Performance-critical changes
      • talk: Talk agent changes
      • track:consumer-app: Hermes-competitor consumer product (mobile-first, voice + messaging + memory + skills)
