Add server-side TTS via Lemonade audio/speech endpoint #373

@kovtcharov

Summary

Lemonade v9.4.1 exposes Kokoro TTS via an OpenAI-compatible POST /api/v1/audio/speech endpoint with streaming support. GAIA currently loads Kokoro locally in Python (src/gaia/audio/kokoro_tts.py). Server-side TTS enables streaming audio playback (lower latency) and is now available on both Windows and Linux.

Reference

API

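from openai import AsyncOpenAI
from openai.helpers import LocalAudioPlayer

# `client` is an AsyncOpenAI instance whose base_url points at the Lemonade server.
# The block below must run inside an async function.
async with client.audio.speech.with_streaming_response.create(
    model="kokoro-v1",
    voice="coral",
    input="Today is a wonderful day!",
    stream_format="audio",
) as response:
    await LocalAudioPlayer().play(response)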

Parameters: input (text), model (kokoro-v1), voice, speed, response_format (mp3/wav/opus/pcm), stream_format
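
For readers who want to see the request shape without the OpenAI SDK, here is a minimal sketch that builds the equivalent raw HTTP request with the standard library. The host and port in `BASE_URL` are assumptions, and `build_speech_request` is a hypothetical helper name, not part of GAIA or Lemonade.

```python
import json
import urllib.request

# Assumption: Lemonade is reachable at this address; adjust for your setup.
BASE_URL = "http://localhost:8000/api/v1"


def build_speech_request(text, voice="coral", speed=1.0, response_format="wav"):
    """Build (but do not send) a POST request for the /audio/speech endpoint."""
    payload = {
        "model": "kokoro-v1",
        "input": text,
        "voice": voice,
        "speed": speed,
        "response_format": response_format,
    }
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it (requires a running server):
#   with urllib.request.urlopen(build_speech_request("Hello")) as resp:
#       audio_bytes = resp.read()
```

Building the request separately from sending it keeps the payload easy to inspect in a unit test.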

Changes Required

| File | Change | Effort |
| --- | --- | --- |
| src/gaia/audio/lemonade_tts.py | New file. Create LemonadeTTS class wrapping /audio/speech. Implement generate_speech(text, voice, speed) with streaming support via AsyncOpenAI.audio.speech.with_streaming_response.create() | Medium |
| src/gaia/audio/audio_client.py | Add tts_backend config: "lemonade" (default) or "local" (fallback to KokoroTTS) | Medium |
| src/gaia/talk/sdk.py | Prefer Lemonade TTS when available; handle streaming audio playback | Medium |
| src/gaia/cli.py | Add `--tts-backend` flag (`lemonade` or `local`) to `gaia talk` | |
| tests/unit/test_lemonade_tts.py | Unit tests with mocked client | Medium |
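
As a starting point for the new file, here is one possible shape for the wrapper. It is a sketch under assumptions, not the final design: the client is injected so unit tests can pass a mock, it is expected to behave like `openai.AsyncOpenAI`, and `synthesize` is a hypothetical convenience method that is not in the table above.

```python
import asyncio
from typing import AsyncIterator


class LemonadeTTS:
    """Sketch of a wrapper around Lemonade's /audio/speech endpoint."""

    def __init__(self, client, model: str = "kokoro-v1"):
        # Assumption: `client` exposes the same interface as openai.AsyncOpenAI
        # and was created with a base_url pointing at the Lemonade server.
        self.client = client
        self.model = model

    async def generate_speech(
        self, text: str, voice: str = "coral", speed: float = 1.0
    ) -> AsyncIterator[bytes]:
        """Yield raw PCM chunks as the server streams them."""
        async with self.client.audio.speech.with_streaming_response.create(
            model=self.model,
            voice=voice,
            input=text,
            speed=speed,
            response_format="pcm",
            stream_format="audio",
        ) as response:
            async for chunk in response.iter_bytes():
                yield chunk

    def synthesize(self, text: str, voice: str = "coral", speed: float = 1.0) -> bytes:
        """Blocking helper (hypothetical): collect every chunk into one bytes object."""

        async def _collect() -> bytes:
            chunks = [c async for c in self.generate_speech(text, voice, speed)]
            return b"".join(chunks)

        return asyncio.run(_collect())
```

Yielding chunks lets the talk loop start playback before synthesis finishes, while the blocking helper covers the buffered file-save case.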

Key Design Decisions

  • Streaming for talk: Use stream_format="audio" with PCM for lowest latency
  • File output: Use buffered response with WAV format for file saves
  • Voice mapping: Lemonade Kokoro voices (coral, etc.) may differ from GAIA's current voice list (af_bella, am_adam). Need to map or expose both.
  • Keep existing KokoroTTS: Same fallback strategy as ASR — auto-detect, graceful fallback.
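
The voice mapping and fallback decisions could be sketched roughly as follows. Everything here is illustrative: the mapping entry is a placeholder rather than a verified correspondence between GAIA and Lemonade voices, and `map_voice` and `resolve_tts_backend` are hypothetical names. How availability is actually probed is left to the caller.

```python
from typing import Callable, Optional

# Placeholder entry for illustration only. The real pairs must be checked
# against the voices the Lemonade server actually offers.
GAIA_TO_LEMONADE_VOICE = {
    "af_bella": "coral",
}


def map_voice(voice: str) -> str:
    """Translate a GAIA voice name; unknown names pass through unchanged."""
    return GAIA_TO_LEMONADE_VOICE.get(voice, voice)


def resolve_tts_backend(
    requested: Optional[str], lemonade_available: Callable[[], bool]
) -> str:
    """Pick "lemonade" or "local", falling back to local Kokoro on any failure."""
    if requested == "local":
        return "local"
    try:
        available = lemonade_available()
    except Exception:
        # A failed probe (server down, timeout) is treated as "not available".
        available = False
    return "lemonade" if available else "local"
```

Passing unknown voice names through unchanged is one way to "expose both" voice lists, since callers can then use either naming scheme.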

Acceptance Criteria

  • LemonadeTTS class wraps Lemonade /audio/speech endpoint
  • Streaming audio playback works in gaia talk
  • AudioClient auto-detects Lemonade TTS availability
  • TalkSDK prefers Lemonade TTS, falls back to local Kokoro
  • --tts-backend CLI flag works
  • Existing KokoroTTS continues to work unchanged
  • Unit tests pass
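
The CLI criterion could be checked against something like the sketch below. It assumes an argparse-based parser. The actual structure of src/gaia/cli.py is not shown in this issue, so `build_talk_parser` is a hypothetical stand-in.

```python
import argparse


def build_talk_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for `gaia talk` with the proposed flag."""
    parser = argparse.ArgumentParser(prog="gaia talk")
    parser.add_argument(
        "--tts-backend",
        choices=["lemonade", "local"],
        default="lemonade",
        help="TTS backend: Lemonade server (default) or local Kokoro",
    )
    return parser
```

Using `choices` lets argparse reject unsupported backend names before any audio code runs.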

Labels

audio, domain:multimodal, enhancement, lemonade 🍋, p0, performance, sdk, talk, track:consumer-app