Add server-side TTS via Lemonade audio/speech endpoint #373
Open
Labels
audio, domain:multimodal, enhancement, lemonade 🍋, p0, performance, sdk, talk, track:consumer-app
Summary
Lemonade v9.4.1 exposes Kokoro TTS via an OpenAI-compatible `POST /api/v1/audio/speech` endpoint with streaming support. GAIA currently loads Kokoro locally in Python (`src/gaia/audio/kokoro_tts.py`). Server-side TTS enables streaming audio playback (lower latency) and is now available on both Windows and Linux.
Reference
API
Parameters:
`input` (text), `model` (`kokoro-v1`), `voice`, `speed`, `response_format` (`mp3`/`wav`/`opus`/`pcm`), `stream_format`
Changes Required
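As a point of reference for the changes listed in this section, the parameters under API could be assembled into a request roughly as follows. This is a minimal sketch using only the standard library; the base URL (`http://localhost:8000/api/v1`), the helper name `build_speech_request`, and the voice name in the usage line are assumptions for illustration, not values confirmed by this issue.

```python
import json
import urllib.request

# Assumption: a Lemonade server reachable locally; adjust host and port as needed.
BASE_URL = "http://localhost:8000/api/v1"


def build_speech_request(text, voice, speed=1.0, base_url=BASE_URL):
    """Build (but do not send) a POST request for the audio/speech endpoint."""
    payload = {
        "model": "kokoro-v1",
        "input": text,
        "voice": voice,
        "speed": speed,
        "response_format": "pcm",   # raw PCM, per the design decision in this issue
        "stream_format": "audio",
    }
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "af_heart" is a placeholder voice name, not confirmed for Lemonade.
    req = build_speech_request("Hello from GAIA", voice="af_heart")
    print(req.get_method(), req.full_url)
    # Sending needs a running server, for example:
    # with urllib.request.urlopen(req) as resp:
    #     while chunk := resp.read(4096):
    #         ...  # hand the raw PCM bytes to an audio sink
```

Building the request separately from sending it keeps the payload easy to unit-test without a server.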
- `src/gaia/audio/lemonade_tts.py`: `LemonadeTTS` class wrapping `/audio/speech`. Implement `generate_speech(text, voice, speed)` with streaming support via `AsyncOpenAI.audio.speech.with_streaming_response.create()`
- `src/gaia/audio/audio_client.py`: `tts_backend` config: `"lemonade"` (default) or `"local"` (fallback to `KokoroTTS`)
- `src/gaia/talk/sdk.py`
- `src/gaia/cli.py`: flag to `gaia talk`
- `tests/unit/test_lemonade_tts.py`
Key Design Decisions
- `stream_format="audio"` with PCM for lowest latency
Acceptance Criteria
- `LemonadeTTS` class wraps Lemonade `/audio/speech` endpoint
- `gaia talk`
- `AudioClient` auto-detects Lemonade TTS availability
- `TalkSDK` prefers Lemonade TTS, falls back to local Kokoro
- `--tts-backend` CLI flag works
- `KokoroTTS` continues to work unchanged
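To make the criteria above more concrete, here is one possible shape for the wrapper and the backend fallback. It is a sketch under assumptions, not the planned implementation: the constructor signature, the injected `client` (assumed to be an `openai.AsyncOpenAI` instance pointed at Lemonade), and the helper `choose_tts_backend` are hypothetical. How availability is detected is not specified in this issue, so it is passed in as a plain boolean.

```python
from typing import Any, AsyncIterator


class LemonadeTTS:
    """Hypothetical sketch of a wrapper around Lemonade's /audio/speech."""

    def __init__(self, client: Any, model: str = "kokoro-v1") -> None:
        # Assumption: `client` behaves like openai.AsyncOpenAI with its
        # base_url set to the Lemonade server's /api/v1 prefix.
        self._client = client
        self._model = model

    async def generate_speech(
        self, text: str, voice: str, speed: float = 1.0
    ) -> AsyncIterator[bytes]:
        """Yield raw PCM chunks as the server produces them."""
        request = self._client.audio.speech.with_streaming_response.create(
            model=self._model,
            input=text,
            voice=voice,
            speed=speed,
            response_format="pcm",
            # Sent via extra_body so the sketch does not rely on the installed
            # SDK exposing stream_format as a named argument.
            extra_body={"stream_format": "audio"},
        )
        async with request as response:
            async for chunk in response.iter_bytes():
                yield chunk


def choose_tts_backend(requested: str, lemonade_available: bool) -> str:
    """Return "lemonade" when requested and reachable, otherwise "local"."""
    if requested not in ("lemonade", "local"):
        raise ValueError(f"unknown tts backend: {requested!r}")
    if requested == "lemonade" and lemonade_available:
        return "lemonade"
    return "local"
```

Injecting the client keeps the class testable with a fake object in `tests/unit/test_lemonade_tts.py`, and the caller can start playback on the first yielded chunk rather than waiting for the full clip.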