# Text-to-Speech Client - Walkthrough

This notebook walks through the text-to-speech (TTS) client, which is backed by NVIDIA Magpie NIM. It covers four steps:

1. **Health Check** - Verify the TTS service is reachable
2. **List Voices** - See available voices
3. **Generate Speech** - Synthesize text to audio
4. **Save WAV to Disk** - Save the generated audio

## Setup

In [1]:
import sys
sys.path.insert(0, '/app')

import nest_asyncio
nest_asyncio.apply()

import asyncio
from pathlib import Path
from IPython.display import Audio, display

## 1. Health Check

First, let's verify the TTS service is running and reachable.

In [2]:
from src.clients.tts import (
    check_tts_health,
    list_voices,
    generate_speech,
    DEFAULT_TTS_URL,
)

async def check_health():
    return await check_tts_health()

is_healthy = asyncio.get_event_loop().run_until_complete(check_health())
print(f"TTS URL: {DEFAULT_TTS_URL}")
print(f"TTS healthy: {is_healthy}")

TTS URL: http://192.168.6.3:9000
TTS healthy: True


## 2. List Available Voices

Query the TTS service for the voices it offers.

In [3]:
async def get_voices():
    return await list_voices()

voices = asyncio.get_event_loop().run_until_complete(get_voices())
print(f"Available voices ({len(voices)}):")
for v in voices:
    print(f"  - {v}")

Available voices (1):
  - en-US,es-US,fr-FR,de-DE,zh-CN,vi-VN,it-IT

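The single entry printed above appears to be a comma-separated list of language codes rather than a voice name. Assuming each entry is a plain string in that form, a small helper can split it into individual codes, for example to validate the `language` argument before synthesizing. `split_language_codes` is a hypothetical helper, not part of the client.

```python
def split_language_codes(voices: list[str]) -> list[str]:
    """Flatten comma-separated entries into an ordered list without duplicates."""
    codes: list[str] = []
    for entry in voices:
        for code in entry.split(","):
            code = code.strip()
            if code and code not in codes:
                codes.append(code)
    return codes


# The entry printed above, copied verbatim so the cell runs without the service.
voices = ["en-US,es-US,fr-FR,de-DE,zh-CN,vi-VN,it-IT"]
languages = split_language_codes(voices)
print(languages)
print("en-US" in languages)
```
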

## 3. Generate Speech

Synthesize a sentence into WAV audio at a 22,050 Hz sample rate, then play it inline.

In [5]:
text = "Welcome to waywo, the platform that showcases what makers are working on."

async def synthesize():
    return await generate_speech(
        text=text,
        language="en-US",
        sample_rate_hz=22050,
    )

wav_bytes = asyncio.get_event_loop().run_until_complete(synthesize())
print(f"Generated {len(wav_bytes)} bytes of audio")

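# Play the audio inline. wav_bytes is already an encoded WAV file, so the
# player takes the sample rate from its header; `rate` is only needed when
# passing raw sample arrays.
display(Audio(data=wav_bytes))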

Generated 159788 bytes of audio

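The byte count alone says little about how long the clip is. A sketch using the standard-library `wave` module reads the sample rate and duration straight from the WAV header. It assumes the service returns PCM-encoded WAV, which `wave` can parse; `wav_info` and `make_silence` are hypothetical helpers, and generated silence stands in for real TTS output so the cell runs offline.

```python
import io
import wave


def wav_info(wav_bytes: bytes) -> dict:
    """Read channel count, sample width, sample rate, and duration from WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def make_silence(seconds: float, sample_rate_hz: int = 22050) -> bytes:
    """Build a mono 16-bit PCM WAV of silence, used here as stand-in data."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate_hz)
        wf.writeframes(b"\x00\x00" * int(seconds * sample_rate_hz))
    return buf.getvalue()


info = wav_info(make_silence(2.0))
print(info)
```

Calling `wav_info(wav_bytes)` on the real output gives the clip length directly. As a rough cross-check, if the 159788 bytes above are 16-bit mono PCM behind a 44-byte header (an assumption), the clip would run about 3.6 seconds.
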

## 4. Save WAV to Disk

Save the generated audio to the media directory.

In [6]:
output_dir = Path("/app/media/tts_samples")
output_dir.mkdir(parents=True, exist_ok=True)

output_path = output_dir / "sample.wav"
output_path.write_bytes(wav_bytes)
print(f"Saved to: {output_path}")

Saved to: /app/media/tts_samples/sample.wav

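Because the cell above always writes `sample.wav`, each run overwrites the previous file. If earlier samples should be kept, one option is to pick the next unused file name. This is a sketch only: `next_free_path` is a hypothetical helper, and it is demonstrated in a temporary directory with placeholder bytes rather than real audio.

```python
import tempfile
from pathlib import Path


def next_free_path(directory: Path, stem: str = "sample", suffix: str = ".wav") -> Path:
    """Return directory/stem+suffix, or stem_1, stem_2, ... if that name is taken."""
    directory.mkdir(parents=True, exist_ok=True)
    candidate = directory / f"{stem}{suffix}"
    counter = 1
    while candidate.exists():
        candidate = directory / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate


# Demonstrated in a throwaway directory so nothing under /app/media is touched.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "tts_samples"
    first = next_free_path(out_dir)
    first.write_bytes(b"placeholder")
    second = next_free_path(out_dir)
    second.write_bytes(b"placeholder")
    saved_names = sorted(p.name for p in out_dir.iterdir())

print(saved_names)
```

In the notebook, `next_free_path(output_dir).write_bytes(wav_bytes)` would replace the fixed `sample.wav` path.
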

## Next Steps

- See `src/clients/tts.py` for the full client API
- TTS output feeds into the STT client for word-level timestamps (see `notebooks/stt.ipynb`)
- The short-form video pipeline uses TTS audio durations to control image frame timing