# Triton Spark TTS + RVC Voice Conversion (Unified Container)

This notebook runs the full pipeline with **both services inside a single udocker container**:
1. **Triton Spark TTS** - Text-to-Speech (internal gRPC, port 8001)
2. **RVC Voice Conversion** - Voice cloning (inline processing)
3. **Voice HTTP API** - Unified endpoint (port 8003)

## Architecture
```
┌─────────────────────────────────────────────────────────┐
│  udocker Container                                      │
│  ┌─────────────────┐    ┌─────────────────────────────┐ │
│  │ Triton Server   │◄───│ Voice HTTP API (:8003)      │ │
│  │ (Spark TTS)     │    │ - FastAPI wrapper           │ │
│  │ :8001 gRPC      │    │ - TTS + RVC combined        │ │
│  │ (internal)      │    │ - N parallel RVC workers    │ │
│  └─────────────────┘    └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
                                   ▲
                                   │ HTTP
                                   ▼
                            Host Python
                         (simple requests)
```

## Port Allocation
- `8000` - Triton HTTP (internal)
- `8001` - Triton gRPC (internal)
- `8002` - Triton Metrics (internal)
- `8003` - Voice HTTP API (external)
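
Because the container shares the host's network stack in this setup, a stale process holding one of these ports will prevent the services from starting. The following is a minimal pre-flight sketch, not part of the repository: the `PORTS` dict and `port_is_free` helper are illustrative names, and the check simply tries to bind each port on the loopback interface.

```python
import socket

# Port roles as listed above; the dict itself is only for this sketch.
PORTS = {
    8000: "Triton HTTP (internal)",
    8001: "Triton gRPC (internal)",
    8002: "Triton Metrics (internal)",
    8003: "Voice HTTP API (external)",
}


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Bind fails when another socket already owns the port
            return False
    return True


for port, role in PORTS.items():
    state = "free" if port_is_free(port) else "IN USE"
    print(f"{port} ({role}): {state}")
```

Running this before Cell 4.1 and again after the shutdown cell gives a quick indication of whether the previous run released its ports.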

## API Endpoints
- `POST /synthesize/sse` - SSE streaming for web clients (JSON events with base64 audio)
- `POST /synthesize/stream` - Streaming multipart audio response
- `POST /tts` - TTS only
- `POST /rvc` - RVC only
- `GET /health` - Health check
- `GET /status` - Detailed status

## Benefits
- Simple HTTP interface (no gRPC complexity on host)
- Both services run on the same GPU from inside one container
- Single container to manage
- No GPU memory fragmentation

## Requirements
- Google Colab with GPU runtime (T4 minimum, A100 recommended)
- ~12GB disk space for models and dependencies
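
The requirements above can be checked before any long download starts. This is a rough sketch using only the standard library; `MIN_FREE_GB`, `free_disk_gb`, and `has_nvidia_smi` are illustrative names, and the presence of `nvidia-smi` on `PATH` is only a proxy for a GPU runtime being attached.

```python
import shutil

MIN_FREE_GB = 12  # taken from the requirement above


def free_disk_gb(path: str = ".") -> float:
    """Free space on the filesystem that holds `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def has_nvidia_smi() -> bool:
    """True if the nvidia-smi binary is on PATH."""
    return shutil.which("nvidia-smi") is not None


free_gb = free_disk_gb()
print(f"Free disk: {free_gb:.1f} GiB (need ~{MIN_FREE_GB})")
print(f"nvidia-smi found: {has_nvidia_smi()}")
```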

---
## Section 1: Setup Environment

In [None]:
# Cell 1.1: Initialize udocker
import os

def udocker_init():
    if not os.path.exists("/home/user"):
        !pip install udocker > /dev/null
        !udocker --allow-root install > /dev/null
        !useradd -m user > /dev/null
    print('udocker initialized')

udocker_init()

In [None]:
# Cell 1.2: Clone repository
!git clone https://github.com/VSlobolinskyi/triton-spark-server.git
%cd triton-spark-server

In [None]:
# Cell 1.3: GPU Configuration for udocker
import subprocess

def detect_gpu_paths():
    paths = {}
    paths['nvidia_smi'] = subprocess.getoutput('which nvidia-smi')
    paths['cuda_dir'] = subprocess.getoutput("find /usr -path '*/cuda*' -type d -maxdepth 3 | grep -v 'targets' | head -1")
    paths['nvidia_lib_dir'] = subprocess.getoutput("find /usr -name 'libcuda.so*' -o -name 'libnvidia-ml.so*' | grep -v 'stubs' | xargs dirname | sort -u | head -1")
    paths['ld_library_path'] = f"{paths['nvidia_lib_dir']}:{paths['cuda_dir']}/lib64:{paths['cuda_dir']}/compat"

    basic_devices = ['/dev/nvidia0', '/dev/nvidiactl', '/dev/nvidia-uvm']

    volumes = [
        f"--volume={paths['nvidia_smi']}:{paths['nvidia_smi']}",
        f"--volume={paths['nvidia_lib_dir']}:{paths['nvidia_lib_dir']}",
        f"--volume={paths['cuda_dir']}:{paths['cuda_dir']}"
    ]

    for device in basic_devices:
        if os.path.exists(device):
            volumes.append(f"--volume={device}:{device}")

    env_vars = [
        f"--env=LD_LIBRARY_PATH={paths['ld_library_path']}:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib:$LD_LIBRARY_PATH",
        f"--env=PATH=/opt/tritonserver/bin:{paths['cuda_dir']}/bin:/usr/bin:$PATH",
        "--env=NVIDIA_VISIBLE_DEVICES=all",
        "--env=NVIDIA_DRIVER_CAPABILITIES=compute,utility"
    ]

    return {
        'paths': paths,
        'volumes': volumes,
        'env_vars': env_vars,
        'all_options': ' '.join(volumes + env_vars)
    }

gpu_config = detect_gpu_paths()
print("GPU configuration ready")
for k, v in gpu_config['paths'].items():
    print(f"  {k}: {v}")
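
The detection in Cell 1.3 relies on `subprocess.getoutput`, which returns an empty string (or captured error text) when `which` or `find` match nothing, so a failed detection can silently produce broken `--volume` options. A small validation sketch follows; `missing_gpu_paths` is a hypothetical helper, and the `example` dict only imitates the shape of `gpu_config['paths']`.

```python
import os


def missing_gpu_paths(paths: dict) -> list:
    """Return the keys whose detected value is empty or not present on disk."""
    problems = []
    for key in ("nvidia_smi", "cuda_dir", "nvidia_lib_dir"):
        value = (paths.get(key) or "").strip()
        # getoutput also captures stderr, so a value may be an error message
        # rather than a path; os.path.exists rejects both cases.
        if not value or not os.path.exists(value):
            problems.append(key)
    return problems


# Made-up values: one empty, one existing directory, one missing directory
example = {
    "nvidia_smi": "",
    "cuda_dir": os.getcwd(),
    "nvidia_lib_dir": "/no/such/dir",
}
print(missing_gpu_paths(example))
```

In the notebook, calling `missing_gpu_paths(gpu_config['paths'])` right after Cell 1.3 should return an empty list on a working GPU runtime.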

---
## Section 2: Setup Container with TTS + RVC

In [None]:
# Cell 2.1: Pull Triton Server image
!udocker --allow-root pull nvcr.io/nvidia/tritonserver:25.02-trtllm-python-py3
!udocker --allow-root create --name=voice_server nvcr.io/nvidia/tritonserver:25.02-trtllm-python-py3
!udocker --allow-root setup --nvidia voice_server

In [None]:
# Cell 2.2: Install TTS dependencies (torchaudio, etc.)
import os
pwd = os.getcwd()

tts_deps_cmd = f'''udocker --allow-root run \
    --hostauth --hostenv \
    {gpu_config['all_options']} \
    --volume={pwd}:/workspace \
    voice_server \
    /bin/bash -c "apt-get update && apt-get install -y cmake curl ffmpeg && \
git clone https://github.com/pytorch/audio.git && cd audio && git checkout c670ad8 && USE_FFMPEG=0 PATH=/usr/local/cuda/bin:\$PATH python3 setup.py develop && \
pip install einx==0.3.0 omegaconf==2.3.0 soundfile==0.12.1 soxr==0.5.0.post1 tritonclient librosa 'huggingface-hub>=0.24.0,<1.0'"
'''

!{tts_deps_cmd}

In [None]:
# Cell 2.3: Install RVC + HTTP API dependencies inside container
import os
pwd = os.getcwd()

rvc_deps_cmd = f'''udocker --allow-root run \
    --hostauth --hostenv \
    {gpu_config['all_options']} \
    --volume={pwd}:/workspace \
    voice_server \
    /bin/bash -c "pip install --no-cache-dir \
        'faiss-cpu>=1.7.0' \
        pyworld==0.3.2 \
        'praat-parselmouth>=0.4.2' \
        torchcrepe==0.0.23 \
        'ffmpeg-python>=0.2.0' \
        'av>=9.0.0' \
        'onnxruntime-gpu>=1.13.0' \
        'numba>=0.56.0' \
        'joblib>=1.1.0' \
        'fastapi>=0.100.0' \
        'uvicorn>=0.21.1' \
        'python-multipart>=0.0.6' \
        'sse-starlette>=1.6.0' && \
    pip install --no-cache-dir git+https://github.com/One-sixth/fairseq.git"
'''

print("Installing RVC + HTTP API dependencies in container...")
!{rvc_deps_cmd}

In [None]:
# Cell 2.4: Build TensorRT-LLM engine for Spark TTS
import os
pwd = os.getcwd()

build_cmd = f'''udocker --allow-root run \
    --hostauth --hostenv \
    {gpu_config['all_options']} \
    --env=PYTHONPATH=/workspace \
    --volume={pwd}:/workspace \
    voice_server \
    /bin/bash -c "cd /workspace/triton && bash run.sh 0 2 offline"'''

print("Building TensorRT engine for Spark TTS...")
!{build_cmd}
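
The cells in this section repeat the same `udocker run` preamble with a different inner command each time. One way to reduce the duplication is a small builder function. This is a sketch rather than the repository's approach: `build_udocker_run` is a hypothetical helper, the workspace path in the example is assumed, and only flags that already appear in this notebook are used.

```python
import shlex


def build_udocker_run(container, shell_cmd, workspace, gpu_options="", env=None):
    """Assemble a `udocker run` command line like the ones used in this notebook."""
    parts = ["udocker", "--allow-root", "run", "--hostauth", "--hostenv"]
    if gpu_options:
        # Pre-joined option string such as gpu_config['all_options']
        parts.append(gpu_options)
    for key, value in (env or {}).items():
        parts.append(f"--env={key}={value}")
    parts.append(f"--volume={workspace}:/workspace")
    parts.append(container)
    # shlex.quote keeps the inner command a single argument to bash -c
    parts.append(f"/bin/bash -c {shlex.quote(shell_cmd)}")
    return " ".join(parts)


cmd = build_udocker_run(
    "voice_server",
    "cd /workspace/triton && bash run.sh 0 2 offline",
    workspace="/content/triton-spark-server",  # assumed checkout location
    env={"PYTHONPATH": "/workspace"},
)
print(cmd)
```

Note one behavioral difference: `shlex.quote` wraps the inner command in single quotes, so variables such as `$PATH` are expanded by the shell inside the container rather than by the host shell.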

---
## Section 3: Download Models

In [None]:
# Cell 3.1: Download RVC assets (HuBERT, RMVPE) - run inside container
import os
pwd = os.getcwd()

assets_cmd = f'''udocker --allow-root run \
    --hostauth --hostenv \
    {gpu_config['all_options']} \
    --env=PYTHONPATH=/workspace \
    --volume={pwd}:/workspace \
    voice_server \
    /bin/bash -c "cd /workspace && python3 tools/download_all_assets.py --rvc-only"'''

print("Downloading RVC assets...")
!{assets_cmd}

In [None]:
# Cell 3.2: Download RVC voice model
import os
pwd = os.getcwd()

RVC_MODEL_URL = "https://huggingface.co/Juneuarie/SilverWolfEN/resolve/main/SilverWolf.zip?download=true"

model_cmd = f'''udocker --allow-root run \
    --hostauth --hostenv \
    {gpu_config['all_options']} \
    --env=PYTHONPATH=/workspace \
    --volume={pwd}:/workspace \
    voice_server \
    /bin/bash -c "cd /workspace && python3 tools/download_all_assets.py --rvc-model '{RVC_MODEL_URL}'"'''

print("Downloading RVC voice model...")
!{model_cmd}

# List downloaded models and index files
print("\n=== Downloaded Files ===")
print("\nVoice models (.pth):")
!ls -la assets/weights/*.pth 2>/dev/null || echo "  No .pth files found in assets/weights/"

print("\nIndex files (.index) - REQUIRED for voice quality:")
!ls -la logs/*.index 2>/dev/null || echo "  WARNING: No .index files found in logs/"
!ls -la assets/*.index 2>/dev/null || echo ""

# Check if index exists
import glob
index_files = glob.glob("logs/*.index") + glob.glob("assets/*.index")
if not index_files:
    print("\n*** WARNING: No index file found! Voice quality enhancement will be disabled. ***")
    print("The RVC model zip should contain a .index file for best results.")
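
Cell 4.1 later hardcodes the model filename, so it can help to list what was actually extracted and copy the name from there. The sketch below assumes the directory layout used in this cell (`assets/weights` for `.pth`, `logs` and `assets` for `.index`); `find_rvc_files` is an illustrative helper, not part of the repository.

```python
import glob
import os


def find_rvc_files(weights_dir="assets/weights", index_dirs=("logs", "assets")):
    """Return (sorted .pth paths, sorted .index paths) found in the given dirs."""
    models = sorted(glob.glob(os.path.join(weights_dir, "*.pth")))
    indexes = []
    for directory in index_dirs:
        indexes.extend(glob.glob(os.path.join(directory, "*.index")))
    return models, sorted(indexes)


models, indexes = find_rvc_files()
print("models: ", [os.path.basename(p) for p in models])
print("indexes:", [os.path.basename(p) for p in indexes])
```

If exactly one model is found, `os.path.basename(models[0])` is the value to use for `RVC_MODEL`.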

---
## Section 4: Start Voice Services

In [None]:
# Cell 4.1: Start Voice HTTP API (Triton + RVC inside container)
import subprocess
import time
import os

pwd = os.getcwd()

# Configuration
RVC_MODEL = "SilverWolf_e300_s6600.pth"
RVC_WORKERS = 2  # 2 for L4, up to 4 for A100
API_PORT = 8003  # Triton uses 8000-8002, so we use 8003

# Start services via startup script
# NOTE: RVC requires these environment variables for model loading:
#   - rmvpe_root: Path to rmvpe.pt for pitch extraction
#   - hubert_base_path: Path to hubert_base.pt for feature extraction
#   - weight_root: Path to RVC voice model weights
#   - index_root: Path to .index files for voice quality enhancement
server_cmd = f'''udocker --allow-root run \
    --hostauth --hostenv \
    {gpu_config['all_options']} \
    --env=PYTHONPATH=/workspace \
    --env=RVC_MODEL={RVC_MODEL} \
    --env=RVC_WORKERS={RVC_WORKERS} \
    --env=API_PORT={API_PORT} \
    --env=rmvpe_root=/workspace/assets/rmvpe \
    --env=hubert_base_path=/workspace/assets/hubert/hubert_base.pt \
    --env=weight_root=/workspace/assets/weights \
    --env=index_root=/workspace/logs \
    --volume={pwd}:/workspace \
    voice_server \
    /bin/bash /workspace/scripts/start_services.sh'''

# Run in background
process = subprocess.Popen(
    f"nohup {server_cmd} > services_log.txt 2>&1 &",
    shell=True
)

print("Starting voice services...")
print(f"  Voice HTTP API: http://localhost:{API_PORT}")
print(f"  RVC Workers: {RVC_WORKERS}")
print(f"  RVC Model: {RVC_MODEL}")
print("\nWaiting 150 seconds for services to initialize...")
time.sleep(150)

print("\nService logs:")
!tail -50 services_log.txt
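
The fixed 150-second sleep is either too long or too short depending on the GPU. An alternative is to poll until the service answers. The sketch below is generic: `wait_until_ready` is a hypothetical helper that takes any zero-argument `probe` callable, so it can be tried without a running server.

```python
import time


def wait_until_ready(probe, timeout=300.0, interval=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        try:
            if probe():
                return True
        except Exception:
            # Typically a refused connection while the server is still starting
            pass
        if clock() >= deadline:
            return False
        sleep(interval)


# Example probe for this notebook (assumes `requests` is installed on the host):
#   import requests
#   probe = lambda: requests.get("http://localhost:8003/health", timeout=5).status_code == 200
#   ready = wait_until_ready(probe)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised quickly in isolation; the defaults are what a notebook run would use.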

In [None]:
# Cell 4.2: Install host Python dependencies (just requests + audio)
!pip install -q requests soundfile numpy

# Optional: pyngrok is only needed to reach the API from outside Colab (e.g., from your Angular frontend); see Cell 4.4
!pip install -q pyngrok

print("Host dependencies installed")

In [None]:
# Cell 4.3: Test connection to Voice HTTP API
import requests

API_URL = "http://localhost:8003"

print("Testing Voice HTTP API connection...")

# Health check
try:
    response = requests.get(f"{API_URL}/health", timeout=10)
    if response.status_code == 200:
        health = response.json()
        print(f"  Health: {health['status']}")
        print(f"  Triton ready: {health.get('triton_ready', 'N/A')}")
        print(f"  RVC ready: {health.get('rvc_ready', 'N/A')}")
    else:
        print(f"  Health check failed: {response.status_code}")
except requests.exceptions.RequestException as e:
    print(f"  Connection error: {e}")

# Detailed status
print("\nDetailed status:")
try:
    response = requests.get(f"{API_URL}/status", timeout=10)
    if response.status_code == 200:
        status = response.json()
        print(f"  RVC Model: {status.get('rvc_model', 'N/A')}")
        print(f"  RVC Workers: {status.get('rvc_workers', 'N/A')}")
        print(f"  Triton address: {status.get('triton_addr', 'N/A')}:{status.get('triton_port', 'N/A')}")
except requests.exceptions.RequestException as e:
    print(f"  Status check failed: {e}")

print("\nGPU memory usage:")
!nvidia-smi --query-gpu=memory.used,memory.total --format=csv

In [None]:
# Cell 4.4: Expose API externally via ngrok (OPTIONAL)

from pyngrok import ngrok
import os

# Set your authtoken (get it from https://dashboard.ngrok.com/get-started/your-authtoken)
ngrok.set_auth_token("YOUR_AUTHTOKEN_HERE")

API_PORT = 8003

# Create tunnel
tunnel = ngrok.connect(API_PORT, "http")
public_url = tunnel.public_url  # Extract just the URL string

print("=" * 60)
print("🌐 API exposed externally!")
print(f"   Public URL: {public_url}")
print("=" * 60)
print(f'\nRun this in browser console:')
print(f'localStorage.setItem("apiUrl", "{public_url}")')
os.environ['PUBLIC_API_URL'] = public_url

---
## Section 5: Upload Reference Audio

In [None]:
# Cell 5.1: Upload reference audio
import os
import soundfile as sf
from google.colab import files

!mkdir -p references TEMP/output

print("Please upload a reference audio file (.wav format):")
uploaded = files.upload()

reference_audio = list(uploaded.keys())[0]
references_path = os.path.join("references", reference_audio)

with open(references_path, "wb") as f:
    f.write(uploaded[reference_audio])

audio, sr = sf.read(references_path)
print(f"Saved: {references_path} ({len(audio)/sr:.2f}s @ {sr}Hz)")

# Resample to 16kHz if needed
if sr != 16000:
    base_name = os.path.splitext(reference_audio)[0]
    reference_audio_16k = os.path.join("references", f"{base_name}_16k.wav")
    !ffmpeg -y -i "{references_path}" -ar 16000 "{reference_audio_16k}" -loglevel error
    print(f"Resampled to 16kHz: {reference_audio_16k}")
else:
    reference_audio_16k = references_path

---
## Section 6: Run Voice Synthesis

In [None]:
# Cell 6.1: Synthesize with SSE endpoint (full pipeline)
import requests
import time
import json
import io
import base64
import soundfile as sf  # used below to decode each WAV chunk

API_URL = "http://localhost:8003"

# Configuration
TEXT = "Hello! This is a test of the unified voice synthesis pipeline. Both TTS and RVC are running inside a single container, exposed via a simple HTTP API."
REFERENCE_TEXT = ""

# RVC Parameters for quality enhancement
PITCH_SHIFT = 0           # Pitch shift in semitones (-12 to +12)
F0_METHOD = "rmvpe"       # Pitch extraction: rmvpe (best), pm, harvest, crepe
INDEX_RATE = 1.0          # Index feature ratio (0.0-1.0); 1.0 = strongest match to the voice index
FILTER_RADIUS = 3         # Pitch smoothing (0-7), higher = smoother
RMS_MIX_RATE = 0.0        # Volume envelope: 0.0 = use TTS envelope (recommended)
PROTECT = 0.33            # Consonant protection (0.0-0.5), lower = more protection

print(f"Text: {TEXT[:60]}...")
print(f"Reference: {reference_audio_16k}")
print(f"Settings: pitch={PITCH_SHIFT}, f0={F0_METHOD}, index_rate={INDEX_RATE}")
print(f"Quality: filter_radius={FILTER_RADIUS}, rms_mix={RMS_MIX_RATE}, protect={PROTECT}")

# Prepare request
with open(reference_audio_16k, "rb") as f:
    ref_audio_data = f.read()

# Call SSE synthesis endpoint
print("\nRunning TTS + RVC synthesis via SSE...")
start_time = time.time()

response = requests.post(
    f"{API_URL}/synthesize/sse",
    data={
        "text": TEXT,
        "reference_text": REFERENCE_TEXT,
        "pitch_shift": PITCH_SHIFT,
        "f0_method": F0_METHOD,
        "index_rate": INDEX_RATE,
        "filter_radius": FILTER_RADIUS,
        "rms_mix_rate": RMS_MIX_RATE,
        "protect": PROTECT,
        "skip_rvc": False,
    },
    files={
        "reference_audio": ("reference.wav", ref_audio_data, "audio/wav")
    },
    stream=True,
    timeout=120
)

# Collect audio chunks
audio_chunks = []
total_tts_time = 0.0
total_rvc_time = 0.0

for line in response.iter_lines():
    if line:
        line_str = line.decode('utf-8')
        if line_str.startswith('data:'):
            data = json.loads(line_str[5:].strip())
            if data.get('type') == 'chunk':
                audio_chunks.append(base64.b64decode(data['data']))
                total_tts_time += data.get('tts_time', 0)
                total_rvc_time += data.get('rvc_time', 0)
                print(f"  Chunk {data['index']}: TTS {data.get('tts_time', 0):.2f}s, RVC {data.get('rvc_time', 0):.2f}s")

total_time = time.time() - start_time

if audio_chunks:
    # Combine chunks and save
    # Each chunk is a complete WAV file, we need to extract raw audio and combine
    import numpy as np
    
    combined_audio = []
    sample_rate = None
    
    for chunk_bytes in audio_chunks:
        audio, sr = sf.read(io.BytesIO(chunk_bytes))
        combined_audio.append(audio)
        sample_rate = sr
    
    final_audio = np.concatenate(combined_audio)
    output_path = "TEMP/output/synthesized.wav"
    sf.write(output_path, final_audio, sample_rate)
    
    duration = len(final_audio) / sample_rate
    
    print(f"\nSuccess!")
    print(f"  Total TTS time: {total_tts_time:.2f}s")
    print(f"  Total RVC time: {total_rvc_time:.2f}s")
    print(f"  Total time: {total_time:.2f}s")
    print(f"  Audio duration: {duration:.2f}s")
    print(f"  Sample rate: {sample_rate}Hz")
    print(f"  Saved to: {output_path}")
else:
    print(f"\nError: No audio chunks received")

In [None]:
# Cell 6.2: Play output audio
from IPython.display import Audio, display
import os

output_path = "TEMP/output/synthesized.wav"

if os.path.exists(output_path):
    print("Synthesized audio (TTS + RVC):")
    display(Audio(output_path))
else:
    print("No output file found. Run synthesis first.")

In [None]:
# Cell 6.3: TTS only (using /tts endpoint)
import requests
import time

API_URL = "http://localhost:8003"

TEXT = "This is TTS only, without voice conversion."

print(f"Text: {TEXT}")
print("Running TTS only via /tts endpoint...")

with open(reference_audio_16k, "rb") as f:
    ref_audio_data = f.read()

start_time = time.time()

response = requests.post(
    f"{API_URL}/tts",
    data={
        "text": TEXT,
        "reference_text": "",
    },
    files={
        "reference_audio": ("reference.wav", ref_audio_data, "audio/wav")
    },
    timeout=60
)

if response.status_code == 200:
    tts_time = float(response.headers.get("X-Processing-Time", 0))
    
    output_path = "TEMP/output/tts_only.wav"
    with open(output_path, "wb") as f:
        f.write(response.content)
    
    print(f"TTS time: {tts_time:.2f}s")
    print(f"Saved to: {output_path}")
    
    from IPython.display import Audio, display
    display(Audio(output_path))
else:
    print(f"Error: {response.status_code} - {response.text}")

---
## Section 7: Batch Processing

In [None]:
# Cell 7.1: Process multiple sentences via SSE
import requests
import time
import json
import base64
import io
import soundfile as sf
import numpy as np

API_URL = "http://localhost:8003"

LONG_TEXT = """
This is the first sentence of our test.
The second sentence demonstrates the SSE streaming approach.
Third sentence continues the voice synthesis test.
Fourth sentence shows the unified container working.
Fifth and final sentence completes our batch test.
""".strip()

print(f"Processing text via SSE endpoint...\n")
print(f"Text: {LONG_TEXT[:80]}...")

# Load reference audio once
with open(reference_audio_16k, "rb") as f:
    ref_audio_data = f.read()

total_start = time.time()

# Call SSE endpoint with full text (it will split into sentences internally)
response = requests.post(
    f"{API_URL}/synthesize/sse",
    data={
        "text": LONG_TEXT,
        "reference_text": "",
        "pitch_shift": PITCH_SHIFT,
        "f0_method": F0_METHOD,
        "index_rate": INDEX_RATE,
        "filter_radius": FILTER_RADIUS,
        "rms_mix_rate": RMS_MIX_RATE,
        "protect": PROTECT,
    },
    files={
        "reference_audio": ("reference.wav", ref_audio_data, "audio/wav")
    },
    stream=True,
    timeout=300
)

results = []
total_tts = 0.0
total_rvc = 0.0

for line in response.iter_lines():
    if line:
        line_str = line.decode('utf-8')
        if line_str.startswith('data:'):
            data = json.loads(line_str[5:].strip())
            if data.get('type') == 'start':
                print(f"Stream started: {data.get('total_chunks')} chunks expected")
            elif data.get('type') == 'chunk':
                audio_bytes = base64.b64decode(data['data'])
                audio, sr = sf.read(io.BytesIO(audio_bytes))
                results.append(audio)
                
                tts_time = data.get('tts_time', 0)
                rvc_time = data.get('rvc_time', 0)
                total_tts += tts_time
                total_rvc += rvc_time
                
                text_preview = data.get('text', '')[:40]
                print(f"  [{data['index']+1}] {text_preview}... TTS: {tts_time:.2f}s, RVC: {rvc_time:.2f}s")
            elif data.get('type') == 'end':
                print("Stream complete")

total_time = time.time() - total_start
print(f"\nTotal: {total_time:.2f}s for {len(results)} sentences ({total_time/max(len(results),1):.2f}s/sentence)")
print(f"  TTS total: {total_tts:.2f}s")
print(f"  RVC total: {total_rvc:.2f}s")
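
The same `data:` parsing loop appears in Cells 6.1, 7.1, and 8.2. It could be factored into a pair of helpers along these lines. This is a sketch under the assumption that every event arrives as a single `data:` line carrying JSON, as the loops above already assume; the function names are hypothetical and the `sample` lines are hand-written, not captured server output.

```python
import base64
import json


def iter_sse_events(lines):
    """Yield the decoded JSON payload of every `data:` line in an SSE stream."""
    for raw in lines:
        if not raw:
            continue  # blank lines separate events
        text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if text.startswith("data:"):
            yield json.loads(text[5:].strip())


def collect_chunks(lines):
    """Return (list of audio byte strings, total tts_time, total rvc_time)."""
    audio, tts, rvc = [], 0.0, 0.0
    for event in iter_sse_events(lines):
        if event.get("type") == "chunk":
            audio.append(base64.b64decode(event["data"]))
            tts += event.get("tts_time", 0)
            rvc += event.get("rvc_time", 0)
    return audio, tts, rvc


# Hand-written sample stream using the event fields seen in this notebook
sample = [
    b'data: {"type": "start", "total_chunks": 1}',
    b"",
    b'data: {"type": "chunk", "index": 0, "data": "UklGRg==", "tts_time": 0.5, "rvc_time": 0.25}',
    b'data: {"type": "end"}',
]
audio, tts, rvc = collect_chunks(sample)
print(len(audio), tts, rvc)
```

With a live request, `collect_chunks(response.iter_lines())` would replace the inline loop.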

In [None]:
# Cell 7.2: Concatenate and play all sentences
import numpy as np
import soundfile as sf
from IPython.display import Audio, display

if results:
    # Reuse the sample rate `sr` captured while decoding chunks in Cell 7.1
    # (Cell 7.1 keeps audio in memory and writes no per-sentence files to read back)
    
    # Add small silence between sentences (at correct sample rate)
    silence = np.zeros(int(sr * 0.3))  # 300ms
    
    combined = []
    for i, audio in enumerate(results):
        combined.append(audio)
        if i < len(results) - 1:
            combined.append(silence)
    
    full_audio = np.concatenate(combined)
    output_path = "TEMP/output/combined.wav"
    sf.write(output_path, full_audio, sr)
    
    print(f"Combined audio: {len(full_audio)/sr:.2f}s @ {sr}Hz")
    display(Audio(output_path))
else:
    print("No results to combine.")

---
## Section 8: Advanced Options

In [None]:
# Cell 8.1: RVC only (convert existing audio)
import os
import requests
import soundfile as sf
from IPython.display import Audio, display

API_URL = "http://localhost:8003"

# Use TTS output as input for RVC-only conversion
input_audio = "TEMP/output/tts_only.wav"

if os.path.exists(input_audio):
    print(f"Converting: {input_audio}")
    
    with open(input_audio, "rb") as f:
        audio_data = f.read()
    
    response = requests.post(
        f"{API_URL}/rvc",
        data={
            "pitch_shift": PITCH_SHIFT,
            "f0_method": F0_METHOD,
            "index_rate": INDEX_RATE,
            "filter_radius": FILTER_RADIUS,
            "rms_mix_rate": RMS_MIX_RATE,
            "protect": PROTECT,
        },
        files={
            "audio": ("input.wav", audio_data, "audio/wav")
        },
        timeout=60
    )
    
    if response.status_code == 200:
        output_path = "TEMP/output/rvc_only.wav"
        with open(output_path, "wb") as f:
            f.write(response.content)
        
        rvc_time = float(response.headers.get("X-RVC-Time", 0))
        print(f"RVC time: {rvc_time:.2f}s")
        print(f"Saved to: {output_path}")
        
        print("\nOriginal:")
        display(Audio(input_audio))
        print("\nRVC converted:")
        display(Audio(output_path))
    else:
        print(f"Error: {response.status_code} - {response.text}")
else:
    print(f"Input file not found: {input_audio}")
    print("Run Cell 6.3 (TTS only) first.")

In [None]:
# Cell 8.2: Benchmark - SSE Streaming Endpoint (matches frontend usage)
import requests
import time
import json
import soundfile as sf
from IPython.display import Audio, display

API_URL = "http://localhost:8003"

# Test texts of varying lengths
TEST_TEXTS = [
    "Short test.",
    "This is a medium length sentence for testing the voice synthesis pipeline.",
    "This is a longer test with multiple sentences. It should be split into chunks. Each chunk goes through TTS and RVC processing. The streaming endpoint sends each chunk as it's ready.",
]

print("=" * 60)
print("SSE STREAMING BENCHMARK")
print("=" * 60)

# Load reference audio
with open(reference_audio_16k, "rb") as f:
    ref_audio_data = f.read()

benchmark_results = []

for i, text in enumerate(TEST_TEXTS):
    print(f"\n--- Test {i+1}: {len(text)} chars, ~{len(text.split())} words ---")
    print(f"Text: {text[:50]}{'...' if len(text) > 50 else ''}")
    
    start_time = time.time()
    
    # Call SSE endpoint
    response = requests.post(
        f"{API_URL}/synthesize/sse",
        data={
            "text": text,
            "reference_text": "",
            "pitch_shift": 0,
            "f0_method": "rmvpe",
            "index_rate": 1.0,
        },
        files={
            "reference_audio": ("reference.wav", ref_audio_data, "audio/wav")
        },
        stream=True,
        timeout=120
    )
    
    chunks = []
    chunk_times = []
    first_chunk_time = None
    
    for line in response.iter_lines():
        if line:
            line_str = line.decode('utf-8')
            if line_str.startswith('data:'):
                data = json.loads(line_str[5:].strip())
                if data.get('type') == 'chunk':
                    if first_chunk_time is None:
                        first_chunk_time = time.time() - start_time
                    chunks.append(data)
                    chunk_times.append({
                        'tts_time': data.get('tts_time', 0),
                        'rvc_time': data.get('rvc_time', 0),
                    })
    
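    total_time = time.time() - start_time
    if first_chunk_time is None:
        # No chunk arrived; use NaN so the formatted output below does not raise
        first_chunk_time = float("nan")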
    
    # Calculate metrics
    total_tts = sum(c['tts_time'] for c in chunk_times)
    total_rvc = sum(c['rvc_time'] for c in chunk_times)
    
    result = {
        'text_length': len(text),
        'word_count': len(text.split()),
        'chunk_count': len(chunks),
        'first_chunk_time': first_chunk_time,
        'total_time': total_time,
        'total_tts_time': total_tts,
        'total_rvc_time': total_rvc,
    }
    benchmark_results.append(result)
    
    print(f"  Chunks: {len(chunks)}")
    print(f"  First chunk latency: {first_chunk_time:.2f}s")
    print(f"  Total TTS time: {total_tts:.2f}s")
    print(f"  Total RVC time: {total_rvc:.2f}s")
    print(f"  Total time: {total_time:.2f}s")

# Summary
print("\n" + "=" * 60)
print("BENCHMARK SUMMARY")
print("=" * 60)
print(f"{'Text Length':<15} {'Chunks':<10} {'1st Chunk':<12} {'TTS':<10} {'RVC':<10} {'Total':<10}")
print("-" * 60)
for r in benchmark_results:
    print(f"{r['text_length']:<15} {r['chunk_count']:<10} {r['first_chunk_time']:.2f}s{'':<6} {r['total_tts_time']:.2f}s{'':<4} {r['total_rvc_time']:.2f}s{'':<4} {r['total_time']:.2f}s")

# GPU memory
print("\n" + "=" * 60)
print("GPU MEMORY USAGE")
print("=" * 60)
!nvidia-smi --query-gpu=name,memory.used,memory.total,utilization.gpu --format=csv

In [None]:
# Cell 8.3: Save Benchmark Results (for documentation)
import os
import subprocess
from datetime import datetime

# Create test_results directory
os.makedirs("test_results", exist_ok=True)

# Get GPU name
gpu_name = subprocess.getoutput("nvidia-smi --query-gpu=name --format=csv,noheader").strip()

# Build benchmark markdown
benchmark_md = f"""# Voice Synthesis Benchmark Results

**Date:** {datetime.now().strftime('%Y-%m-%d %H:%M')}
**GPU:** {gpu_name}
**Configuration:**
- RVC Model: {RVC_MODEL}
- RVC Workers: {RVC_WORKERS}
- F0 Method: rmvpe
- Index Rate: 1.0

## SSE Streaming Endpoint Performance

| Text Length | Words | Chunks | First Chunk | TTS Time | RVC Time | Total |
|-------------|-------|--------|-------------|----------|----------|-------|
"""

for r in benchmark_results:
    benchmark_md += f"| {r['text_length']} | {r['word_count']} | {r['chunk_count']} | {r['first_chunk_time']:.2f}s | {r['total_tts_time']:.2f}s | {r['total_rvc_time']:.2f}s | {r['total_time']:.2f}s |\n"

benchmark_md += """
## Key Metrics

- **First Chunk Latency**: Time until first audio chunk is available for playback
- **TTS Time**: Spark TTS inference time (per sentence)
- **RVC Time**: Voice conversion time (per sentence)
- **Total Time**: End-to-end processing time

## GPU Memory Usage

```
"""

# Add GPU info
gpu_info = subprocess.getoutput("nvidia-smi --query-gpu=name,memory.used,memory.total,utilization.gpu --format=csv")
benchmark_md += gpu_info + "\n```\n"

# Save to file
with open("test_results/benchmarks.md", "w") as f:
    f.write(benchmark_md)

print("Benchmark results saved to: test_results/benchmarks.md")
print("\n" + benchmark_md)

---
## Section 9: Shutdown

In [None]:
# Cell 9.1: Shutdown services
import time

print("Shutting down services...")

# Kill the service processes from the host (under udocker's default execution mode they run as ordinary host processes)
!pkill -f tritonserver
!pkill -f "uvicorn.*voice_api"
!pkill -f "udocker.*voice_server"

time.sleep(3)

print("\nGPU status after shutdown:")
!nvidia-smi