# Triton Spark TTS + RVC Voice Conversion

This notebook runs the full pipeline:
1. **Triton Spark TTS** - Text-to-Speech (runs inside udocker container)
2. **RVC Voice Conversion** - Voice cloning (runs on host Python with CUDA)

## Architecture
- Triton Server runs Spark TTS with TensorRT-LLM optimization (~10x faster)
- RVC runs on host Python with direct CUDA access
- Both run on the same GPU as separate processes, each with its own CUDA context and memory allocations
- Host Python communicates with Triton via gRPC (udocker shares host network)
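Because udocker shares the host network, a plain TCP connect from host Python is a quick way to confirm that Triton's gRPC port is listening. This is a minimal stdlib sketch. `port_is_open` is a hypothetical helper, and 8001 is the gRPC port this notebook uses. An open port only shows that something is listening, not that the models have finished loading.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError) and DNS errors
        return False
```

For example, `port_is_open("localhost", 8001)` after the server from Cell 4.1 has started.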

## Requirements
- Google Colab with GPU runtime (T4 minimum, A100 recommended)
- ~10GB disk space for models

---
## Section 1: Setup Environment

In [None]:
# Cell 1.1: Initialize udocker for Triton container
def udocker_init():
    import os
    if not os.path.exists("/home/user"):
        !pip install udocker > /dev/null
        !udocker --allow-root install > /dev/null
        !useradd -m user > /dev/null
    print('Docker-in-Colab initialized')
    def execute(command: str):
        user_prompt = "\033[1;32muser@pc\033[0m"
        print(f"{user_prompt}$ udocker {command}")
        !su - user -c "udocker $command"
    return execute

udocker = udocker_init()

In [None]:
# Cell 1.2: Clone repository
!git clone https://github.com/VSlobolinskyi/triton-spark-server.git
%cd triton-spark-server
# Switch to feature branch with RVC integration
!git checkout feature/rvc-integration

In [None]:
# Cell 1.3: GPU Configuration
# Detect NVIDIA/CUDA paths and create proper volume/env mappings for udocker

def detect_gpu_paths():
    import os
    import subprocess

    # Detect critical paths
    paths = {}
    paths['nvidia_smi'] = subprocess.getoutput('which nvidia-smi')
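    # -maxdepth must come before tests such as -path, otherwise GNU find prints a warning;
    # stderr is discarded because subprocess.getoutput() merges it into the returned string
    paths['cuda_dir'] = subprocess.getoutput("find /usr -maxdepth 3 -type d -path '*/cuda*' 2>/dev/null | grep -v 'targets' | head -1")
    paths['nvidia_lib_dir'] = subprocess.getoutput("find /usr -name 'libcuda.so*' -o -name 'libnvidia-ml.so*' 2>/dev/null | grep -v 'stubs' | xargs -r dirname | sort -u | head -1")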
    paths['ld_library_path'] = f"{paths['nvidia_lib_dir']}:{paths['cuda_dir']}/lib64:{paths['cuda_dir']}/compat"

    # Get basic NVIDIA devices that we know work
    basic_devices = ['/dev/nvidia0', '/dev/nvidiactl', '/dev/nvidia-uvm']

    # Create volume mappings
    volumes = [
        f"--volume={paths['nvidia_smi']}:{paths['nvidia_smi']}",
        f"--volume={paths['nvidia_lib_dir']}:{paths['nvidia_lib_dir']}",
        f"--volume={paths['cuda_dir']}:{paths['cuda_dir']}"
    ]

    # Add only the basic device mappings that we know work
    for device in basic_devices:
        if os.path.exists(device):
            volumes.append(f"--volume={device}:{device}")

    # Create environment variables - add TensorRT path within container and include Triton Server path
    env_vars = [
        f"--env=LD_LIBRARY_PATH={paths['ld_library_path']}:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib:$LD_LIBRARY_PATH",
        f"--env=PATH=/opt/tritonserver/bin:{paths['cuda_dir']}/bin:/usr/bin:$PATH",
        "--env=NVIDIA_VISIBLE_DEVICES=all",
        "--env=NVIDIA_DRIVER_CAPABILITIES=compute,utility"
    ]

    return {
        'paths': paths,
        'volumes': volumes,
        'env_vars': env_vars,
        'all_options': ' '.join(volumes + env_vars)
    }

# Get GPU configuration
gpu_config = detect_gpu_paths()

# Print the configuration
print("Detected NVIDIA/CUDA paths:")
for k, v in gpu_config['paths'].items():
    print(f"  {k}: {v}")

print("\nVolume mappings:")
for v in gpu_config['volumes']:
    print(f"  {v}")

print("\nEnvironment variables:")
for e in gpu_config['env_vars']:
    print(f"  {e}")
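The `find` pipelines above keep only the first matching directory (`head -1`). A pure-Python alternative can collect every directory that holds a matching driver library while skipping `stubs`, which avoids parsing shell output altogether. This is a sketch, not part of the repository: `find_lib_dirs` and `build_ld_library_path` are hypothetical helpers, and the patterns simply mirror the shell command above.

```python
from pathlib import Path


def find_lib_dirs(root, patterns=("libcuda.so*", "libnvidia-ml.so*"), exclude="stubs"):
    """Return sorted unique directories under `root` containing a file matching any pattern."""
    dirs = set()
    for pattern in patterns:
        for lib in Path(root).rglob(pattern):
            # Skip stub libraries: they only satisfy the linker and cannot drive a GPU
            if exclude in lib.parts:
                continue
            dirs.add(str(lib.parent))
    return sorted(dirs)


def build_ld_library_path(dirs):
    """Join directories into an LD_LIBRARY_PATH value, dropping empty entries."""
    return ":".join(d for d in dirs if d)
```

On Colab you would call `find_lib_dirs("/usr")`; scanning all of `/usr` can take a few seconds.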

---
## Section 2: Setup Triton Container (Spark TTS)

In [None]:
# Cell 2.1: Pull Triton Server image
!udocker --allow-root pull nvcr.io/nvidia/tritonserver:25.02-trtllm-python-py3
!udocker --allow-root create --name=triton_server nvcr.io/nvidia/tritonserver:25.02-trtllm-python-py3
!udocker --allow-root setup --nvidia triton_server

In [None]:
# Cell 2.2: Install dependencies inside container
# Note: Repo is mounted at /workspace via --volume=$PWD:/workspace
import os
pwd = os.getcwd()

install_cmd = f'''udocker --allow-root run \
    --hostauth --hostenv \
    {gpu_config['all_options']} \
    --volume={pwd}:/workspace \
    triton_server \
    /bin/bash -c "apt-get update && apt-get install -y cmake && \
git clone https://github.com/pytorch/audio.git && cd audio && git checkout c670ad8 && USE_FFMPEG=0 PATH=/usr/local/cuda/bin:\$PATH python3 setup.py develop && \
pip install einx==0.3.0 omegaconf==2.3.0 soundfile==0.12.1 soxr==0.5.0.post1 tritonclient librosa 'huggingface-hub>=0.24.0,<1.0'"
'''

!{install_cmd}

In [None]:
# Cell 2.3: Build TensorRT-LLM engine for Spark TTS
# `run.sh 0 2` runs stages 0 through 2:
# Stage 0: Download the Spark TTS model
# Stage 1: Convert checkpoint to TensorRT weights
# Stage 2: Create model repository
import os
pwd = os.getcwd()

setup_cmd = f'''udocker --allow-root run \
    --hostauth --hostenv \
    {gpu_config['all_options']} \
    --env=PYTHONPATH=/workspace \
    --volume={pwd}:/workspace \
    triton_server \
    /bin/bash -c "cd /workspace/triton && bash run.sh 0 2 offline"'''

!{setup_cmd}
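Cells 2.2, 2.3 and 4.1 repeat the same `udocker run` template. A helper like the one below could build it in one place. This is a sketch: `udocker_run_cmd` is hypothetical and only reproduces the options those cells already pass. Wrapping the container script with `shlex.quote` single-quotes it, so `$VAR` references inside the script are expanded by the container's bash rather than the host shell, which makes the `\$PATH` escaping used in Cell 2.2 unnecessary.

```python
import shlex


def udocker_run_cmd(bash_script, gpu_options, workdir, container="triton_server", extra_env=()):
    """Build a `udocker run` command string following the template in cells 2.2, 2.3 and 4.1.

    `gpu_options` is inserted unquoted (it is already a space-separated option string,
    e.g. gpu_config['all_options']); `bash_script` runs inside the container via bash -c.
    """
    parts = ["udocker", "--allow-root", "run", "--hostauth", "--hostenv", gpu_options]
    parts += [shlex.quote(f"--env={env}") for env in extra_env]
    parts += [
        shlex.quote(f"--volume={workdir}:/workspace"),  # repo mounted at /workspace
        container,
        "/bin/bash",
        "-c",
        shlex.quote(bash_script),  # single-quoted so the host shell leaves it untouched
    ]
    return " ".join(parts)
```

For example, `cmd = udocker_run_cmd("cd /workspace/triton && bash run.sh 0 2 offline", gpu_config['all_options'], os.getcwd(), extra_env=["PYTHONPATH=/workspace"])` followed by `!{cmd}`.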

---
## Section 3: Setup RVC on Host Python

In [None]:
# Cell 3.1: Install RVC dependencies on host

# Install fairseq from git (required for HuBERT)
!pip install -q git+https://github.com/One-sixth/fairseq.git

# Install main requirements
!pip install -q -r requirements.txt

print("RVC dependencies installed")

In [None]:
# Cell 3.2: Download RVC assets (HuBERT, RMVPE)
!python tools/download_all_assets.py --rvc-only

In [None]:
# Cell 3.3: Download RVC voice model (SilverWolf example)
# You can replace this URL with your own RVC model
RVC_MODEL_URL = "https://huggingface.co/Juneuarie/SilverWolfEN/resolve/main/SilverWolf.zip?download=true"
!python tools/download_all_assets.py --rvc-model "{RVC_MODEL_URL}"
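Cell 6.1 expects this archive to provide `SilverWolf_e300_s6600.pth`. When you substitute your own model URL, you can check which `.pth` weight files a downloaded zip contains with the standard `zipfile` module and use that name for `RVC_MODEL`. This is a sketch: `list_pth_members` is hypothetical, and since where the download script stores the archive is not shown here, pass whatever zip path you have.

```python
import zipfile
from pathlib import PurePosixPath


def list_pth_members(zip_source):
    """Return sorted base names of .pth files inside a zip (path or binary file object)."""
    with zipfile.ZipFile(zip_source) as zf:
        return sorted(
            PurePosixPath(name).name  # zip member names always use "/" separators
            for name in zf.namelist()
            if name.lower().endswith(".pth") and not name.endswith("/")
        )
```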

In [None]:
# Cell 3.4: Test RVC initialization
%cd /content/triton-spark-server

test_rvc_script = '''
import sys
sys.path.insert(0, ".")

from rvc import init_rvc, get_config

init_rvc()
config = get_config()
print(f"RVC initialized:")
print(f"  Device: {config.device}")
print(f"  Half precision: {config.is_half}")
print(f"  GPU: {config.gpu_name}")
'''

with open("test_rvc_init.py", "w") as f:
    f.write(test_rvc_script)

!python test_rvc_init.py

---
## Section 4: Start Triton Server

In [None]:
# Cell 4.1: Start Triton server in background
# Note: udocker shares host network, so Triton will be accessible at localhost:8001
import subprocess
import time
import os

pwd = os.getcwd()

server_cmd = f'''udocker --allow-root run \
    --hostauth --hostenv \
    {gpu_config['all_options']} \
    --env=PYTHONPATH=/workspace \
    --volume={pwd}:/workspace \
    triton_server \
    /bin/bash -c "cd /workspace/triton && tritonserver --model-repository=./model_repo_test"'''

# Run in background
process = subprocess.Popen(
    f"nohup {server_cmd} > server_log.txt 2>&1 &",
    shell=True
)

print("Triton server starting...")
print("Waiting 45 seconds for models to load...")
time.sleep(45)
!tail -30 server_log.txt
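A fixed 45-second sleep can be too short for a cold engine load and longer than needed on a warm one. An alternative is to poll until the server reports ready. This sketch polls Triton's HTTP readiness endpoint `/v2/health/ready`, assuming the HTTP service is on its default port 8000 (this notebook otherwise only uses gRPC on 8001). `triton_http_ready` and `wait_until_ready` are hypothetical helpers.

```python
import time
import urllib.request


def triton_http_ready(url="http://localhost:8000/v2/health/ready", timeout=2.0):
    """Return True if Triton's readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError/HTTPError (e.g. 503 while models load), refused connections and
        # timeouts are all OSError subclasses
        return False


def wait_until_ready(probe, timeout_s=300.0, interval_s=5.0):
    """Call probe() every interval_s seconds until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Usage: `wait_until_ready(triton_http_ready)` in place of `time.sleep(45)`, then inspect `server_log.txt` if it returns `False`.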

In [None]:
# Cell 4.2: Test Triton connection from host Python
!python tools/test_triton_connection.py --addr localhost --port 8001

---
## Section 5: Upload Reference Audio

In [None]:
# Cell 5.1: Upload reference audio for voice cloning
import os
import soundfile as sf
from google.colab import files

# Create directories
!mkdir -p references TEMP/tts TEMP/rvc TEMP/test

# Upload reference audio
print("Please upload a reference audio file (.wav format):")
uploaded = files.upload()

# Process uploaded file
reference_audio = list(uploaded.keys())[0]
references_path = os.path.join("references", reference_audio)

with open(references_path, "wb") as f:
    f.write(uploaded[reference_audio])
print(f"Saved to {references_path}")

# Check sample rate
audio, sr = sf.read(references_path)
print(f"Original: {len(audio)/sr:.2f}s @ {sr}Hz")

if sr != 16000:
    # Use ffmpeg for resampling (available in Colab)
    base_name = os.path.splitext(reference_audio)[0]
    reference_audio_16k = os.path.join("references", f"{base_name}_16k.wav")
    !ffmpeg -y -i "{references_path}" -ar 16000 "{reference_audio_16k}" -loglevel error
    print(f"Resampled to 16kHz: {reference_audio_16k}")
else:
    reference_audio_16k = references_path
    print(f"Already 16kHz, using: {reference_audio_16k}")
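For WAV files, the standard-library `wave` module reports sample rate, channel count and duration from the header without decoding the audio. That is also a quick way to spot stereo uploads, since the ffmpeg command above changes only the sample rate (`-ar 16000`), not the channel count. This is a sketch: `wav_info` and `needs_resample` are hypothetical helpers, and `wave` only handles PCM WAV.

```python
import wave


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate  # one frame = one sample per channel
    return rate, channels, duration


def needs_resample(path, target_rate=16000):
    """True if the WAV's sample rate differs from target_rate."""
    return wav_info(path)[0] != target_rate
```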


---
## Section 6: Run Full Pipeline

In [None]:
# Cell 6.1: Configuration
# Modify these settings as needed

# Text to synthesize
TARGET_TEXT = "Hello! This is a test of the Triton Spark TTS and RVC voice conversion pipeline. The voice should sound like the reference audio."

# Reference text (can be empty for offline mode)
REFERENCE_TEXT = ""

# RVC model name (check assets/weights/ for available models)
# The SilverWolf model extracts as SilverWolf_e300_s6600.pth
RVC_MODEL = "SilverWolf_e300_s6600.pth"

# Pitch shift in semitones (0 = no change)
PITCH_SHIFT = 0

# F0 extraction method (rmvpe is best quality)
F0_METHOD = "rmvpe"

print(f"Configuration:")
print(f"  Text: {TARGET_TEXT[:50]}...")
print(f"  RVC Model: {RVC_MODEL}")
print(f"  Pitch Shift: {PITCH_SHIFT}")
print(f"  F0 Method: {F0_METHOD}")

# List available models
print(f"\nAvailable models in assets/weights/:")
!ls -la assets/weights/ 2>/dev/null || echo "  (directory not yet created)"
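To fail early on a typo in `RVC_MODEL` instead of partway through the pipeline, you could check the name against the `.pth` files in `assets/weights/`. This is a sketch: `check_rvc_model` is hypothetical and assumes models sit as `.pth` files directly in that directory, as the comment above suggests.

```python
from pathlib import Path


def check_rvc_model(model_name, weights_dir="assets/weights"):
    """Return the model path, or raise FileNotFoundError listing the available models."""
    available = sorted(p.name for p in Path(weights_dir).glob("*.pth"))
    if model_name not in available:
        raise FileNotFoundError(
            f"{model_name!r} not found in {weights_dir}; available: {available or 'none'}"
        )
    return Path(weights_dir) / model_name
```

Usage: `check_rvc_model(RVC_MODEL)` at the end of this cell.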

In [None]:
# Cell 6.2: Run full pipeline test
import subprocess

cmd = [
    "python", "tools/test_pipeline.py",
    "--triton-addr", "localhost",
    "--triton-port", "8001",
    "--prompt-audio", reference_audio_16k,
    "--prompt-text", REFERENCE_TEXT,
    "--text", TARGET_TEXT,
    "--rvc-model", RVC_MODEL,
    "--pitch-shift", str(PITCH_SHIFT),
    "--f0-method", F0_METHOD,
    "--output-dir", "./TEMP/test"
]

result = subprocess.run(cmd)
print(f"\nPipeline finished with code: {result.returncode}")

In [None]:
# Cell 6.3: Play output audio
from IPython.display import Audio, display
import os

tts_path = "TEMP/test/tts_output.wav"
rvc_path = "TEMP/test/rvc_output.wav"

if os.path.exists(tts_path):
    print("TTS Output (before RVC):")
    display(Audio(tts_path))
else:
    print(f"TTS output not found at {tts_path}")

if os.path.exists(rvc_path):
    print("\nRVC Output (final voice):")
    display(Audio(rvc_path))
else:
    print(f"RVC output not found at {rvc_path}")

---
## Section 7: RVC Server with Parallel Workers

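This section uses the RVC Server architecture with multiple worker **processes** for true parallel processing.
Each worker process loads its own copy of the RVC model into GPU memory. Because the workers are separate processes, they do not contend for Python's GIL and can convert fragments concurrently; the trade-off is that GPU memory use grows with the number of workers.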
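The worker-process design can be sketched with `multiprocessing.Pool`: a pool initializer loads one model per process, and results are re-sorted because workers finish out of order. This is an illustrative sketch, not the code behind `tools/rvc_server_control.py`; `_load_model`, `_convert` and `convert_all` are hypothetical, and the "model" is a dict standing in for an RVC checkpoint. The demo defaults to the `fork` start method so it runs as a plain script. For real CUDA work, a forked child cannot use CUDA if the parent already initialized it, so you would pass `start_method="spawn"` and create the pool under an `if __name__ == "__main__":` guard.

```python
import multiprocessing as mp
import os

# Per-process slot for the model; each worker process gets its own copy.
_MODEL = None


def _load_model(model_name):
    """Pool initializer, run once inside each worker process.

    Hypothetical stand-in for loading an RVC checkpoint onto the GPU.
    """
    global _MODEL
    _MODEL = {"name": model_name, "pid": os.getpid()}


def _convert(job):
    """Process one (index, text) fragment with this worker's private model."""
    index, text = job
    # A real worker would run voice conversion here; the sketch just tags the text.
    return index, f"{_MODEL['name']}:{text.upper()}", _MODEL["pid"]


def convert_all(fragments, model_name, num_workers=2, start_method="fork"):
    """Fan fragments out to worker processes and return outputs in input order."""
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=num_workers, initializer=_load_model,
                  initargs=(model_name,)) as pool:
        # imap_unordered yields results as workers finish, not in submission order
        results = list(pool.imap_unordered(_convert, enumerate(fragments)))
    results.sort(key=lambda r: r[0])  # restore the original fragment order
    outputs = [out for _, out, _ in results]
    worker_pids = {pid for _, _, pid in results}
    return outputs, worker_pids
```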

In [None]:
# Cell 7.1: Start RVC Server (Persistent Background Process)
# This starts the RVC server daemon that persists until manually stopped.
# Similar to how Triton server runs in background (Cell 4.1).

import subprocess
import time
import os

# Configuration
NUM_RVC_WORKERS = 2  # 2 for T4, up to 4 for A100
RVC_MODEL = "SilverWolf_e300_s6600.pth"

print(f"Starting RVC server daemon...")
print(f"  Model: {RVC_MODEL}")
print(f"  Workers: {NUM_RVC_WORKERS}")

# Start daemon
!python tools/rvc_server_control.py start --model "{RVC_MODEL}" --workers {NUM_RVC_WORKERS} --timeout 120

# Check status
print("\nServer status:")
!python tools/rvc_server_control.py status

# Show GPU memory
print("\nGPU memory usage:")
!nvidia-smi --query-gpu=memory.used,memory.total --format=csv
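Since each worker holds its own model copy, GPU memory use should rise with `NUM_RVC_WORKERS`. To track that programmatically instead of reading the table, you can parse the output of the same `nvidia-smi` query. This is a sketch: `parse_gpu_memory` and `query_gpu_memory` are hypothetical helpers, and they add `noheader,nounits` to the format so each line is two plain MiB numbers.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse `--format=csv,noheader,nounits` output into (used_mib, total_mib) per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return the parsed (used_mib, total_mib) tuples."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)
```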

In [None]:
# Cell 7.2: Configure Pipeline Test
# Multi-sentence text (10 lines) to exercise parallel processing
LARGE_TEXT = """
Hello! This is the first sentence of our test. It should be processed by the TTS worker first.
The second sentence comes next, demonstrating the parallel processing capabilities.
Now we have a third sentence to add more work for the RVC workers.
Fourth sentence here, which will be queued for voice conversion.
The fifth sentence continues our comprehensive test of the pipeline.
Sentence number six is being processed through the system.
Here comes the seventh sentence for additional testing.
The eighth sentence adds more content to our test batch.
Ninth sentence in our sequence of test utterances.
And finally, the tenth sentence completes our test batch.
""".strip()

# Pitch shift in semitones (0 = no change)
PITCH_SHIFT = 0

# F0 extraction method (rmvpe is best quality)
F0_METHOD = "rmvpe"

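import re
# Count sentences ending in ".", "!" or "?" (the pipeline's own splitter may fragment differently)
num_sentences = len([s for s in re.split(r"(?<=[.!?])\s+", LARGE_TEXT) if s.strip()])
print("Pipeline Test Configuration:")
print(f"  Sentences: {num_sentences}")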
print(f"  Pitch Shift: {PITCH_SHIFT}")
print(f"  F0 Method: {F0_METHOD}")
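How `tools/test_rvc_server.py` splits text into fragments is not shown in this notebook. As an illustration of the idea only, this sketch splits on sentence-ending punctuation and greedily packs consecutive sentences into fragments up to a character budget. `split_sentences`, `pack_fragments` and the 120-character default are assumptions, not the pipeline's actual policy.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace; drop empty pieces."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pack_fragments(sentences, max_chars=120):
    """Greedily join consecutive sentences while a fragment stays within max_chars.

    A single sentence longer than max_chars becomes a fragment on its own.
    """
    fragments, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            fragments.append(current)  # close the current fragment
            current = sentence
        else:
            current = candidate
    if current:
        fragments.append(current)
    return fragments
```

Smaller fragments give the RVC workers more units to process in parallel, at the cost of more per-fragment overhead.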

In [None]:
# Cell 7.3: Run Pipeline Test (uses existing RVC server)
# The pipeline will detect and use the RVC server started in Cell 7.1
# Since the server is already running, this should be fast (~0.5s per fragment for RVC)

!python tools/test_rvc_server.py \
    --triton-addr localhost \
    --triton-port 8001 \
    --prompt-audio "{reference_audio_16k}" \
    --text "{LARGE_TEXT}" \
    --pitch-shift {PITCH_SHIFT} \
    --f0-method {F0_METHOD}

In [None]:
# Cell 7.4: Shutdown Servers (Free GPU Memory)
# RVC server persists between runs for faster subsequent processing.
# Run this cell to manually shutdown and free GPU memory.

# Shutdown RVC server
print("Shutting down RVC server...")
!python tools/rvc_server_control.py stop

# Shutdown Triton server (Spark TTS)
print("\nShutting down Triton server...")
!pkill -f tritonserver

# Wait for processes to terminate
import time
time.sleep(2)

# Verify GPU memory is cleared
print("\nGPU status after shutdown:")
!nvidia-smi