# Voice Conversion GPU Testing - Simplified Version

**Focus**: Test the 3 most practical models
1. Seed-VC (easiest to setup)
2. RVC (most popular)
3. GPT-SoVITS (best quality)

**Time**: ~2 hours

**Cost**: Free (Colab free-tier T4 GPU)

## ⚠️ BEFORE STARTING
1. Click **Runtime** → **Change runtime type**
2. Under **Hardware accelerator**, select **T4 GPU**
3. Click **Save**

In [None]:
# STEP 1: Verify GPU (must print "CUDA available: True")
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    print("\n✅ GPU is ready!")
else:
    print("\n❌ NO GPU! Go to Runtime → Change runtime type → select a GPU")
    raise RuntimeError("GPU not enabled")

In [None]:
# STEP 2: Install common dependencies
!pip install -q librosa soundfile numpy matplotlib tqdm GPUtil
print("✅ Dependencies installed")
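
To record exactly which versions Step 2 pulled in (useful when comparing runs later), the standard library's `importlib.metadata` can read installed package versions. This is a minimal sketch; the helper name `installed_versions` is hypothetical and not part of any of the three repos.

```python
from importlib.metadata import version, PackageNotFoundError

def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

# Package names as passed to pip in Step 2, plus torch
for pkg, ver in installed_versions(
    ["torch", "librosa", "soundfile", "numpy", "matplotlib", "tqdm", "GPUtil"]
).items():
    print(f"{pkg:12s} {ver or 'NOT INSTALLED'}")
```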

In [None]:
# STEP 3: Create test audio
import numpy as np
import soundfile as sf
import os

os.makedirs('test_audio', exist_ok=True)
os.makedirs('results', exist_ok=True)

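def gen_voice(f0, duration, path):
    """Write a synthetic 'voice': a tone at f0 Hz plus 3 harmonics (not real speech)."""
    sr = 16000
    t = np.arange(int(sr * duration)) / sr  # sample times in seconds
    # Harmonics 1-4 of f0 with decreasing amplitudes
    y = sum(a * np.sin(2 * np.pi * f0 * i * t) for i, a in enumerate([1, 0.5, 0.3, 0.2], 1))
    y = y / np.max(np.abs(y)) * 0.7  # normalize peak to 0.7 to avoid clipping
    sf.write(path, y, sr)

gen_voice(120, 3, 'test_audio/male.wav')    # typical male pitch range
gen_voice(220, 3, 'test_audio/female.wav')  # typical female pitch range
print("✅ Test audio created")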
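
Because the test inputs are synthetic harmonic tones rather than speech, it is worth sanity-checking that each one has the intended pitch. Below is a minimal numpy-only sketch using plain autocorrelation; `estimate_f0`, `harmonic_tone` and the 60-400 Hz search range are illustrative choices, not part of the notebook or any of the model repos.

```python
import numpy as np

def estimate_f0(y, sr, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of a steady tone by autocorrelation.

    Only lags corresponding to fmin..fmax are searched, which keeps the cost low
    and skips the trivial maximum at lag 0.
    """
    y = np.asarray(y, dtype=np.float64)
    y = y - y.mean()
    min_lag = int(sr / fmax)
    max_lag = int(sr / fmin)
    # r[k] = sum_n y[n] * y[n + k] for each candidate lag k
    corr = [np.dot(y[:-k], y[k:]) for k in range(min_lag, max_lag + 1)]
    best_lag = min_lag + int(np.argmax(corr))
    return sr / best_lag

def harmonic_tone(f0, duration, sr=16000):
    """Same harmonic recipe as gen_voice() above, returned in memory."""
    t = np.arange(int(sr * duration)) / sr
    y = sum(a * np.sin(2 * np.pi * f0 * i * t) for i, a in enumerate([1, 0.5, 0.3, 0.2], 1))
    return y / np.max(np.abs(y)) * 0.7

for f0 in (120, 220):
    print(f"target {f0} Hz -> estimated {estimate_f0(harmonic_tone(f0, 3), 16000):.1f} Hz")
```

The estimate is quantized to whole-sample lags, so expect values within about 1 Hz of the target at this sample rate.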

---
## Model 1: Seed-VC (Recommended - Easiest)

In [None]:
# Clone Seed-VC
!git clone https://github.com/Plachtaa/seed-vc.git
print("✅ Seed-VC cloned")

In [None]:
# Install dependencies
%cd seed-vc
!pip install -q -r requirements.txt
print("✅ Seed-VC dependencies installed")

In [None]:
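# Download model checkpoint (the URL may change; if it fails, follow the Seed-VC README)
!mkdir -p pretrained_models
# wget -O creates the output file even when the download fails, so remove it on error
!wget -q https://huggingface.co/Plachtaa/seed-vc/resolve/main/DiT_seed_v2_uvit_whisper_small_wavenet_bigvgan_pruned.pth -O pretrained_models/model.pth || (rm -f pretrained_models/model.pth; echo "Download failed - see the Seed-VC README for current checkpoint links")
print("Model download attempted")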
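
`wget -O` creates the output file before it starts downloading, so a failed download can leave an empty placeholder that looks like a checkpoint. A small sketch to catch that; `checkpoint_looks_valid` is a hypothetical helper and the 1 MB threshold is an assumption (voice conversion checkpoints are typically far larger). It runs in the `seed-vc/` directory, like the cell above.

```python
import os

def checkpoint_looks_valid(path, min_bytes=1_000_000):
    """Return True if `path` is a file of at least `min_bytes` bytes.

    min_bytes is an arbitrary sanity threshold, not a property of any model.
    """
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes

ckpt = "pretrained_models/model.pth"
if checkpoint_looks_valid(ckpt):
    print(f"✅ {ckpt}: {os.path.getsize(ckpt) / 1e6:.1f} MB")
else:
    print(f"⚠️ {ckpt} is missing or too small; download it manually (see the Seed-VC README)")
```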

In [None]:
# Test Seed-VC (manual testing if API not available)
print("\n📝 Seed-VC Testing:")
print("If automated testing fails, check seed-vc/README.md for manual inference instructions")
print("\nModel location: seed-vc/pretrained_models/")
print("Test audio: ../test_audio/")

# Add your testing code here based on Seed-VC documentation
%cd ..
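
One way to fill in the placeholder above is to call Seed-VC's command-line script from Python and time it. This is a sketch under assumptions: the helper names `seed_vc_command` and `run_seed_vc` are hypothetical, and the `--source`, `--target`, `--output` and `--diffusion-steps` flags follow the `inference.py` example in the Seed-VC README at the time of writing, so confirm them with `python inference.py --help` in your checkout. Note that the synthetic tones from Step 3 exercise the pipeline and timing, not conversion quality; real speech clips are needed for that.

```python
import os
import subprocess
import sys
import time

def seed_vc_command(source_wav, target_wav, output_dir, diffusion_steps=25):
    """Build a Seed-VC inference command (ASSUMED flag names; verify with --help)."""
    return [
        sys.executable, "inference.py",
        # Absolute paths, because the command runs with cwd set to the repo
        "--source", os.path.abspath(source_wav),
        "--target", os.path.abspath(target_wav),
        "--output", os.path.abspath(output_dir),
        "--diffusion-steps", str(diffusion_steps),
    ]

def run_seed_vc(source_wav, target_wav, output_dir, repo_dir="seed-vc"):
    """Run Seed-VC once; return wall-clock seconds (this includes model loading)."""
    cmd = seed_vc_command(source_wav, target_wav, output_dir)
    start = time.perf_counter()
    subprocess.run(cmd, cwd=repo_dir, check=True)
    return time.perf_counter() - start

# Example, from the notebook root (i.e. after the `%cd ..` above):
# secs = run_seed_vc("test_audio/male.wav", "test_audio/female.wav", "results/seed_vc")
# print(f"Seed-VC took {secs:.1f} s including model load")
```

Timing the whole subprocess includes model loading, so it overstates per-utterance latency; the benchmarking template later in this notebook is better suited once you have an in-process inference function.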

---
## Model 2: RVC (Most Popular)

In [None]:
# Clone RVC
!git clone https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI.git
print("✅ RVC cloned")

In [None]:
# Install RVC
%cd Retrieval-based-Voice-Conversion-WebUI
!pip install -q -r requirements.txt
print("✅ RVC dependencies installed")

In [None]:
# Download RVC models
!python download_models.py 2>/dev/null || echo "Check RVC repo for model download instructions"
print("Model download attempted")

In [None]:
# Test RVC
print("\n📝 RVC Testing:")
print("RVC typically requires:")
print("1. Pretrained models in assets/")
print("2. Training on your voice samples OR using pretrained voice models")
print("3. Check RVC-Project/Retrieval-based-Voice-Conversion-WebUI for detailed docs")

%cd ..
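
Since RVC needs pretrained weights under `assets/` (and, for a given target voice, a trained model plus its index), a quick inventory shows what the download step actually produced. A minimal sketch; `find_model_files` is a hypothetical helper, and the extension list is an assumption about which file types are relevant.

```python
import os

def find_model_files(root, extensions=(".pth", ".pt", ".index", ".onnx")):
    """Return sorted paths (relative to `root`) of files with the given extensions."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                hits.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(hits)

# Run from the notebook root (after the `%cd ..` above)
rvc_assets = "Retrieval-based-Voice-Conversion-WebUI/assets"
files_found = find_model_files(rvc_assets) if os.path.isdir(rvc_assets) else []
print(f"{len(files_found)} model files under {rvc_assets}")
for rel in files_found:
    print("  ", rel)
```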

---
## Model 3: GPT-SoVITS (Best Quality)

In [None]:
# Clone GPT-SoVITS
!git clone https://github.com/RVC-Boss/GPT-SoVITS.git
print("✅ GPT-SoVITS cloned")

In [None]:
# Install GPT-SoVITS
%cd GPT-SoVITS
!pip install -q -r requirements.txt
print("✅ GPT-SoVITS dependencies installed")

In [None]:
# Download models (this can take 10+ minutes)
!python download_models.py 2>/dev/null || echo "Check GPT-SoVITS repo for manual download"
print("Model download attempted")

In [None]:
# Test GPT-SoVITS
print("\n📝 GPT-SoVITS Testing:")
print("GPT-SoVITS requires:")
print("1. Pretrained models (GPT + SoVITS)")
print("2. Reference audio of the target voice (a few seconds for zero-shot; about 1 min of data for few-shot fine-tuning)")
print("3. See RVC-Boss/GPT-SoVITS for API documentation")

%cd ..
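
Before pointing GPT-SoVITS at a reference clip, it helps to check the clip's length. This stdlib-only sketch reads the duration from the WAV header with `wave` (the files from Step 3 are written by soundfile as 16-bit PCM, which `wave` can read). The helper names are hypothetical, and the 3-10 s default window is an assumption; adjust it to whatever range your GPT-SoVITS version's docs or WebUI specify.

```python
import wave

def wav_duration_s(path):
    """Duration of a PCM WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def check_reference_clip(path, min_s=3.0, max_s=10.0):
    """Return (ok, duration_s); the default window is an ASSUMPTION, not a documented limit."""
    dur = wav_duration_s(path)
    return min_s <= dur <= max_s, dur

# Example, from the notebook root (after the `%cd ..` above):
# ok, dur = check_reference_clip("test_audio/female.wav")
# print(f"{dur:.1f} s -> {'OK' if ok else 'outside range'}")
```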

---
## Summary and Next Steps

In [None]:
print("\n" + "="*60)
print("TESTING SUMMARY")
print("="*60)

print("\n✅ Completed:")
print("  - GPU verification")
print("  - Repository cloning")
print("  - Dependency installation")
print("  - Test audio generation")

print("\n📝 Next Steps for Manual Testing:")
print("\n1. Seed-VC:")
print("   - Navigate to seed-vc/")
print("   - Read README.md")
print("   - Run inference examples")

print("\n2. RVC:")
print("   - Navigate to Retrieval-based-Voice-Conversion-WebUI/")
print("   - Follow inference_main.py examples")
print("   - Or use WebUI: python infer-web.py")

print("\n3. GPT-SoVITS:")
print("   - Navigate to GPT-SoVITS/")
print("   - Start API: python api.py")
print("   - Or WebUI: python webui.py")

print("\n" + "="*60)
print("\nFor detailed documentation, visit:")
print("https://github.com/MuruganR96/VoiceConversion_Survey")
print("\nSee: SERVER_SIDE_GPU_MODELS.md and SERVER_DEPLOYMENT_GUIDE.md")
print("="*60)

In [None]:
# List what was cloned
!ls -lh
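
`ls -lh` shows each directory's own entry size (typically 4.0K), not the size of its contents, so it won't tell you how much disk the repos and downloaded checkpoints use. A small sketch that sums file sizes instead; `dir_size_bytes` is a hypothetical helper and skips symlinks.

```python
import os

def dir_size_bytes(root):
    """Total size in bytes of regular files under `root` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total

for repo in ["seed-vc", "Retrieval-based-Voice-Conversion-WebUI", "GPT-SoVITS"]:
    if os.path.isdir(repo):
        print(f"{repo:42s} {dir_size_bytes(repo) / 1e9:6.2f} GB")
    else:
        print(f"{repo:42s} not found")
```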

---
## 📊 Manual Benchmarking Template

Use this code to benchmark any model once you get it running:

In [None]:
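import time
import torch
import numpy as np
import soundfile as sf

def benchmark_model(conversion_function, input_audio, num_runs=5, audio_duration_s=None):
    """
    Benchmark any GPU voice conversion function

    Args:
        conversion_function: Your model's inference function, called as conversion_function(input_audio)
        input_audio: Input audio array or path
        num_runs: Number of timed runs to average
        audio_duration_s: Input length in seconds, used for the real-time factor;
            read from the file if None and input_audio is a path
    """
    if not torch.cuda.is_available():
        raise RuntimeError("benchmark_model requires a CUDA GPU")
    if audio_duration_s is None:
        if not isinstance(input_audio, str):
            raise ValueError("Pass audio_duration_s when input_audio is an array")
        audio_duration_s = sf.info(input_audio).duration

    # Warmup (not timed); wait for queued GPU work before measuring
    _ = conversion_function(input_audio)
    torch.cuda.synchronize()
    torch.cuda.empty_cache()

    times = []
    gpu_mem = []

    for _ in range(num_runs):
        torch.cuda.reset_peak_memory_stats()

        start = time.perf_counter()
        _ = conversion_function(input_audio)
        torch.cuda.synchronize()  # wait for all GPU kernels before stopping the clock
        end = time.perf_counter()

        times.append((end - start) * 1000)  # ms
        # Peak memory seen by PyTorch's allocator in this process only
        gpu_mem.append(torch.cuda.max_memory_allocated() / 1e9)  # GB

        torch.cuda.empty_cache()

    mean_ms = float(np.mean(times))
    std_ms = float(np.std(times))
    rtf = mean_ms / (audio_duration_s * 1000)  # < 1.0 means faster than real time

    print(f"\n{'='*50}")
    print("BENCHMARK RESULTS")
    print(f"{'='*50}")
    print(f"Latency: {mean_ms:.2f} ± {std_ms:.2f} ms")
    print(f"GPU Memory: {np.max(gpu_mem):.2f} GB (peak)")
    print(f"GPU Memory: {np.mean(gpu_mem):.2f} GB (mean of per-run peaks)")
    print(f"Real-time Factor: {rtf:.3f} (audio: {audio_duration_s:.2f} s)")
    print(f"{'='*50}\n")

    return {
        'latency_ms': mean_ms,
        'latency_std': std_ms,
        'gpu_memory_gb': float(np.max(gpu_mem)),
        'rtf': rtf
    }

# Example usage:
# results = benchmark_model(your_model.convert, 'test_audio/male.wav')

print("✅ Benchmarking template ready")
print("Use benchmark_model(your_function, input_audio) to measure performance")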
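
The template reports a real-time factor (RTF): processing time divided by audio duration, so values below 1.0 mean faster than real time. The GPU-free sketch below shows that arithmetic on its own, which is handy for timings collected some other way (for example, around a subprocess call). `latency_summary` is a hypothetical helper; `statistics.pstdev` is the population standard deviation, matching `np.std`'s default.

```python
import statistics

def latency_summary(times_ms, audio_duration_s):
    """Summarize per-run latencies (ms) for one input clip of audio_duration_s seconds."""
    mean_ms = statistics.mean(times_ms)
    return {
        "latency_ms": mean_ms,
        "latency_std": statistics.pstdev(times_ms),
        "rtf": (mean_ms / 1000.0) / audio_duration_s,
    }

# Example: three runs on the 3-second test clip
print(latency_summary([100.0, 200.0, 300.0], audio_duration_s=3.0))
```

Here the mean latency is 200 ms for 3 s of audio, giving an RTF of about 0.067.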

---
## 💾 Save Your Results

After testing, document your findings:

In [None]:
# Template for saving results
import json
import os
from datetime import datetime
import torch

os.makedirs('results', exist_ok=True)

results = {
    'timestamp': datetime.now().isoformat(),
    'gpu': torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'No GPU',
    'models': {
        'Seed-VC': {
            'tested': False,  # Change to True if you tested it
            'latency_ms': None,  # Fill in your results, e.g. from benchmark_model(...)
            'gpu_memory_gb': None,
            'notes': 'Add your observations here'
        },
        'RVC': {
            'tested': False,
            'latency_ms': None,
            'gpu_memory_gb': None,
            'notes': ''
        },
        'GPT-SoVITS': {
            'tested': False,
            'latency_ms': None,
            'gpu_memory_gb': None,
            'notes': ''
        }
    }
}

# Save results
with open('results/my_test_results.json', 'w') as f:
    json.dump(results, f, indent=2)

print("✅ Results template created: results/my_test_results.json")
print("Fill in the values above and re-run this cell, or edit the downloaded file")

# Download results (google.colab is only available inside Colab)
try:
    from google.colab import files
    files.download('results/my_test_results.json')
except ImportError:
    print("Not running in Colab; the file is saved at results/my_test_results.json")