# VoiRS Python Bindings - Basic Usage Tutorial

This notebook demonstrates the basic usage of VoiRS Python bindings for text-to-speech synthesis.

## Prerequisites

Make sure you have the VoiRS Python bindings installed:

```bash
pip install voirs-ffi
```

## Table of Contents

1. [Basic Setup](#basic-setup)
2. [Simple Text-to-Speech](#simple-text-to-speech)
3. [Working with Voices](#working-with-voices)
4. [Audio Processing](#audio-processing)
5. [Configuration Options](#configuration-options)
6. [Error Handling](#error-handling)
7. [Performance Tips](#performance-tips)

## 1. Basic Setup

Let's start by importing the necessary modules and checking our system compatibility.

In [None]:
# Import VoiRS modules
from voirs_ffi import VoirsPipeline, check_compatibility, synthesize_text
import json

# Check system compatibility
compatibility_info = check_compatibility()
print("System Compatibility:")
print(json.dumps(compatibility_info, indent=2))

## 2. Simple Text-to-Speech

Let's create our first speech synthesis example.

In [None]:
# Create a pipeline with default settings
pipeline = VoirsPipeline()

# Synthesize some text
text = "Hello, world! This is my first text-to-speech synthesis with VoiRS."
audio = pipeline.synthesize(text)

# Display audio properties
print(f"Audio duration: {audio.duration:.2f} seconds")
print(f"Sample rate: {audio.sample_rate} Hz")
print(f"Channels: {audio.channels}")
print(f"Total samples: {len(audio.samples)}")
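
The properties printed above are related to each other. As a rough sanity check, the sketch below derives a duration from a sample count, assuming the samples are interleaved across channels (this tutorial does not confirm how `audio.samples` is laid out). `expected_duration` is a hypothetical helper and the numbers are made up, not VoiRS output.

```python
def expected_duration(num_samples: int, sample_rate: int, channels: int = 1) -> float:
    """Duration in seconds, assuming samples are interleaved across channels."""
    if sample_rate <= 0 or channels <= 0:
        raise ValueError("sample_rate and channels must be positive")
    frames = num_samples // channels  # one frame holds one sample per channel
    return frames / sample_rate

# Hypothetical values, not output from VoiRS
print(expected_duration(num_samples=66150, sample_rate=22050))  # 3.0
```

If the value computed this way differs from `audio.duration`, the layout assumption probably does not hold for your build.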

In [None]:
# Play the audio (if audio output is available)
try:
    audio.play()
    print("Audio played successfully!")
except Exception as e:
    print(f"Could not play audio: {e}")
    print("This is normal in environments without audio output.")

In [None]:
# Save the audio to a file
audio.save("first_synthesis.wav")
print("Audio saved to 'first_synthesis.wav'")

# You can also save in different formats
audio.save("first_synthesis.mp3", format="mp3")
print("Audio also saved as MP3")
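
How `save()` writes files internally is not covered here. To give a rough picture of what storing a WAV file involves, this sketch uses Python's standard `wave` module to write float samples as 16-bit PCM. It assumes mono samples in the range [-1.0, 1.0]; `write_wav_16bit` and the file name are hypothetical.

```python
import math
import struct
import wave

def write_wav_16bit(path, samples, sample_rate):
    """Write mono float samples (assumed in [-1.0, 1.0]) as 16-bit PCM."""
    # Clamp before scaling so out-of-range values do not overflow the 16-bit range
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))

# One second of a 440 Hz tone as stand-in data
rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
write_wav_16bit("sketch_tone.wav", tone, rate)
```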

### Using Convenience Functions

For quick synthesis, you can use the convenience function:

In [None]:
# Quick synthesis without creating a pipeline object
quick_audio = synthesize_text("This is a quick synthesis example!")
print(f"Quick synthesis duration: {quick_audio.duration:.2f} seconds")
quick_audio.save("quick_synthesis.wav")

## 3. Working with Voices

VoiRS supports multiple voices. Let's explore the available voices and how to use them.

In [None]:
# Get all available voices
voices = pipeline.get_voices()

print(f"Available voices: {len(voices)}")
print("\nVoice Details:")
print("-" * 50)

for voice in voices:
    print(f"Name: {voice.name}")
    print(f"ID: {voice.id}")
    print(f"Language: {voice.language}")
    print(f"Gender: {voice.gender}")
    print(f"Age: {voice.age}")
    print("-" * 50)

In [None]:
# Try different voices with the same text
test_text = "This is a test of different voice characteristics."

# Use different voices
for i, voice in enumerate(voices[:3]):  # Try first 3 voices
    print(f"Synthesizing with voice: {voice.name}")
    
    # Synthesize with specific voice
    voice_audio = pipeline.synthesize(test_text, voice=voice.id)
    
    # Save with voice-specific filename
    filename = f"voice_{i}_{voice.id}.wav"
    voice_audio.save(filename)
    print(f"Saved to: {filename}")
    
    # Play audio (if available)
    try:
        voice_audio.play()
    except Exception:
        pass  # Skip if no audio output; a bare except would also swallow KeyboardInterrupt
    
    print()

In [None]:
# Set a default voice for the pipeline
if voices:
    favorite_voice = voices[0]  # Choose first voice
    pipeline.set_voice(favorite_voice.id)
    print(f"Default voice set to: {favorite_voice.name}")
    
    # Now all synthesis will use this voice by default
    default_audio = pipeline.synthesize("This uses the default voice.")
    default_audio.save("default_voice.wav")
    print("Audio with default voice saved.")
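
Taking `voices[0]` is fine for a demo, but an application usually wants a voice for a particular language. The sketch below shows one way to pick it. `VoiceInfo` is a hypothetical stand-in with the same fields printed earlier in this section, and the `en-US` style language tag is assumed from the configuration example, not confirmed for voice objects.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class VoiceInfo:
    """Hypothetical stand-in for a voice object (not the voirs_ffi type)."""
    id: str
    name: str
    language: str
    gender: str

def pick_voice(voices: Sequence[VoiceInfo], language: str) -> Optional[VoiceInfo]:
    """Return the first voice whose language tag starts with `language`."""
    wanted = language.lower()
    for voice in voices:
        if voice.language.lower().startswith(wanted):
            return voice
    return None

catalog = [
    VoiceInfo("v1", "Alice", "en-US", "female"),
    VoiceInfo("v2", "Hans", "de-DE", "male"),
]
chosen = pick_voice(catalog, "de")
print(chosen.id if chosen else "no match")  # v2
```

Returning `None` instead of raising leaves the caller free to fall back to the pipeline's default voice.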

## 4. Audio Processing

The audio objects returned by VoiRS support gain adjustment and normalization, and can be converted to NumPy arrays for further analysis.

In [None]:
# Create audio for processing
process_text = "This audio will be processed in various ways."
original_audio = pipeline.synthesize(process_text)

print("Original Audio Properties:")
print(f"Duration: {original_audio.duration:.2f} seconds")
print(f"Sample rate: {original_audio.sample_rate} Hz")
print(f"Channels: {original_audio.channels}")
print(f"Peak amplitude: {max(abs(s) for s in original_audio.samples):.4f}")

# Save original
original_audio.save("original_audio.wav")

In [None]:
# Apply gain (volume adjustment)
import copy

# Create copies for different processing
loud_audio = copy.deepcopy(original_audio)
quiet_audio = copy.deepcopy(original_audio)

# Make it louder (2x volume)
loud_audio.apply_gain(2.0)
loud_audio.save("loud_audio.wav")
print("Louder audio saved (2x gain)")

# Make it quieter (0.5x volume)
quiet_audio.apply_gain(0.5)
quiet_audio.save("quiet_audio.wav")
print("Quieter audio saved (0.5x gain)")

# Check peak amplitudes
print(f"Original peak: {max(abs(s) for s in original_audio.samples):.4f}")
print(f"Loud peak: {max(abs(s) for s in loud_audio.samples):.4f}")
print(f"Quiet peak: {max(abs(s) for s in quiet_audio.samples):.4f}")
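
Audio levels are usually quoted in decibels, not as multipliers. Assuming `apply_gain` takes a linear amplitude factor, as the "2x" and "0.5x" comments above suggest, the conversion is 20·log10(factor). The helper names below are hypothetical.

```python
import math

def gain_to_db(factor: float) -> float:
    """Convert a linear amplitude factor to decibels."""
    if factor <= 0:
        raise ValueError("gain factor must be positive")
    return 20.0 * math.log10(factor)

def db_to_gain(db: float) -> float:
    """Convert decibels back to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

print(f"{gain_to_db(2.0):+.2f} dB")  # +6.02 dB
print(f"{gain_to_db(0.5):+.2f} dB")  # -6.02 dB
```

So doubling the amplitude raises the level by about 6 dB, and halving it lowers the level by the same amount.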

In [None]:
# Normalize audio to prevent clipping
normalized_audio = copy.deepcopy(loud_audio)  # Use the loud audio
normalized_audio.normalize()
normalized_audio.save("normalized_audio.wav")

print("Audio normalized to prevent clipping")
print(f"Peak after normalization: {max(abs(s) for s in normalized_audio.samples):.4f}")
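
Which algorithm `normalize()` uses is not stated in this tutorial. Peak normalization is one common approach, sketched here on a plain list of made-up samples: every sample is scaled by the same factor so that the largest absolute value lands on a target peak. `peak_normalize` is a hypothetical helper, not the VoiRS implementation.

```python
def peak_normalize(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    scale = target_peak / peak
    return [s * scale for s in samples]

too_loud = [0.4, -1.6, 1.2, -0.8]  # made-up samples that exceed full scale
fixed = peak_normalize(too_loud)
print(max(abs(s) for s in fixed))
```

Because all samples share one scale factor, the waveform shape is preserved; only the overall level changes.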

In [None]:
# Advanced processing with NumPy (if available)
try:
    import numpy as np
    import matplotlib.pyplot as plt
    
    # Convert to NumPy array
    audio_array = original_audio.to_numpy()
    
    print(f"Audio array shape: {audio_array.shape}")
    print(f"Audio array dtype: {audio_array.dtype}")
    
    # Plot waveform
    plt.figure(figsize=(12, 4))
    time_axis = np.linspace(0, original_audio.duration, len(audio_array))
    plt.plot(time_axis, audio_array)
    plt.title("Audio Waveform")
    plt.xlabel("Time (seconds)")
    plt.ylabel("Amplitude")
    plt.grid(True)
    plt.show()
    
    # Calculate RMS (Root Mean Square)
    rms = np.sqrt(np.mean(audio_array ** 2))
    print(f"RMS amplitude: {rms:.4f}")
    
    # Calculate zero-crossing rate
    zero_crossings = np.where(np.diff(np.signbit(audio_array)))[0]
    zero_crossing_rate = len(zero_crossings) / len(audio_array)
    print(f"Zero-crossing rate: {zero_crossing_rate:.4f}")
    
except ImportError:
    print("NumPy and/or matplotlib not available. Skipping advanced processing.")
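
To check that the two metrics behave as expected, it helps to run them on a signal with known properties. The sketch below uses only the standard library and a synthetic 440 Hz sine, for which the RMS should be close to 1/√2 and the zero-crossing rate close to 2 × frequency / sample rate. The helper functions are illustrative; they mirror the NumPy calculations above but are not identical to them (for example, they treat `-0.0` as non-negative).

```python
import math

def rms(samples):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return crossings / len(samples)

rate, freq = 16000, 440
sine = [math.sin(2 * math.pi * freq * n / rate) for n in range(rate)]
print(f"RMS: {rms(sine):.4f}")                 # close to 1/sqrt(2)
print(f"ZCR: {zero_crossing_rate(sine):.4f}")  # close to 2 * freq / rate
```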

## 5. Configuration Options

Synthesis can be customized through options such as sample rate, GPU usage, thread count, language, and quality level.

In [None]:
# Create pipeline with custom configuration
from voirs_ffi import SynthesisConfig

# Create configuration object
config = SynthesisConfig(
    sample_rate=44100,      # High quality sample rate
    use_gpu=True,           # Use GPU if available
    num_threads=4,          # Number of CPU threads
    language="en-US",       # Language setting
    quality="high"          # Quality level
)

# Create pipeline with configuration
high_quality_pipeline = VoirsPipeline(config)

# Test synthesis
hq_text = "This is high-quality synthesis with custom configuration."
hq_audio = high_quality_pipeline.synthesize(hq_text)

print(f"High-quality audio sample rate: {hq_audio.sample_rate} Hz")
print(f"High-quality audio duration: {hq_audio.duration:.2f} seconds")
hq_audio.save("high_quality.wav")

In [None]:
# Alternative: Use fluent configuration API
fluent_pipeline = VoirsPipeline.with_config(
    sample_rate=22050,
    use_gpu=compatibility_info.get('gpu', False),  # Use GPU only if reported; a missing key falls back to CPU
    num_threads=2,
    quality="medium"
)

# Test different quality levels
test_text = "Comparing different quality levels."

# Low quality (fast)
low_pipeline = VoirsPipeline.with_config(quality="low", sample_rate=16000)
low_audio = low_pipeline.synthesize(test_text)
low_audio.save("low_quality.wav")

# Medium quality (balanced)
medium_audio = fluent_pipeline.synthesize(test_text)
medium_audio.save("medium_quality.wav")

# High quality (best)
high_audio = high_quality_pipeline.synthesize(test_text)
high_audio.save("high_quality_comparison.wav")  # Separate name so the earlier high_quality.wav is not overwritten

print("Quality comparison files saved:")
print(f"Low quality: {low_audio.sample_rate} Hz, {low_audio.duration:.2f}s")
print(f"Medium quality: {medium_audio.sample_rate} Hz, {medium_audio.duration:.2f}s")
print(f"High quality: {high_audio.sample_rate} Hz, {high_audio.duration:.2f}s")

## 6. Error Handling

Catch specific VoiRS exceptions before general ones so that each failure mode gets an appropriate response.

In [None]:
# Import error types
from voirs_ffi import VoirsError, SynthesisError, VoiceNotFoundError

# Test basic error handling
try:
    # This should work fine
    success_audio = pipeline.synthesize("This should work.")
    print("Synthesis successful!")
    
except VoirsError as e:
    print(f"VoiRS error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

In [None]:
# Test voice-related errors
try:
    # Try to use a non-existent voice
    pipeline.set_voice("nonexistent-voice-id")
    error_audio = pipeline.synthesize("This might fail.")
    
except VoiceNotFoundError as e:
    print(f"Voice not found: {e}")
except SynthesisError as e:
    print(f"Synthesis error: {e}")
except VoirsError as e:
    print(f"General VoiRS error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
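
The order of the `except` clauses above matters if the exceptions form a hierarchy, because Python runs the first clause that matches. The sketch below demonstrates this with hypothetical stand-in classes. It assumes, without confirmation from this tutorial, that the specific VoiRS errors derive from `VoirsError`.

```python
class BaseTtsError(Exception):
    """Hypothetical base class, standing in for a library-wide error."""

class MissingVoiceError(BaseTtsError):
    """Hypothetical subclass for an unknown voice ID."""

def classify(exc: Exception) -> str:
    """Report which handler catches exc when specific clauses come first."""
    try:
        raise exc
    except MissingVoiceError:
        return "specific handler"
    except BaseTtsError:
        return "base handler"
    except Exception:
        return "catch-all handler"

print(classify(MissingVoiceError("no such voice")))  # specific handler
print(classify(BaseTtsError("other failure")))       # base handler
print(classify(ValueError("not a TTS error")))       # catch-all handler
```

If the base-class clause came first, it would also catch the subclass and the specific handler would never run.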

In [None]:
# Robust synthesis function
def robust_synthesize(pipeline, text, voice=None, max_retries=3):
    """Robust synthesis with retry logic"""
    
    for attempt in range(max_retries):
        try:
            audio = pipeline.synthesize(text, voice=voice)
            return audio
            
        except VoiceNotFoundError:
            print(f"Voice '{voice}' not found. Trying with default voice.")
            voice = None  # Use default voice
            
        except SynthesisError as e:
            print(f"Synthesis error on attempt {attempt + 1}: {e}")
            if attempt == max_retries - 1:
                raise
            
        except Exception as e:
            print(f"Unexpected error on attempt {attempt + 1}: {e}")
            if attempt == max_retries - 1:
                raise
    
    return None

# Test robust synthesis
robust_audio = robust_synthesize(
    pipeline, 
    "This is a robust synthesis test.", 
    voice="maybe-nonexistent-voice"
)

if robust_audio:
    robust_audio.save("robust_synthesis.wav")
    print("Robust synthesis successful!")
else:
    print("Robust synthesis failed after all retries.")
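
`robust_synthesize` retries immediately after a failure. If the failures are transient (an assumption; this tutorial does not say what causes a `SynthesisError`), waiting between attempts may work better. The sketch below shows a generic retry helper with exponential backoff, exercised with a stand-in function in place of a real synthesis call. `retry_with_backoff` and `flaky` are hypothetical names.

```python
import time

def retry_with_backoff(func, max_retries=3, base_delay=0.01, retry_on=(Exception,)):
    """Call func(); on failure wait base_delay * 2**attempt and try again."""
    for attempt in range(max_retries):
        try:
            return func()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Stand-in for a flaky call: fails twice, then succeeds
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(retry_with_backoff(flaky))  # ok
```

Passing a narrower `retry_on` tuple keeps programming errors such as `TypeError` from being retried.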

## 7. Performance Tips

This section benchmarks synthesis under different configurations, compares individual and batch synthesis, and shows how to monitor memory usage.

In [None]:
import time

# Function to measure synthesis time
def measure_synthesis_time(pipeline, text, num_runs=5):
    """Measure average synthesis time"""
    times = []
    
    # Warmup
    pipeline.synthesize("warmup")
    
    # Measure
    for i in range(num_runs):
        start_time = time.perf_counter()  # Monotonic, high-resolution clock suited to benchmarking
        audio = pipeline.synthesize(text)
        end_time = time.perf_counter()
        
        synthesis_time = end_time - start_time
        times.append(synthesis_time)
        
        print(f"Run {i+1}: {synthesis_time:.3f}s")
    
    avg_time = sum(times) / len(times)
    print(f"\nAverage synthesis time: {avg_time:.3f}s")
    print(f"Audio duration: {audio.duration:.2f}s")
    print(f"Real-time factor: {avg_time / audio.duration:.2f}x")
    
    return avg_time

# Test performance with different configurations
test_text = "This is a performance test with a reasonable amount of text to synthesize."

print("Performance Test - Default Configuration:")
default_time = measure_synthesis_time(pipeline, test_text)

print("\nPerformance Test - GPU Configuration:")
gpu_pipeline = VoirsPipeline.with_config(use_gpu=True, num_threads=4)
gpu_time = measure_synthesis_time(gpu_pipeline, test_text)

print("\nPerformance Test - Low Quality (Fast):")
fast_pipeline = VoirsPipeline.with_config(quality="low", sample_rate=16000)
fast_time = measure_synthesis_time(fast_pipeline, test_text)

# Performance comparison
print("\nPerformance Comparison:")
print(f"Default: {default_time:.3f}s")
print(f"GPU: {gpu_time:.3f}s ({gpu_time/default_time:.2f}x relative)")
print(f"Fast: {fast_time:.3f}s ({fast_time/default_time:.2f}x relative)")
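
The real-time factor printed above is synthesis time divided by audio duration, so values below 1.0 mean synthesis runs faster than playback. The sketch below isolates that calculation and pairs it with a small timing helper that reports the median, which is less sensitive to a single slow run than the mean. Both helpers are hypothetical, and the workload is a stand-in, not a VoiRS call.

```python
import statistics
import time

def time_calls(func, num_runs=5):
    """Return per-run wall-clock durations of func() in seconds."""
    durations = []
    for _ in range(num_runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        func()
        durations.append(time.perf_counter() - start)
    return durations

def real_time_factor(synthesis_seconds, audio_seconds):
    """Processing time divided by audio length; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds

# Stand-in workload instead of a real synthesis call
runs = time_calls(lambda: sum(range(100_000)))
print(f"median: {statistics.median(runs):.6f}s")
print(real_time_factor(0.5, 2.0))  # 0.25
```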

In [None]:
# Batch processing for better performance
def batch_synthesis_demo():
    """Demonstrate batch processing"""
    
    texts = [
        "First sentence for batch processing.",
        "Second sentence in the batch.",
        "Third sentence for comparison.",
        "Fourth and final sentence."
    ]
    
    # Method 1: Individual synthesis
    print("Individual Synthesis:")
    start_time = time.time()
    
    individual_audios = []
    for i, text in enumerate(texts):
        audio = pipeline.synthesize(text)
        individual_audios.append(audio)
        audio.save(f"individual_{i}.wav")
    
    individual_time = time.time() - start_time
    print(f"Individual synthesis time: {individual_time:.3f}s")
    
    # Method 2: Batch synthesis (if available)
    print("\nBatch Synthesis:")
    start_time = time.time()
    
    try:
        # Try batch synthesis (may not be available in all versions)
        batch_audios = pipeline.synthesize_batch(texts)
        
        for i, audio in enumerate(batch_audios):
            audio.save(f"batch_{i}.wav")
        
        batch_time = time.time() - start_time
        print(f"Batch synthesis time: {batch_time:.3f}s")
        print(f"Speedup: {individual_time / batch_time:.2f}x")
        
    except AttributeError:
        print("Batch synthesis not available in this version.")
    
    return individual_audios

# Run batch demo
individual_audios = batch_synthesis_demo()  # The demo returns the individually synthesized audios
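
Catching `AttributeError` around the whole batch branch also hides any `AttributeError` raised inside `synthesize_batch` or `save`. One alternative is to test for the method up front, as in this sketch. The two pipeline classes are hypothetical stand-ins that return strings, and `synthesize_many` is an illustrative helper, not part of VoiRS.

```python
def synthesize_many(pipeline, texts):
    """Use pipeline.synthesize_batch when present, else loop over synthesize."""
    batch = getattr(pipeline, "synthesize_batch", None)
    if callable(batch):
        return list(batch(texts))
    return [pipeline.synthesize(text) for text in texts]

class LoopOnlyPipeline:
    """Hypothetical stand-in that lacks a batch method."""
    def synthesize(self, text):
        return f"audio<{text}>"

class BatchPipeline(LoopOnlyPipeline):
    """Hypothetical stand-in that offers a batch method."""
    def synthesize_batch(self, texts):
        return [f"batch<{t}>" for t in texts]

print(synthesize_many(LoopOnlyPipeline(), ["a", "b"]))  # ['audio<a>', 'audio<b>']
print(synthesize_many(BatchPipeline(), ["a", "b"]))     # ['batch<a>', 'batch<b>']
```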

In [None]:
# Memory optimization tips
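import gc
import os

def get_memory_usage():
    """Get current memory usage in MB"""
    # Import lazily: a top-level import would fail before the
    # ImportError handler at the end of this cell could run
    import psutil
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / 1024 / 1024  # MB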

def memory_efficient_synthesis(texts):
    """Demonstrate memory-efficient synthesis"""
    
    print(f"Initial memory usage: {get_memory_usage():.1f} MB")
    
    # Create pipeline with memory-efficient settings
    efficient_pipeline = VoirsPipeline.with_config(
        use_gpu=True,
        num_threads=2,  # Fewer threads to reduce memory
        quality="medium"
    )
    
    # Process texts one by one and clean up
    for i, text in enumerate(texts):
        print(f"\nProcessing text {i+1}/{len(texts)}...")
        
        # Synthesize
        audio = efficient_pipeline.synthesize(text)
        
        # Save immediately
        audio.save(f"efficient_{i}.wav")
        
        # Clean up
        del audio
        gc.collect()
        
        print(f"Memory usage: {get_memory_usage():.1f} MB")
    
    print(f"\nFinal memory usage: {get_memory_usage():.1f} MB")

# Test memory efficiency
memory_test_texts = [
    "This is a memory efficiency test with longer text to see how memory usage changes during synthesis.",
    "Second text for memory testing. We want to make sure memory doesn't grow indefinitely.",
    "Third text in the memory test sequence. Memory management is crucial for long-running applications."
]

try:
    memory_efficient_synthesis(memory_test_texts)
except ImportError:
    print("psutil not available. Skipping memory optimization demo.")
    print("Install with: pip install psutil")
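
When `psutil` is not installed, the standard library's `tracemalloc` module offers a partial substitute. The sketch below measures peak Python-level allocations around a stand-in workload. Note the limitation: `tracemalloc` only sees memory requested through Python's allocator, so allocations made by native code inside an extension module generally do not show up. `measure_python_allocations` is a hypothetical helper.

```python
import tracemalloc

def measure_python_allocations(func):
    """Run func() and return (result, peak_bytes) for Python-level allocations."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

# Stand-in workload: build a list of 100,000 floats
result, peak_bytes = measure_python_allocations(
    lambda: [float(i) for i in range(100_000)]
)
print(f"Peak traced memory: {peak_bytes / 1024 / 1024:.2f} MB")
```

For a process-wide figure that includes native allocations, the resident set size reported by `psutil` above remains the better measure.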

## Conclusion

This tutorial covered the basic usage of VoiRS Python bindings, including:

1. **Basic Setup**: System compatibility checking and imports
2. **Simple Synthesis**: Creating audio from text
3. **Voice Management**: Working with different voices
4. **Audio Processing**: Manipulating audio properties
5. **Configuration**: Customizing synthesis settings
6. **Error Handling**: Robust error management
7. **Performance**: Optimization tips and benchmarking

### Next Steps

- Explore advanced features like SSML support
- Learn about streaming synthesis for long texts
- Check out integration examples for web applications
- Review the performance guide for production deployments

### Resources

- [VoiRS Documentation](https://docs.voirs.dev/)
- [Python API Reference](../class_reference.md)
- [Performance Guide](../performance_guide.md)
- [Integration Examples](../integration_examples.md)