<div style="background: linear-gradient(to right, #4b6cb7, #182848); padding: 20px; border-radius: 10px; text-align: center; box-shadow: 0 4px 6px rgba(0,0,0,0.1);">
    <h1 style="color: white; margin: 0; font-size: 2.5em; font-weight: 700;">GAICo: Audio Metrics</h1>
    <p style="color: #e0e0e0; margin-top: 10px; font-style: italic; font-size: 1.2em; text-align: center;">Evaluating AI-Generated Audio Content</p>
</div>
<br>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ai4society/GenAIResultsComparator/blob/main/examples/example-audio.ipynb)

This notebook demonstrates how to use **GAICo's** audio metrics for evaluating AI-generated audio outputs. We'll explore two specialized metrics designed for audio comparison:

1. **AudioSNRNormalized**: Evaluates audio quality by calculating the Signal-to-Noise Ratio (SNR) between generated and reference audio
2. **AudioSpectrogramDistance**: Compares spectral characteristics of audio signals using spectrogram-based distance measures

**Use Cases:**
- Evaluating text-to-speech (TTS) systems
- Assessing music generation models
- Comparing audio enhancement algorithms
- Analyzing voice synthesis quality

**What You'll Learn:**
- How to use GAICo's audio metrics with various input formats
- How SNR and spectrogram-based audio evaluation work
- How to process audio in batches
- How to integrate with GAICo's Experiment class for comparative analysis
- How to visualize audio metric results

## Setup

### Setup for Google Colab

If you are running this notebook in Google Colab, uncomment and run the following cell to install the `gaico` package with audio dependencies.

If you are running locally, you can skip this cell if you have already set up your environment according to the project's README.

In [None]:
# !pip install 'gaico[audio]' -q

### Environment Setup & Imports

The first cell below adjusts `sys.path` so that the `gaico` module can be found when the notebook is run from the `examples/` directory. It is only needed for local execution when `gaico` is not installed.

If you installed `gaico` in Colab with the cell above, you might need to restart the runtime for the changes to take effect (Runtime > Restart runtime).

In [None]:
import sys
import os
from pathlib import Path

# Get the project root directory (parent of 'examples')
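# Compare the directory name exactly so unrelated paths that merely contain "examples" are not affected
cwd = Path.cwd()
project_root = str(cwd.parent) if cwd.name == "examples" else str(cwd)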

# Add project root to the system path if it's not already there
if project_root not in sys.path:
    sys.path.insert(0, project_root)
    print(f"Added project root to sys.path: {project_root}")

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import Audio, display
import warnings

# GAICo imports
from gaico import Experiment
from gaico.metrics.audio import AudioSNRNormalized, AudioSpectrogramDistance


# Set up plotting style
sns.set_theme(style="whitegrid")
%matplotlib inline

## Part 1: Understanding Audio Metrics

### 1.1 Signal-to-Noise Ratio (SNR)

The **AudioSNRNormalized** metric treats the difference between the generated and reference audio as "noise" (unwanted differences) and measures how strong the signal is relative to it. Higher SNR values indicate a closer match, and the raw SNR in dB is normalized to a score between 0 and 1.

- **Perfect match**: SNR → ∞ (normalized to 1.0)
- **Very noisy**: SNR at or below the configured minimum, typically a negative dB value (normalized to 0.0)
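
To make the normalization concrete, here is a minimal sketch of an SNR computation followed by a mapping onto [0, 1]. It assumes that noise is defined as `generated - reference` and that the score is a linear, clipped mapping between `snr_min` and `snr_max`. The helper names `snr_db` and `normalize_snr` are illustrative only and are not part of GAICo's API, whose internals may differ.

```python
import numpy as np


def snr_db(generated, reference):
    """SNR in dB, treating (generated - reference) as noise (an assumption)."""
    noise = generated - reference
    signal_power = np.mean(reference**2)
    noise_power = np.mean(noise**2)
    if noise_power == 0:
        return float("inf")  # identical signals
    return float(10 * np.log10(signal_power / noise_power))


def normalize_snr(snr, snr_min=-20.0, snr_max=40.0):
    """Linearly map an SNR in dB onto [0, 1], clipping outside the range."""
    return float(np.clip((snr - snr_min) / (snr_max - snr_min), 0.0, 1.0))


rng = np.random.default_rng(0)
t = np.linspace(0, 1.0, 44100, endpoint=False)
reference = np.sin(2 * np.pi * 440 * t)
noisy = reference + 0.1 * rng.normal(0, 1, len(t))

snr_identical = snr_db(reference, reference)
snr_noisy = snr_db(noisy, reference)
print(f"identical: {snr_identical} dB -> score {normalize_snr(snr_identical)}")
print(f"noisy:     {snr_noisy:.2f} dB -> score {normalize_snr(snr_noisy):.3f}")
```

A unit-amplitude sine has power 0.5 and Gaussian noise with standard deviation 0.1 has power of about 0.01, so the expected SNR is roughly 10·log10(50) ≈ 17 dB. Under the linear assumption, a range of -20 to 40 dB maps that to a score of about 0.62.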

### 1.2 Spectrogram Distance

The **AudioSpectrogramDistance** metric compares the frequency content of audio signals over time. This is useful for:
- Comparing timbral characteristics
- Evaluating spectral fidelity
- Assessing frequency preservation in audio generation
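
The idea behind a spectrogram comparison can be sketched with NumPy alone. The example below builds a magnitude spectrogram with a Hann-windowed short-time Fourier transform and compares two spectrograms by cosine similarity. The functions `magnitude_spectrogram` and `cosine_similarity` are hypothetical helpers written for this illustration. GAICo's own windowing, scaling, and distance-to-score conversion are not shown here and may differ.

```python
import numpy as np


def magnitude_spectrogram(audio, n_fft=2048, hop_length=512):
    """Magnitude STFT with a Hann window; rows are frames, columns are frequency bins."""
    if len(audio) < n_fft:
        raise ValueError("audio is shorter than one FFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop_length
    frames = np.stack(
        [audio[i * hop_length : i * hop_length + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two flattened spectrograms of equal shape."""
    a, b = spec_a.ravel(), spec_b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


t = np.linspace(0, 1.0, 44100, endpoint=False)
tone_440 = np.sin(2 * np.pi * 440 * t)
tone_880 = np.sin(2 * np.pi * 880 * t)

spec_440 = magnitude_spectrogram(tone_440)
spec_880 = magnitude_spectrogram(tone_880)

same = cosine_similarity(spec_440, spec_440)
different = cosine_similarity(spec_440, spec_880)
print(f"440 Hz vs itself: {same:.4f}")
print(f"440 Hz vs 880 Hz: {different:.4f}")
```

Because the two tones occupy different frequency bins, their spectrograms barely overlap and the similarity is close to zero, even though both signals are clean sine waves.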

## Part 2: Basic Usage with Synthetic Audio

Let's start with simple synthetic audio examples to understand how the metrics work.

In [None]:
# Generate synthetic audio for testing
np.random.seed(42)  # Fixed seed so the noisy test signals (and their scores) are reproducible
sample_rate = 44100
duration = 1.0  # seconds
t = np.linspace(0, duration, int(sample_rate * duration), False)

# Reference signal: 440 Hz sine wave (A4 note)
reference_audio = np.sin(2 * np.pi * 440 * t).astype(np.float32)

# Test signals with varying quality
test_audio_identical = reference_audio.copy()
test_audio_slight_noise = reference_audio + 0.1 * np.random.normal(0, 1, len(t)).astype(np.float32)
test_audio_different_freq = np.sin(2 * np.pi * 880 * t).astype(np.float32)  # Octave higher
test_audio_very_noisy = reference_audio + 0.5 * np.random.normal(0, 1, len(t)).astype(np.float32)

print("Generated synthetic audio signals:")
print("- Reference: 440 Hz sine wave (A4 note)")
print("- Test 1: Identical to reference")
print("- Test 2: Slightly noisy version")
print("- Test 3: Different frequency (880 Hz)")
print("- Test 4: Very noisy version")

### 2.1 Using AudioSNRNormalized

In [None]:
# Initialize the SNR metric
snr_metric = AudioSNRNormalized(
    snr_min=-20.0,  # Maps to score 0.0
    snr_max=40.0,  # Maps to score 1.0
    sample_rate=sample_rate,
)

# Calculate SNR scores
snr_scores = {
    "Identical audio": snr_metric.calculate(test_audio_identical, reference_audio),
    "Slight noise": snr_metric.calculate(test_audio_slight_noise, reference_audio),
    "Different frequency": snr_metric.calculate(test_audio_different_freq, reference_audio),
    "Very noisy": snr_metric.calculate(test_audio_very_noisy, reference_audio),
}

print("\nAudioSNRNormalized Results:")
print("==========================")
for name, score in snr_scores.items():
    quality = (
        "perfect match"
        if score > 0.95
        else "good quality"
        if score > 0.7
        else "low quality"
        if score > 0.5
        else "poor match - different content"
    )
    print(f"{name:20}: {score:8.4f} ({quality})")

### 2.2 Using AudioSpectrogramDistance

In [None]:
# Initialize spectrogram metrics with different distance types
spec_metric_euclidean = AudioSpectrogramDistance(
    n_fft=2048, hop_length=512, distance_type="euclidean", sample_rate=sample_rate
)

spec_metric_cosine = AudioSpectrogramDistance(
    n_fft=2048, hop_length=512, distance_type="cosine", sample_rate=sample_rate
)

# Calculate spectrogram-based scores
test_audios = [
    ("Identical audio", test_audio_identical),
    ("Slight noise", test_audio_slight_noise),
    ("Different frequency", test_audio_different_freq),
    ("Very noisy", test_audio_very_noisy),
]

print("\nAudioSpectrogramDistance Results (Euclidean):")
print("============================================")
for name, audio in test_audios:
    score = spec_metric_euclidean.calculate(audio, reference_audio)
    print(f"{name:20}: {score:8.4f} ")

print("\nAudioSpectrogramDistance Results (Cosine):")
print("=========================================")
for name, audio in test_audios:
    score = spec_metric_cosine.calculate(audio, reference_audio)
    print(f"{name:20}: {score:8.4f}")

## Part 3: Batch Processing

GAICo's audio metrics accept lists of generated and reference audio, so a whole batch can be scored in one call. Let's demonstrate this with multiple audio samples.


In [None]:
# Generate a batch of test audio with varying quality
batch_size = 5
noise_levels = [0.05, 0.15, 0.3, 0.5, 1.0]

# Create batch of generated audio.
# The first four entries add Gaussian noise to the reference; the last entry is a
# completely different signal, so its noise level (1.0) is not used.
model_names = ["High Quality", "Medium Quality", "Low Quality", "Very Noisy", "Different"]
generated_batch = []

for i, noise_level in enumerate(noise_levels):
    if i < 4:
        # Add noise to reference
        audio = reference_audio + noise_level * np.random.normal(0, 1, len(t)).astype(np.float32)
    else:
        # Last one is a completely different signal
        audio = np.sin(2 * np.pi * 523.25 * t).astype(np.float32)  # C5 note

    generated_batch.append(audio)

# Create reference batch (same reference for all)
reference_batch = [reference_audio] * batch_size

# Batch calculate scores
snr_batch_scores = snr_metric.calculate(generated_batch, reference_batch)
spec_batch_scores = spec_metric_euclidean.calculate(generated_batch, reference_batch)

# Create results DataFrame
batch_results = pd.DataFrame(
    {"Model": model_names, "SNR Score": snr_batch_scores, "Spectrogram Score": spec_batch_scores}
)

print("\nBatch Processing Results:")
print("========================")
print(batch_results.to_string(index=True))

### Visualizing Batch Results

In [None]:
# Prepare data for visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Plot SNR scores
bars1 = ax1.bar(
    range(len(batch_results)), batch_results["SNR Score"], color="skyblue", edgecolor="navy"
)
ax1.set_ylabel("Normalized SNR Score")
ax1.set_title("Audio Quality: SNR Comparison")
ax1.set_ylim(0, 1)
ax1.axhline(y=0.7, color="green", linestyle="--", alpha=0.5, label="Good Quality Threshold")
ax1.set_xticks(range(len(batch_results)))
ax1.set_xticklabels(batch_results["Model"], rotation=45, ha="right")
ax1.legend()

# Add value labels on bars
for i, bar in enumerate(bars1):
    height = bar.get_height()
    ax1.text(
        bar.get_x() + bar.get_width() / 2.0,
        height + 0.01,
        f"{height:.3f}",
        ha="center",
        va="bottom",
        fontsize=9,
    )

# Plot Spectrogram scores
bars2 = ax2.bar(
    range(len(batch_results)),
    batch_results["Spectrogram Score"],
    color="lightcoral",
    edgecolor="darkred",
)
ax2.set_ylabel("Spectrogram Similarity Score")
ax2.set_title("Spectral Similarity Comparison")
ax2.set_ylim(0, 1)
ax2.axhline(y=0.8, color="green", linestyle="--", alpha=0.5, label="High Similarity Threshold")
ax2.set_xticks(range(len(batch_results)))
ax2.set_xticklabels(batch_results["Model"], rotation=45, ha="right")
ax2.legend()

# Add value labels on bars
for i, bar in enumerate(bars2):
    height = bar.get_height()
    ax2.text(
        bar.get_x() + bar.get_width() / 2.0,
        height + 0.01,
        f"{height:.3f}",
        ha="center",
        va="bottom",
        fontsize=9,
    )

plt.tight_layout()
plt.show()

### Handling a Missing Reference

When no reference is provided, GAICo automatically uses the first generated audio as the reference, similar to how its text metrics work. This is useful for relative quality assessment and baseline comparisons.
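
The fallback rule can be pictured with a small, hypothetical helper. `resolve_references` is not a GAICo function. It is only a sketch of the pairing logic described here, and the branch that broadcasts a single reference to every item is an assumption made for the example.

```python
def resolve_references(generated, reference=None):
    """Return one reference per generated item (illustrative helper, not GAICo API)."""
    items = list(generated)
    if not items:
        raise ValueError("generated must contain at least one item")
    if reference is None:
        # No reference given: the first generated item becomes the baseline for all items.
        return [items[0]] * len(items)
    if isinstance(reference, (list, tuple)):
        if len(reference) != len(items):
            raise ValueError("generated and reference batches differ in length")
        return list(reference)
    # Assumption: a single reference is paired with every generated item.
    return [reference] * len(items)


batch = ["gen_a", "gen_b", "gen_c"]  # stand-ins for audio arrays or file paths
no_ref = resolve_references(batch)
with_ref = resolve_references(batch, "ref")
print(no_ref)    # ['gen_a', 'gen_a', 'gen_a']
print(with_ref)  # ['ref', 'ref', 'ref']
```

With this pairing, the first item is always compared with itself, which is why it receives a perfect score in the cell that follows.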

In [None]:
print("\n🔄 Missing Reference Handling:")
print("=" * 40)

# Case 1: Single audio without reference
print("1. Single audio without reference:")
single_score = snr_metric.calculate(test_audio_slight_noise, reference=None)
print(f"   Score (comparing with itself): {single_score:.3f}")
print("   → Always returns 1.0 for self-comparison")

# Case 2: Batch without reference - uses first as baseline
print("\n2. Batch processing without reference:")
batch_scores_no_ref = snr_metric.calculate(generated_batch, reference=None)
print(f"   Using first audio ('{model_names[0]}') as reference")
for model, score in zip(model_names, batch_scores_no_ref):
    print(f"   {model:15}: {score:.3f}")

print("\n   → First model has score 1.0 (comparing with itself)")
print(f"   → Other models compared against '{model_names[0]}'")

# Case 3: Visualize the comparison
print("\n3. Comparison: With vs Without Reference")
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6), sharey=True)
fig.suptitle("Comparison: With vs. Without Explicit Reference", fontsize=16)

# With explicit reference
with_ref_scores = snr_metric.calculate(generated_batch, reference_batch)
ax1.bar(model_names, with_ref_scores, color="skyblue", edgecolor="navy")
ax1.set_title("With Explicit Reference", fontsize=14)
ax1.set_ylabel("SNR Score")
ax1.tick_params(axis="x", rotation=45)

# Without reference (first as baseline)
bars2 = ax2.bar(model_names, batch_scores_no_ref, color="lightcoral", edgecolor="darkred")
ax2.set_title("Without Reference (First as Baseline)", fontsize=14)
ax2.tick_params(axis="x", rotation=45)

# Highlight the baseline
bars2[0].set_color("gold")
bars2[0].set_edgecolor("orange")
ax2.text(
    0,
    batch_scores_no_ref[0] / 2,
    "Baseline",
    ha="center",
    va="center",
    fontweight="bold",
    color="black",
)

# Add value labels
for i, (bar, score) in enumerate(zip(ax1.patches, with_ref_scores)):
    ax1.text(
        bar.get_x() + bar.get_width() / 2, score + 0.02, f"{score:.2f}", ha="center", va="bottom"
    )
for i, (bar, score) in enumerate(zip(ax2.patches, batch_scores_no_ref)):
    ax2.text(
        bar.get_x() + bar.get_width() / 2, score + 0.02, f"{score:.2f}", ha="center", va="bottom"
    )

ax1.set_ylim(0, 1.1)
ax2.set_ylim(0, 1.1)

plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()

print("\n💡 Key Insights:")
print("   • Missing reference enables relative quality assessment.")
print("   • The first audio in the batch becomes the quality benchmark.")
print(
    "   • This is useful for ranking multiple generations when a single ground truth is unavailable."
)
print("   • The behavior is consistent with GAICo's text and structured data metrics.")

## Part 4: Real-World Example - Text-to-Speech Evaluation

Let's simulate a scenario where we're evaluating different TTS models. We'll create audio samples with characteristics typical of TTS outputs.

In [None]:
# Simulate speech-like signals using formants
def create_speech_like_signal(duration, sample_rate, formants, amplitudes, add_prosody=True):
    """Create a simplified speech-like signal with formants."""
    t = np.linspace(0, duration, int(sample_rate * duration), False)
    signal = np.zeros_like(t)

    for formant, amp in zip(formants, amplitudes):
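        if add_prosody:
            # Slight (±5%) frequency modulation at 3 Hz for prosody. The phase is the
            # time integral of the instantaneous frequency formant * (1 + 0.05 * sin(2π·3·t));
            # scaling t by the modulator directly would make the deviation grow over time.
            mod_rate = 3.0
            mod_phase = 0.05 * np.cos(2 * np.pi * mod_rate * t) / (2 * np.pi * mod_rate)
            signal += amp * np.sin(2 * np.pi * formant * (t - mod_phase))
        else:
            signal += amp * np.sin(2 * np.pi * formant * t)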

    # Add envelope for more natural sound
    envelope = np.exp(-t * 0.5) * (1 - np.exp(-t * 20))
    return (signal * envelope).astype(np.float32)


# Reference "speech" - clean signal
formants = [700, 1220, 2600]  # Simplified vowel formants
amplitudes = [1.0, 0.5, 0.3]
reference_speech = create_speech_like_signal(
    2.0, sample_rate, formants, amplitudes, add_prosody=True
)

# TTS Model outputs with varying quality
tts_outputs = {
    "Model A (Premium)": reference_speech
    + 0.02 * np.random.normal(0, 1, len(reference_speech)).astype(np.float32),
    "Model B (Standard)": create_speech_like_signal(
        2.0,
        sample_rate,
        [690, 1200, 2550],  # Slightly shifted formants
        [0.9, 0.4, 0.25],
        add_prosody=True,
    )
    + 0.05 * np.random.normal(0, 1, len(reference_speech)).astype(np.float32),
    "Model C (Basic)": create_speech_like_signal(
        2.0,
        sample_rate,
        [680, 1180, 2500],  # More shifted formants
        [0.8, 0.3, 0.2],
        add_prosody=True,
    )
    + 0.1 * np.random.normal(0, 1, len(reference_speech)).astype(np.float32),
    "Model D (Robotic)": create_speech_like_signal(
        2.0, sample_rate, formants, [0.7, 0.3, 0.1], add_prosody=False
    ),  # No prosody = robotic
}

print("Simulated TTS evaluation scenario:")
print("- Reference: Clean speech-like signal")
print("- Model A: High-quality TTS (minimal artifacts)")
print("- Model B: Standard TTS (some spectral artifacts)")
print("- Model C: Basic TTS (noticeable artifacts)")
print("- Model D: Robotic TTS (monotone, poor prosody)")

In [None]:
# Interactive audio playback for demonstration
print("\n🔊 Listen to the TTS Model Outputs")
print("=" * 50)
print("Click play to hear each model's output:\n")

# Reference audio
print("📌 REFERENCE SPEECH (Ground Truth):")
display(Audio(reference_speech, rate=sample_rate, autoplay=False))

# Model outputs
for model_name, audio in tts_outputs.items():
    print(f"\n🎤 {model_name}:")
    display(Audio(audio, rate=sample_rate, autoplay=False))

    # Show quick stats
    snr_score = snr_metric.calculate(audio, reference_speech)
    print(
        f"   → SNR Score: {snr_score:.3f} | Quality: {'Excellent' if snr_score > 0.9 else 'Good' if snr_score > 0.7 else 'Fair' if snr_score > 0.5 else 'Poor'}"
    )

### 4.1 Using the Experiment Class

GAICo's `Experiment` class provides a streamlined workflow for comparing multiple models.

In [None]:
# Create an Experiment instance
exp = Experiment(
    llm_responses=tts_outputs,  # Using audio data instead of text
    reference_answer=reference_speech,
)

# Ensure output directory exists
output_dir = Path("data/audio")
output_dir.mkdir(parents=True, exist_ok=True)

# Compare using audio metrics
results_df = exp.compare(
    metrics=["AudioSNR", "AudioSpectrogramDistance"],
    plot=True,
    output_csv_path=output_dir / "tts_evaluation.csv",
    custom_thresholds={
        "AudioSNR": 0.7,  # Good quality threshold
        "AudioSpectrogramDistance": 0.8,  # High similarity threshold
    },
    plot_title_suffix="for TTS Evaluation",
)

print("\nTTS Evaluation Results:")
print("======================")
# Displaying with higher precision for clarity
with pd.option_context("display.precision", 4):
    print(results_df.pivot(index="model_name", columns="metric_name", values="score"))

## Part 5: Working with Audio Files

GAICo's audio metrics can also work directly with audio file paths. Let's demonstrate this capability.

In [None]:
# Create temporary audio files for demonstration
import tempfile
import soundfile as sf

# Create a temporary directory
with tempfile.TemporaryDirectory() as temp_dir:
    # Save audio files
    ref_path = os.path.join(temp_dir, "reference.wav")
    gen_good_path = os.path.join(temp_dir, "generated_good.wav")
    gen_poor_path = os.path.join(temp_dir, "generated_poor.wav")

    # Write audio files
    sf.write(ref_path, reference_audio, sample_rate)
    sf.write(gen_good_path, test_audio_slight_noise, sample_rate)
    sf.write(gen_poor_path, test_audio_very_noisy, sample_rate)

    print("Created temporary audio files:")
    print("- reference.wav")
    print("- generated_good.wav")
    print("- generated_poor.wav")

    # Use file paths with metrics
    snr_score_good = snr_metric.calculate(gen_good_path, ref_path)
    spec_score_good = spec_metric_euclidean.calculate(gen_good_path, ref_path)

    snr_score_poor = snr_metric.calculate(gen_poor_path, ref_path)
    spec_score_poor = spec_metric_euclidean.calculate(gen_poor_path, ref_path)

    print("\nFile-based Audio Comparison:")
    print("============================")
    print("Good Quality Audio:")
    print(f"  SNR Score: {snr_score_good:.4f}")
    print(f"  Spectrogram Score: {spec_score_good:.4f}")
    print("\nPoor Quality Audio:")
    print(f"  SNR Score: {snr_score_poor:.4f}")
    print(f"  Spectrogram Score: {spec_score_poor:.4f}")

## Part 6: Advanced Configuration

Both audio metrics offer various configuration options for different use cases.

In [None]:
# Test different SNR configurations
snr_configs = [
    ("Standard range (-20 to 40 dB)", AudioSNRNormalized(snr_min=-20, snr_max=40)),
    ("Strict range (-10 to 20 dB)", AudioSNRNormalized(snr_min=-10, snr_max=20)),
    ("Lenient range (-30 to 60 dB)", AudioSNRNormalized(snr_min=-30, snr_max=60)),
]

# Test different spectrogram configurations
spec_distance_types = [
    ("Euclidean", AudioSpectrogramDistance(distance_type="euclidean")),
    ("Cosine", AudioSpectrogramDistance(distance_type="cosine")),
    ("Correlation", AudioSpectrogramDistance(distance_type="correlation")),
]

spec_fft_sizes = [
    ("1024 samples", AudioSpectrogramDistance(n_fft=1024)),
    ("2048 samples", AudioSpectrogramDistance(n_fft=2048)),
    ("4096 samples", AudioSpectrogramDistance(n_fft=4096)),
]

print("\nConfiguration Comparison:")
print("========================")

print("\nSNR Metric Configurations:")
for config_name, metric in snr_configs:
    score = metric.calculate(test_audio_slight_noise, reference_audio)
    print(f"{config_name:30}: {score:.4f}")

print("\nSpectrogram Configurations:")
print("Distance Type Comparison:")
for config_name, metric in spec_distance_types:
    score = metric.calculate(test_audio_slight_noise, reference_audio)
    print(f"  {config_name:12}: {score:.4f}")

print("\nFFT Size Comparison:")
for config_name, metric in spec_fft_sizes:
    score = metric.calculate(test_audio_slight_noise, reference_audio)
    print(f"  {config_name}: {score:.4f}")
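
When choosing `n_fft`, it helps to remember the trade-off behind it: a larger FFT gives finer frequency resolution but each frame covers a longer stretch of time. The short calculation below shows this for the three sizes compared above, assuming a 44.1 kHz sample rate as used for the synthetic signals in this notebook. The metric's own default sample rate may differ.

```python
sample_rate = 44100  # Hz, as used for the synthetic signals in this notebook

resolution_hz = {}
frame_ms = {}
for n_fft in (1024, 2048, 4096):
    resolution_hz[n_fft] = sample_rate / n_fft  # spacing between frequency bins
    frame_ms[n_fft] = 1000 * n_fft / sample_rate  # duration covered by one frame
    print(
        f"n_fft={n_fft}: {resolution_hz[n_fft]:6.2f} Hz per bin, "
        f"{frame_ms[n_fft]:6.2f} ms per frame"
    )
```

Doubling `n_fft` halves the bin spacing and doubles the frame duration, so small FFT sizes suit fast-changing signals and large ones suit closely spaced pitches.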

## Part 7: Music Generation Evaluation

Let's demonstrate how these metrics can be used to evaluate music generation models.

In [None]:
# Create a simple musical phrase (C major arpeggio)
def create_musical_phrase(frequencies, duration_per_note, sample_rate, add_harmonics=True):
    """Create a musical phrase from a sequence of frequencies."""
    phrase = []
    for freq in frequencies:
        t = np.linspace(0, duration_per_note, int(sample_rate * duration_per_note), False)
        note = np.sin(2 * np.pi * freq * t)

        if add_harmonics:
            # Add harmonics for richer sound
            note += 0.3 * np.sin(2 * np.pi * freq * 2 * t)  # 2nd harmonic
            note += 0.1 * np.sin(2 * np.pi * freq * 3 * t)  # 3rd harmonic

        # Add envelope
        envelope = np.exp(-t * 2)
        phrase.append(note * envelope)

    return np.concatenate(phrase).astype(np.float32)


# Reference musical phrase (C major arpeggio)
c_major_freqs = [261.63, 329.63, 392.00, 523.25]  # C4, E4, G4, C5
reference_music = create_musical_phrase(c_major_freqs, 0.5, sample_rate, add_harmonics=True)

# Simulated music generation outputs
music_generations = {
    "Perfect Copy": reference_music.copy(),
    "Style Transfer": create_musical_phrase(
        [261.63, 329.63, 392.00, 523.25],  # Same notes
        0.5,
        sample_rate,
        add_harmonics=False,  # Different timbre
    )
    + 0.05 * np.random.normal(0, 1, len(reference_music)).astype(np.float32),
    "Genre Variation": create_musical_phrase(
        [261.63, 311.13, 392.00, 466.16],  # C minor variation
        0.5,
        sample_rate,
        add_harmonics=True,
    ),
    "Amateur Generation": create_musical_phrase(
        [250, 320, 380, 510],  # Slightly off-pitch
        0.45,
        sample_rate,
        add_harmonics=False,  # Also different timing
    )[: len(reference_music)],  # Truncate to match length
}

# Evaluate with multiple metric configurations
music_results = []
for model_name, generated_music in music_generations.items():
    results = {
        "Model": model_name,
        "AudioSNRNormalized": snr_metric.calculate(generated_music, reference_music),
        "AudioSpectrogramDist_euclidean": spec_metric_euclidean.calculate(
            generated_music, reference_music
        ),
        "AudioSpectrogramDist_cosine": spec_metric_cosine.calculate(
            generated_music, reference_music
        ),
    }
    music_results.append(results)

music_df = pd.DataFrame(music_results)
music_df.set_index("Model", inplace=True)

print("\nMusic Generation Evaluation:")
print("===========================")
print(music_df)

# Visualize results
fig, ax = plt.subplots(figsize=(12, 8))

music_df.plot(kind="bar", ax=ax, width=0.8)
ax.set_title("Music Generation Quality Comparison", fontsize=16, pad=20)
ax.set_ylabel("Score (0-1)", fontsize=12)
ax.set_xlabel("Music Generation Models", fontsize=12)
ax.legend(title="Metrics", bbox_to_anchor=(1.05, 1), loc="upper left")


plt.xticks(rotation=45, ha="right")
ax.grid(True, axis="y", alpha=0.3)
ax.set_ylim(0, 1.1)

# Add value labels on bars
for container in ax.containers:
    ax.bar_label(container, fmt="%.3f", rotation=90, fontsize=8, padding=3)

plt.tight_layout()
plt.show()
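
The note frequencies hard-coded above follow twelve-tone equal temperament with A4 tuned to 440 Hz. As a quick check, the sketch below derives them from MIDI note numbers. The helper `midi_to_hz` is defined here for illustration and is not part of GAICo.

```python
def midi_to_hz(midi_note, a4_hz=440.0):
    """Equal-temperament frequency of a MIDI note number (A4 is MIDI note 69)."""
    return a4_hz * 2 ** ((midi_note - 69) / 12)


c_major_midi = {"C4": 60, "E4": 64, "G4": 67, "C5": 72}
c_major_hz = {name: round(midi_to_hz(note), 2) for name, note in c_major_midi.items()}
print(c_major_hz)  # {'C4': 261.63, 'E4': 329.63, 'G4': 392.0, 'C5': 523.25}
```

The same formula gives 311.13 Hz for E♭4 (MIDI 63) and 466.16 Hz for B♭4 (MIDI 70), the two altered notes in the "Genre Variation" phrase.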

## Part 8: Error Handling and Edge Cases

GAICo's audio metrics include comprehensive error handling. Let's explore some edge cases.

In [None]:
import tempfile
import soundfile as sf

print("\nError Handling Examples:")
print("=======================")

# 1. Empty audio
try:
    print("\n1. Empty audio array:")
    score = snr_metric.calculate(np.array([]), reference_audio)
except ValueError as e:
    print(f"   Error caught: {e}")

# 2. Mismatched lengths (automatically handled)
print("\n2. Mismatched lengths (handled automatically):")
short_audio = reference_audio[: len(reference_audio) // 2]
score = snr_metric.calculate(reference_audio, short_audio)
print(f"   Score with auto-truncation: {score:.4f}")

# 3. Invalid file path
try:
    print("\n3. Invalid file path:")
    score = snr_metric.calculate("/nonexistent/audio.wav", reference_audio)
except FileNotFoundError as e:
    print(f"   Error caught: {e}")

# 4. Different sample rates (warning issued)
print("\n4. Different sample rates (handling demonstration):")
# Create audio at a different sample rate
sr_22k = 22050
t_22k = np.linspace(0, 1.0, sr_22k, False)
audio_22k = np.sin(2 * np.pi * 440 * t_22k).astype(np.float32)

# Use file paths to test with different sample rates. A temporary directory is used
# (rather than NamedTemporaryFile) so the files can be reopened by name on Windows too.
with tempfile.TemporaryDirectory() as temp_dir:
    gen_path = os.path.join(temp_dir, "generated_22k.wav")
    ref_path = os.path.join(temp_dir, "reference_44k.wav")

    # Save the generated audio with its actual sample rate (22050 Hz)
    sf.write(gen_path, audio_22k, sr_22k)
    # Save the reference audio with its actual sample rate (44100 Hz)
    sf.write(ref_path, reference_audio, sample_rate)

    # Process files with different sample rates
    try:
        with warnings.catch_warnings(record=True) as w:
            warnings.simplefilter("always")
            score = snr_metric.calculate(gen_path, ref_path)
            print(f"   Score with auto-resampling: {score:.4f}")

            # Check every recorded warning, not only the last one
            rate_warnings = [x for x in w if "Sample rates differ" in str(x.message)]
            if rate_warnings:
                print(f"   ✅ Warning issued: {rate_warnings[-1].message}")
            else:
                print("   ✓ Sample rate differences handled silently (no warning)")
    except Exception as e:
        print(f"   ❌ Error: {e}")

# 5. Very short audio
try:
    print("\n5. Very short audio (too short for spectrogram):")
    very_short = np.array([0.1, 0.2, 0.3, 0.4, 0.5], dtype=np.float32)
    score = spec_metric_euclidean.calculate(very_short, reference_audio)
except ValueError as e:
    print(f"   Error caught: {e}")
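
The length-mismatch case above reports a score "with auto-truncation". A minimal sketch of that idea, assuming both signals are simply trimmed to the shorter length before comparison, is shown below. `truncate_to_common_length` is a hypothetical helper, and GAICo's actual alignment logic may differ.

```python
import numpy as np


def truncate_to_common_length(generated, reference):
    """Trim both signals to the shorter length so they can be compared sample by sample."""
    n = min(len(generated), len(reference))
    if n == 0:
        raise ValueError("cannot compare empty audio")
    return generated[:n], reference[:n]


full = np.sin(2 * np.pi * 440 * np.linspace(0, 1.0, 44100, endpoint=False))
half = full[: len(full) // 2]

gen_aligned, ref_aligned = truncate_to_common_length(full, half)
print(gen_aligned.shape, ref_aligned.shape)  # (22050,) (22050,)
```

Since `half` is the first half of `full`, the trimmed signals are identical, so a perfect score in this edge case reflects only the overlapping portion of the audio.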


## Conclusion

In this notebook, we've explored GAICo's audio metrics:

1. **AudioSNRNormalized**: Measures signal quality by comparing noise levels
   - Best for: Overall quality assessment, noise evaluation
   - Configuration: Adjustable SNR range for different applications

2. **AudioSpectrogramDistance**: Compares spectral characteristics
   - Best for: Timbre comparison, frequency content analysis
   - Configuration: Multiple distance types, adjustable FFT parameters

**Key Takeaways:**
- Both metrics support various input formats (numpy arrays, file paths, lists)
- Automatic handling of common issues (sample rate differences, length mismatches)
- Seamless integration with GAICo's Experiment class for multi-model comparison
- Normalized outputs (0-1) for consistency with other GAICo metrics

**When to Use Which Metric:**
- Use **SNR** when you care about overall signal fidelity and noise levels
- Use **Spectrogram Distance** when spectral characteristics and timbre are important
- Consider using both for comprehensive audio quality assessment

For more information on GAICo and its other metrics, visit the [documentation](https://ai4society.github.io/projects/GenAIResultsComparator).