# Visualize Data Augmentation Effects - EDA

This notebook shows you **what each augmentation method does** by:
1. Taking one original sample
2. Applying each augmentation technique
3. Visualizing before/after mel-spectrograms

**Use this to:**
- Understand what augmentation actually does
- Verify augmentation is working correctly
- Tune augmentation parameters
- See if augmentation is too aggressive or too mild

## Setup

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Install dependencies
!pip install -q librosa soundfile matplotlib opencv-python

print("Setup complete!")

In [None]:
# Import modules
import sys
sys.path.append('/content')  # Adjust if your src/ is elsewhere

import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display
import random
import os

import config
import data_loader
import preprocessing
import augmentation

print("Modules imported successfully!")
config.print_config_summary()

## Load Sample Data

In [None]:
# Load one sample from each species
print("Loading sample data...")

species_data = data_loader.load_species_data()
background_data = data_loader.load_background_data()

# Pick one sample from each species
sample_audios = {}
sample_specs = {}

for species_name, audio_list in species_data.items():
    # Take the first sample (or you can choose randomly)
    sample_audio, filepath = audio_list[0]
    sample_spec = preprocessing.audio_to_melspectrogram(sample_audio)
    
    sample_audios[species_name] = sample_audio
    sample_specs[species_name] = sample_spec
    
    print(f"Loaded {species_name}: {os.path.basename(filepath)}")

# Load background samples for mixing
background_audios = [audio for audio, _ in background_data[:10]]  # Take first 10
background_specs = [preprocessing.audio_to_melspectrogram(audio) for audio in background_audios]

print(f"\nLoaded {len(background_specs)} background samples for mixing")

## Choose a Sample to Augment

In [None]:
# Select which species to visualize
# Change this to see different species
selected_species = list(sample_specs.keys())[0]  # First species

print(f"Selected species: {selected_species}")

original_audio = sample_audios[selected_species]
original_spec = sample_specs[selected_species]

print(f"Audio shape: {original_audio.shape}")
print(f"Spectrogram shape: {original_spec.shape}")

## Visualize Original Sample

In [None]:
def plot_audio_and_spec(audio, spec, title="Sample"):
    """
    Plot waveform and spectrogram side by side
    """
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 4))
    
    fig.suptitle(title, fontsize=14, fontweight='bold')
    
    # Waveform
    times = np.arange(len(audio)) / config.SAMPLE_RATE
    ax1.plot(times, audio, linewidth=0.5, color='blue')
    ax1.set_xlabel('Time (s)')
    ax1.set_ylabel('Amplitude')
    ax1.set_title('Waveform')
    ax1.grid(True, alpha=0.3)
    
    # Spectrogram
    img = librosa.display.specshow(
        spec,
        x_axis='time',
        y_axis='mel',
        sr=config.SAMPLE_RATE,
        hop_length=config.HOP_LENGTH,
        fmin=config.FMIN,
        fmax=config.FMAX,
        ax=ax2,
        cmap='viridis'
    )
    ax2.set_title('Mel-Spectrogram')
    fig.colorbar(img, ax=ax2, format='%+2.0f dB')
    
    plt.tight_layout()
    plt.show()

# Plot original
plot_audio_and_spec(original_audio, original_spec, 
                   f"ORIGINAL - {selected_species}")

## Augmentation 1: Background Noise Mixing

In [None]:
print("Augmentation 1: Background Noise Mixing\n")
print("This mixes the primate call with environmental sounds (birds, rain, wind)")
print(f"Current config: {config.AUGMENTATION_CONFIG['background_noise_mix']} versions")
print(f"SNR range: {config.BG_MIX_SNR_RANGE} dB\n")

# Generate 3 versions with different backgrounds and SNRs
snr_values = [-5, 0, 10]  # Low, medium, high SNR

for i, snr in enumerate(snr_values, 1):
    # Pick a random background
    bg_spec = random.choice(background_specs)
    
    # Mix
    mixed_spec = augmentation.add_background_noise(
        original_spec.copy(),
        bg_spec,
        snr_db=snr
    )
    
    # We can't easily convert back to audio, so just show spectrograms
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 4))
    
    if snr > 5:
        quality = "Clear (high SNR)"
    elif snr >= 0:
        quality = "Medium (medium SNR)"
    else:
        quality = "Noisy (low SNR)"
    
    fig.suptitle(
        f"Background Mix #{i} - SNR = {snr} dB ({quality})",
        fontsize=14, fontweight='bold'
    )
    
    # Original
    img1 = librosa.display.specshow(
        original_spec,
        x_axis='time',
        y_axis='mel',
        sr=config.SAMPLE_RATE,
        hop_length=config.HOP_LENGTH,
        fmin=config.FMIN,
        fmax=config.FMAX,
        ax=ax1,
        cmap='viridis'
    )
    ax1.set_title('Original')
    fig.colorbar(img1, ax=ax1, format='%+2.0f dB')
    
    # Mixed
    img2 = librosa.display.specshow(
        mixed_spec,
        x_axis='time',
        y_axis='mel',
        sr=config.SAMPLE_RATE,
        hop_length=config.HOP_LENGTH,
        fmin=config.FMIN,
        fmax=config.FMAX,
        ax=ax2,
        cmap='viridis'
    )
    ax2.set_title(f'After Background Mix (SNR={snr} dB)')
    fig.colorbar(img2, ax=ax2, format='%+2.0f dB')
    
    plt.tight_layout()
    plt.show()
    
    print(f"SNR = {snr} dB: {'Signal is clear' if snr > 5 else 'Signal mixed with noise' if snr >= 0 else 'Signal buried in noise'}\n")

## Augmentation 2: Time Chop (Time-domain cropping)

In [None]:
print("Augmentation 2: Time Chop\n")
print("This crops the call from the beginning or end")
print(f"Chop range: {config.CHOP_RANGE} (fraction of the duration removed, e.g. 0.1-0.3 = 10-30%)\n")

# Generate 2 versions: chop from left and right
for i, direction in enumerate(['left', 'right'], 1):
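    # Seed both RNGs so each iteration is reproducible. Seeding does not
    # force a chop direction: the 'left'/'right' label in the title assumes
    # these seeds happen to chop from that side, so check it against the plot.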
    np.random.seed(i)
    random.seed(i)
    
    # Chop
    chopped_spec = augmentation.time_chop(original_spec.copy(), chop_fraction=0.2)
    
    # Resize back to original shape
    chopped_resized = augmentation.resize_to_original_shape(
        chopped_spec,
        original_spec.shape
    )
    
    # Plot
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(18, 4))
    
    fig.suptitle(
        f"Time Chop #{i} - Removed ~20% from {'beginning' if direction == 'left' else 'end'}",
        fontsize=14, fontweight='bold'
    )
    
    # Original
    librosa.display.specshow(
        original_spec, x_axis='time', y_axis='mel',
        sr=config.SAMPLE_RATE, hop_length=config.HOP_LENGTH,
        fmin=config.FMIN, fmax=config.FMAX,
        ax=ax1, cmap='viridis'
    )
    ax1.set_title('Original')
    
    # Chopped (before resize)
    librosa.display.specshow(
        chopped_spec, x_axis='time', y_axis='mel',
        sr=config.SAMPLE_RATE, hop_length=config.HOP_LENGTH,
        fmin=config.FMIN, fmax=config.FMAX,
        ax=ax2, cmap='viridis'
    )
    ax2.set_title('After Chop (fewer time frames)')
    
    # Resized back
    librosa.display.specshow(
        chopped_resized, x_axis='time', y_axis='mel',
        sr=config.SAMPLE_RATE, hop_length=config.HOP_LENGTH,
        fmin=config.FMIN, fmax=config.FMAX,
        ax=ax3, cmap='viridis'
    )
    ax3.set_title('After Resize (stretched back)')
    
    plt.tight_layout()
    plt.show()
    
    print(f"Effect: {'Start of call removed' if direction == 'left' else 'End of call removed'}, then stretched back to original duration\n")

## Augmentation 3: Frequency Chop (Frequency-domain cropping)

In [None]:
print("Augmentation 3: Frequency Chop\n")
print("This removes high or low frequencies")
print(f"Chop range: {config.CHOP_RANGE} (fraction of the frequency range removed, e.g. 0.1-0.3 = 10-30%)\n")

# Generate 2 versions: chop from top and bottom
for i, direction in enumerate(['top', 'bottom'], 1):
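    # Seed both RNGs for reproducibility. This does not force a direction,
    # so check the 'top'/'bottom' label in the title against the plot.
    np.random.seed(i+10)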
    random.seed(i+10)
    
    # Chop
    chopped_spec = augmentation.freq_chop(original_spec.copy(), chop_fraction=0.2)
    
    # Resize back
    chopped_resized = augmentation.resize_to_original_shape(
        chopped_spec,
        original_spec.shape
    )
    
    # Plot
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(18, 4))
    
    fig.suptitle(
        f"Frequency Chop #{i} - Removed ~20% {'high' if direction == 'top' else 'low'} frequencies",
        fontsize=14, fontweight='bold'
    )
    
    # Original
    librosa.display.specshow(
        original_spec, x_axis='time', y_axis='mel',
        sr=config.SAMPLE_RATE, hop_length=config.HOP_LENGTH,
        fmin=config.FMIN, fmax=config.FMAX,
        ax=ax1, cmap='viridis'
    )
    ax1.set_title('Original')
    
    # Chopped
    librosa.display.specshow(
        chopped_spec, x_axis='time', y_axis='mel',
        sr=config.SAMPLE_RATE, hop_length=config.HOP_LENGTH,
        fmin=config.FMIN, fmax=config.FMAX,
        ax=ax2, cmap='viridis'
    )
    ax2.set_title('After Chop (fewer mel bins)')
    
    # Resized
    librosa.display.specshow(
        chopped_resized, x_axis='time', y_axis='mel',
        sr=config.SAMPLE_RATE, hop_length=config.HOP_LENGTH,
        fmin=config.FMIN, fmax=config.FMAX,
        ax=ax3, cmap='viridis'
    )
    ax3.set_title('After Resize (stretched back)')
    
    plt.tight_layout()
    plt.show()
    
    print(f"Effect: {'High frequencies removed' if direction == 'top' else 'Low frequencies removed'} (simulates frequency filtering)\n")

## Augmentation 4: Frequency Translation (Pitch shift)

In [None]:
print("Augmentation 4: Frequency Translation\n")
print("This shifts the call up or down in frequency (like changing pitch)")
print(f"Shift range: {config.TRANSLATE_RANGE} mel bins\n")

# Generate 3 versions: shift down, no shift, shift up
shift_amounts = [-15, 0, 15]

for i, shift in enumerate(shift_amounts, 1):
    # Translate
    shifted_spec = augmentation.translate(original_spec.copy(), shift_amount=shift)
    
    # Plot
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 4))
    
    if shift > 0:
        direction = "UP (higher pitch)"
    elif shift < 0:
        direction = "DOWN (lower pitch)"
    else:
        direction = "(no shift)"
    
    fig.suptitle(
        f"Frequency Translation #{i} - Shift {direction} by {abs(shift)} bins",
        fontsize=14, fontweight='bold'
    )
    
    # Original
    librosa.display.specshow(
        original_spec, x_axis='time', y_axis='mel',
        sr=config.SAMPLE_RATE, hop_length=config.HOP_LENGTH,
        fmin=config.FMIN, fmax=config.FMAX,
        ax=ax1, cmap='viridis'
    )
    ax1.set_title('Original')
    
    # Shifted
    librosa.display.specshow(
        shifted_spec, x_axis='time', y_axis='mel',
        sr=config.SAMPLE_RATE, hop_length=config.HOP_LENGTH,
        fmin=config.FMIN, fmax=config.FMAX,
        ax=ax2, cmap='viridis'
    )
    ax2.set_title(f'After Translation (shift={shift})')
    
    plt.tight_layout()
    plt.show()
    
    if shift != 0:
        print(f"Effect: Call {'moved up' if shift > 0 else 'moved down'} in frequency (simulates pitch variation between individuals)\n")
    else:
        print("Effect: No change (reference)\n")

## All Augmentations Combined View

In [None]:
print(f"Generating all {sum(config.AUGMENTATION_CONFIG.values())} versions from config (original included)...\n")

# Use the actual augment_spectrogram function
augmented_specs = augmentation.augment_spectrogram(
    original_spec,
    background_specs
)

print(f"Generated {len(augmented_specs)} versions (including original)")
print(f"Expected: {sum(config.AUGMENTATION_CONFIG.values())} versions\n")

# Plot all together
n_cols = 4
n_rows = int(np.ceil(len(augmented_specs) / n_cols))

fig, axes = plt.subplots(n_rows, n_cols, figsize=(20, 5*n_rows))
axes = axes.flatten()

titles = [
    'Original',
    'BG Mix 1',
    'BG Mix 2', 
    'BG Mix 3',
    'Time Chop',
    'Freq Chop',
    'Translate'
]

for i, (spec, title) in enumerate(zip(augmented_specs, titles)):
    librosa.display.specshow(
        spec,
        x_axis='time',
        y_axis='mel',
        sr=config.SAMPLE_RATE,
        hop_length=config.HOP_LENGTH,
        fmin=config.FMIN,
        fmax=config.FMAX,
        ax=axes[i],
        cmap='viridis'
    )
    axes[i].set_title(title, fontsize=12, fontweight='bold')

# Hide empty subplots
for i in range(len(augmented_specs), len(axes)):
    axes[i].axis('off')

fig.suptitle(
    f"All Augmentation Methods Applied to {selected_species}",
    fontsize=16,
    fontweight='bold'
)

plt.tight_layout()
plt.show()

print("\nSummary:")
print(f"1 original sample → {len(augmented_specs)} versions (original included)")
print(f"This is a {len(augmented_specs)}x data increase!")

## Compare All Species

In [None]:
print("Comparing augmentation effects across all species...\n")

for species_name in sample_specs.keys():
    print(f"\nSpecies: {species_name}")
    print("="*60)
    
    spec = sample_specs[species_name]
    
    # Generate augmented versions
    aug_specs = augmentation.augment_spectrogram(spec, background_specs)
    
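    # Plot original vs one augmented example of each type.
    # The indices assume the default output order: original, 3 background
    # mixes, time chop, freq chop, translate (7 versions in total).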
    fig, axes = plt.subplots(2, 3, figsize=(18, 8))
    axes = axes.flatten()
    
    examples = [
        (spec, "Original"),
        (aug_specs[1], "Background Mix"),
        (aug_specs[4], "Time Chop"),
        (aug_specs[5], "Freq Chop"),
        (aug_specs[6], "Translate"),
    ]
    
    for i, (s, title) in enumerate(examples):
        librosa.display.specshow(
            s,
            x_axis='time',
            y_axis='mel',
            sr=config.SAMPLE_RATE,
            hop_length=config.HOP_LENGTH,
            fmin=config.FMIN,
            fmax=config.FMAX,
            ax=axes[i],
            cmap='viridis'
        )
        axes[i].set_title(title, fontsize=11, fontweight='bold')
    
    # Hide last subplot
    axes[5].axis('off')
    
    fig.suptitle(
        f"Augmentation Examples - {species_name}",
        fontsize=14,
        fontweight='bold'
    )
    
    plt.tight_layout()
    plt.show()
    
    print(f"Generated {len(aug_specs)} versions for {species_name}")

## Insights & Takeaways

After running this notebook, you should understand:

### 1. Background Noise Mixing
- **High SNR (10 dB)**: Call is clear, background barely visible
- **Low SNR (-5 dB)**: Call is buried in noise
- **Effect**: Teaches model to recognize calls in various noise conditions
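
To make the SNR values concrete, here is a minimal sketch of mixing at a target SNR. It assumes linear-power spectrograms (not dB) and uses a hypothetical helper, `scale_noise_to_snr`; the project's `augmentation.add_background_noise` may be implemented differently.

```python
import numpy as np

def scale_noise_to_snr(signal_power, noise_power, snr_db):
    """Hypothetical helper: rescale noise so the mean-power SNR equals snr_db."""
    # SNR(dB) = 10 * log10(mean signal power / mean noise power)
    target_noise_mean = signal_power.mean() / (10.0 ** (snr_db / 10.0))
    gain = target_noise_mean / max(float(noise_power.mean()), 1e-12)
    return gain * noise_power

rng = np.random.default_rng(0)
call = rng.random((64, 100)) + 0.1    # stand-in for a call (linear power)
noise = rng.random((64, 100)) + 0.1   # stand-in for background (linear power)

for snr_db in (-5, 0, 10):
    scaled = scale_noise_to_snr(call, noise, snr_db)
    mixed = call + scaled             # powers add for uncorrelated sources
    noise_share = scaled.mean() / mixed.mean()
    print(f"SNR {snr_db:+d} dB -> noise is {noise_share:.0%} of the mixed power")
```

Under this definition, noise carries half of the mixed power at 0 dB, roughly three quarters at -5 dB, and about 9% at 10 dB.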

### 2. Time Chop
- Removes beginning or end of call
- Then stretches back to original duration
- **Effect**: Teaches model to recognize partial calls
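
The chop-then-stretch step can be sketched with NumPy alone. The helpers `chop_time` and `stretch_time` are hypothetical and use linear interpolation; the project's `augmentation.time_chop` and `augmentation.resize_to_original_shape` may use a different resize method.

```python
import numpy as np

def chop_time(spec, chop_fraction, from_start=True):
    """Hypothetical helper: drop a fraction of the time frames from one side."""
    n_frames = spec.shape[1]
    n_cut = int(round(n_frames * chop_fraction))
    return spec[:, n_cut:] if from_start else spec[:, :n_frames - n_cut]

def stretch_time(spec, n_frames):
    """Hypothetical helper: linearly interpolate each mel row to n_frames columns."""
    old_x = np.linspace(0.0, 1.0, spec.shape[1])
    new_x = np.linspace(0.0, 1.0, n_frames)
    return np.stack([np.interp(new_x, old_x, row) for row in spec])

spec = np.arange(40, dtype=float).reshape(4, 10)   # toy (n_mels, n_frames) array
chopped = chop_time(spec, 0.2, from_start=True)    # removes the first 2 frames
restored = stretch_time(chopped, spec.shape[1])    # back to 10 frames
print(chopped.shape, restored.shape)
```

Because the remaining frames are stretched to fill the original width, the call is also slightly slowed down, not just truncated.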

### 3. Frequency Chop
- Removes high or low frequencies
- Simulates different recording equipment
- **Effect**: Teaches model to be robust to frequency filtering
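
One side effect worth checking when tuning: if the chopped spectrogram is stretched back to the original height, the surviving content lands in different mel bins. The sketch below illustrates this with hypothetical helpers (`chop_low_freqs`, `stretch_freq`) and linear interpolation; it is not the project's implementation.

```python
import numpy as np

def chop_low_freqs(spec, chop_fraction):
    """Hypothetical helper: drop the lowest fraction of mel bins (row 0 = lowest)."""
    n_cut = int(round(spec.shape[0] * chop_fraction))
    return spec[n_cut:, :]

def stretch_freq(spec, n_mels):
    """Hypothetical helper: linearly interpolate each time frame to n_mels rows."""
    old_y = np.linspace(0.0, 1.0, spec.shape[0])
    new_y = np.linspace(0.0, 1.0, n_mels)
    return np.stack([np.interp(new_y, old_y, col) for col in spec.T], axis=1)

spec = np.zeros((40, 5))
spec[10, :] = 1.0                                   # a single bright band at bin 10
chopped = chop_low_freqs(spec, 0.2)                 # 40 bins -> 32 bins
restored = stretch_freq(chopped, spec.shape[0])     # 32 bins -> 40 bins

peak_before = int(spec[:, 0].argmax())
peak_after = int(restored[:, 0].argmax())
print(f"Band moved from bin {peak_before} to bin {peak_after}")
```

In this toy case the band ends up lower than where it started, so a large chop fraction acts as a frequency warp as well as a filter.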

### 4. Frequency Translation
- Shifts entire call up or down in frequency
- Simulates pitch variation between individuals
- **Effect**: Teaches model that pitch can vary
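
A minimal sketch of a mel-bin shift follows. The helper `shift_mel_bins` is hypothetical and fills vacated bins with the spectrogram minimum instead of wrapping around; `augmentation.translate` may handle the edges differently.

```python
import numpy as np

def shift_mel_bins(spec, shift, fill_value=None):
    """Hypothetical helper: move content up (shift > 0) or down (shift < 0)."""
    if fill_value is None:
        fill_value = spec.min()          # treat vacated bins as silence
    out = np.full_like(spec, fill_value)
    if shift > 0:
        out[shift:, :] = spec[:-shift, :]
    elif shift < 0:
        out[:shift, :] = spec[-shift:, :]
    else:
        out[:] = spec
    return out

spec = np.full((40, 5), -80.0)           # toy dB spectrogram, silence at -80 dB
spec[10, :] = 0.0                        # a single band at mel bin 10

up = shift_mel_bins(spec, 15)
down = shift_mel_bins(spec, -5)
print(int(up[:, 0].argmax()), int(down[:, 0].argmax()))
```

Content shifted past the top or bottom edge is lost, which is one reason to keep `TRANSLATE_RANGE` small relative to the number of mel bins.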

### 5. Overall Impact
- With the default config, each sample yields 7 versions (the original plus 6 augmented)
- 182 Cercocebus samples → 1,274 augmented samples
- Model sees much more variety in training
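
The arithmetic behind these numbers can be checked directly. The dictionary below is an assumed default, inferred from the 7 panels shown in this notebook (1 original, 3 background mixes, 1 each of the other methods); the real values live in `config.AUGMENTATION_CONFIG`.

```python
# Assumed default counts, inferred from the panels in this notebook
augmentation_config = {
    'original': 1,
    'background_noise_mix': 3,
    'time_chop': 1,
    'freq_chop': 1,
    'translate': 1,
}

def augmented_dataset_size(n_samples, aug_config):
    """Total samples after augmentation: every sample yields sum(counts) versions."""
    return n_samples * sum(aug_config.values())

versions_per_sample = sum(augmentation_config.values())
total = augmented_dataset_size(182, augmentation_config)
print(f"{versions_per_sample} versions per sample, {total} samples in total")
```

The same function can be reused to estimate the dataset size before changing the counts in `config.py`.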

### Tuning Suggestions:

If augmentation seems **too aggressive** (changes sound too much):
```python
# In config.py
BG_MIX_SNR_RANGE = (0, 10)  # Remove very noisy mixes
CHOP_RANGE = (0.05, 0.15)   # Chop less
TRANSLATE_RANGE = (-10, 10)  # Shift less
```

If augmentation seems **too mild** (not enough variation):
```python
# In config.py
AUGMENTATION_CONFIG = {
    'original': 1,
    'background_noise_mix': 5,  # More background mixing
    'time_chop': 2,             # More time variations
    'freq_chop': 2,             # More freq variations
    'translate': 2,             # More pitch variations
}
# This gives 12x augmentation
```