# EMSN 2.0 - Vocalization 10-Type Training (Colab Pro+)
## 382 European bird species | Xeno-Canto data | HiDrive checkpoints

### 10 vocalization types (based on Xeno-Canto sound types):
| # | Type | XC query | Ecological value |
|---|------|----------|------------------|
| 0 | zang | `song` | Territory, breeding behavior |
| 1 | roep | `call` | General communication |
| 2 | alarm | `alarm call` | Predator presence |
| 3 | vluchtroep | `flight call` | Movement behavior |
| 4 | subzang | `subsong` | Song learning, young birds |
| 5 | bedelroep | `begging call` | Breeding success, fledged young |
| 6 | contactroep | `social call` | Group cohesion, pair bond |
| 7 | nachttrekroep | `nocturnal flight call` | Nocturnal migration monitoring |
| 8 | baltszang | `flight song` | Display behavior (e.g. Tree Pipit / Boompieper) |
| 9 | roffel | `drumming` | Woodpecker territories |
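The `XC query` column lists Xeno-Canto sound-type values. A recording's `type` field can hold several comma-separated values (for example `call, flight call`), and a plain substring test for `call` would also match `alarm call`. Below is a minimal sketch of one way to pick a single label. It assumes exact matching per entry and prefers the longest (most specific) type name. `label_from_xc_type` is a hypothetical helper, not a function from this notebook:

```python
from typing import Optional

# Label indices copied from the table above (XC query -> label).
XC_TYPE_TO_LABEL = {
    "song": 0,
    "call": 1,
    "alarm call": 2,
    "flight call": 3,
    "subsong": 4,
    "begging call": 5,
    "social call": 6,
    "nocturnal flight call": 7,
    "flight song": 8,
    "drumming": 9,
}


def label_from_xc_type(type_field: str) -> Optional[int]:
    """Hypothetical helper: map a Xeno-Canto `type` string to one label index.

    Each comma-separated entry is compared exactly (case-insensitive), so
    "alarm call" never falls back to the generic "call". If several entries
    are known types, the longest name wins ("nocturnal flight call" beats
    "call"); ties go to the entry listed first. Unknown types give None.
    """
    entries = [t.strip().lower() for t in type_field.split(",")]
    known = [t for t in entries if t in XC_TYPE_TO_LABEL]
    if not known:
        return None
    return XC_TYPE_TO_LABEL[max(known, key=len)]
```

Under this rule, types outside the table (e.g. `dawn song`) return `None`. Whether to fold them into `zang` is a design choice the sketch leaves open.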

### Species list (382 species):
- **Dutch breeding birds & regular visitors** (~210 species)
- **Scandinavian / boreal species** (~38 species)
- **Southern European / Mediterranean species** (~47 species)
- **Eastern European species** (~28 species)
- **Migrants / passage birds / nocmig (nocturnal migration)** (~29 species)
- **Pelagic / North Sea / coastal species** (~33 species)
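Each species tuple in cell 4 pairs a Dutch name, a scientific name and a directory name. The hedged sketch below runs two quick consistency checks. First, the directory names appear to follow a simple slug rule: lowercase, apostrophes dropped, and spaces and hyphens turned into underscores. Second, some scientific names occur twice in the list, which would query the same Xeno-Canto recordings for two entries. Examples are `Acanthis flammea` (Barmsijs and Grote Barmsijs) and `Branta canadensis` (Canadese Gans and Grote Canadese Gans). Both helper names are hypothetical:

```python
import re
from collections import defaultdict


def to_dir_name(dutch_name: str) -> str:
    """Slug rule inferred from the list: lowercase, drop apostrophes,
    and replace runs of spaces/hyphens with one underscore."""
    slug = dutch_name.lower().replace("'", "")
    return re.sub(r"[\s-]+", "_", slug)


def duplicate_scientific_names(species):
    """Return {scientific name: [Dutch names]} for names that occur more than once."""
    by_sci = defaultdict(list)
    for dutch_name, sci_name, _dir_name in species:
        by_sci[sci_name].append(dutch_name)
    return {sci: names for sci, names in by_sci.items() if len(names) > 1}


# Example on a few entries from cell 4 (run on ALL_SPECIES for the full list).
sample = [
    ("Barmsijs", "Acanthis flammea", "barmsijs"),
    ("Grote Barmsijs", "Acanthis flammea", "grote_barmsijs"),
    ("Cetti's Zanger", "Cettia cetti", "cettis_zanger"),
]
mismatches = [d for n, _, d in sample if to_dir_name(n) != d]
duplicates = duplicate_scientific_names(sample)
```

Running `duplicate_scientific_names(ALL_SPECIES)` after cell 4 lists every such pair.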

### Training pipeline:
1. Download audio from the Xeno-Canto API v3 (per species, per type)
2. Generate mel spectrograms (128x128)
3. Aggressive data augmentation: pitch shift, time stretch, Gaussian and pink noise, SpecAugment
4. Ultimate CNN (4 conv blocks) + mixed-precision training
5. Automatic checkpoints to HiDrive, so training can resume after a crash or Colab timeout
6. Upload the final models to HiDrive
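Step 3 can be sketched in NumPy for two of the named augmentations. This is an illustrative version, not the notebook's implementation. It assumes one frequency mask and one time mask per call, with widths up to 15 mel bins and 20 frames (the values of `AUG_SPEC_FREQ_MASK` and `AUG_SPEC_TIME_MASK` in cell 3). Masked cells are filled with the spectrogram mean, and pink noise is scaled to an RMS of 0.008 (`AUG_PINK_NOISE_LEVEL`). For reference, librosa's default centered framing turns a 3 s segment at 48 kHz with a hop of 512 into 1 + 144000 // 512 = 282 frames. The 128x128 input size therefore implies a resize or crop along the time axis.

```python
import numpy as np


def spec_augment(spec, rng, freq_mask=15, time_mask=20):
    """Apply one frequency mask and one time mask to a (n_mels, n_frames) array.

    Mask widths are drawn uniformly from 0..freq_mask and 0..time_mask
    (defaults mirror AUG_SPEC_FREQ_MASK / AUG_SPEC_TIME_MASK in cell 3).
    Masked cells are set to the mean of the input; the input is not modified.
    """
    out = spec.copy()
    n_mels, n_frames = out.shape
    fill = spec.mean()

    f = rng.integers(0, min(freq_mask, n_mels) + 1)     # mask height in mel bins
    f0 = rng.integers(0, n_mels - f + 1)                # first masked bin
    out[f0:f0 + f, :] = fill

    t = rng.integers(0, min(time_mask, n_frames) + 1)   # mask width in frames
    t0 = rng.integers(0, n_frames - t + 1)              # first masked frame
    out[:, t0:t0 + t] = fill
    return out


def pink_noise(n_samples, rng, level=0.008):
    """Pink (1/f) noise with RMS `level` (default mirrors AUG_PINK_NOISE_LEVEL).

    White noise is shaped in the frequency domain: power ~ 1/f means the
    amplitude spectrum is divided by sqrt(f). Needs n_samples >= 2.
    """
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples)
    freqs[0] = freqs[1]                      # avoid dividing by zero at DC
    spectrum /= np.sqrt(freqs)
    noise = np.fft.irfft(spectrum, n=n_samples)
    return level * noise / np.sqrt(np.mean(noise ** 2))


# Example on dummy data: a 128x128 "spectrogram" and a 3 s clip at 48 kHz.
rng = np.random.default_rng(0)
augmented = spec_augment(rng.random((128, 128)), rng)
clip = rng.uniform(-0.5, 0.5, 3 * 48000)
noisy_clip = clip + pink_noise(clip.size, rng)
```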

### Colab Pro+ settings:
1. Runtime → Change runtime type
2. Hardware accelerator: **GPU** → **A100** (or H100)
3. High-RAM: **On**

### Preparation (one-time):
1. Add two Colab Secrets (key icon in the left sidebar), with notebook access enabled:
   - `HIDRIVE_SSH_KEY` → contents of `~/.ssh/id_ed25519_hidrive`
   - `XC_API_KEY` → your Xeno-Canto API key
2. Optionally upload your own review data to HiDrive:
```bash
rclone sync /mnt/nas-birdnet-archive/vocalization/retraining/ \
  hidrive:users/ronnyclouddisk/emsn-backups/vocalization-retraining/ --progress
```

---
## Press Ctrl+F9 to run all cells

In [None]:
# @title 1. GPU Detection & Dependencies
import subprocess
import sys
import os

# Install dependencies
subprocess.check_call([sys.executable, "-m", "pip", "install", "-q",
                       "librosa", "scikit-learn", "scikit-image", "matplotlib",
                       "tqdm", "requests", "paramiko", "audiomentations",
                       "torch-audiomentations", "seaborn"])

import torch
import gc

print("=" * 60)
print("GPU DETECTION")
print("=" * 60)
print(f"PyTorch: {torch.__version__}")
print(f"CUDA: {torch.cuda.is_available()}")

if not torch.cuda.is_available():
    print("\nNO GPU! Go to Runtime > Change runtime type > GPU")
    raise SystemExit("GPU required")

GPU_NAME = torch.cuda.get_device_name(0)
_props = torch.cuda.get_device_properties(0)
GPU_MEM = getattr(_props, "total_memory", getattr(_props, "total_mem", 0)) / 1e9

print(f"\nGPU: {GPU_NAME}")
print(f"Memory: {GPU_MEM:.1f} GB")

import psutil
ram_gb = psutil.virtual_memory().total / 1e9
print(f"RAM: {ram_gb:.1f} GB")

# Auto-configure based on GPU
if "H100" in GPU_NAME:
    BATCH_SIZE = 128
    NUM_WORKERS = 8
    USE_BF16 = True
    GPU_TIER = "H100"
elif "A100" in GPU_NAME:
    BATCH_SIZE = 96
    NUM_WORKERS = 6
    USE_BF16 = True
    GPU_TIER = "A100"
elif "V100" in GPU_NAME or "L4" in GPU_NAME:
    BATCH_SIZE = 64
    NUM_WORKERS = 4
    USE_BF16 = False
    GPU_TIER = "V100/L4"
else:  # T4, etc.
    BATCH_SIZE = 48
    NUM_WORKERS = 4
    USE_BF16 = False
    GPU_TIER = "T4/Other"

# Performance: TF32 matmuls/convolutions on Ampere+ GPUs, cuDNN autotuning
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.set_float32_matmul_precision('high')

device = torch.device("cuda")
dtype = torch.bfloat16 if USE_BF16 else torch.float16

print(f"\nConfig for {GPU_TIER}:")
print(f"  Batch size: {BATCH_SIZE}")
print(f"  Precision: {dtype}")
print(f"  Workers: {NUM_WORKERS}")
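The cell above chooses bfloat16 on H100/A100 and float16 elsewhere. With float16, small gradients can underflow. Mixed-precision training loops therefore usually add loss scaling: `torch.cuda.amp.GradScaler`, or `torch.amp.GradScaler("cuda")` in recent PyTorch. bfloat16 keeps float32's exponent range and normally does without a scaler. Below is a table-driven sketch of the same selection that also records whether a scaler is needed. `select_gpu_config` and the `needs_grad_scaler` key are hypothetical:

```python
# Hypothetical table-driven version of the if/elif chain in cell 1.
# Order matters: the first substring found in the device name wins.
GPU_CONFIGS = [
    ("H100", {"batch_size": 128, "num_workers": 8, "use_bf16": True,  "tier": "H100"}),
    ("A100", {"batch_size": 96,  "num_workers": 6, "use_bf16": True,  "tier": "A100"}),
    ("V100", {"batch_size": 64,  "num_workers": 4, "use_bf16": False, "tier": "V100/L4"}),
    ("L4",   {"batch_size": 64,  "num_workers": 4, "use_bf16": False, "tier": "V100/L4"}),
]
FALLBACK = {"batch_size": 48, "num_workers": 4, "use_bf16": False, "tier": "T4/Other"}


def select_gpu_config(gpu_name: str) -> dict:
    """Return a copy of the matching config, plus whether loss scaling is needed."""
    for key, cfg in GPU_CONFIGS:
        if key in gpu_name:
            chosen = dict(cfg)
            break
    else:
        chosen = dict(FALLBACK)
    # float16 training typically needs a GradScaler; bfloat16 usually does not.
    chosen["needs_grad_scaler"] = not chosen["use_bf16"]
    return chosen
```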

In [None]:
# @title 2. HiDrive Connection via SSH Key (Colab Secrets)
import paramiko
import tempfile
from pathlib import Path
from google.colab import userdata

HIDRIVE_HOST = "sftp.hidrive.strato.com"
HIDRIVE_USER = "ronnyclouddisk"

# Read the SSH key from Colab Secrets
try:
    ssh_key_content = userdata.get("HIDRIVE_SSH_KEY")
    print("SSH key loaded from Colab Secrets")
except userdata.SecretNotFoundError:
    raise RuntimeError(
        "Secret 'HIDRIVE_SSH_KEY' not found!\n\n"
        "Add it via the key icon in the left sidebar of Colab:\n"
        "  Name: HIDRIVE_SSH_KEY\n"
        "  Value: contents of ~/.ssh/id_ed25519_hidrive\n"
        "  Notebook access: On"
    )

# Write the key to a temporary file readable only by the owner (chmod 600 below)
_key_file = tempfile.NamedTemporaryFile(mode="w", suffix="_hidrive", delete=False)
_key_file.write(ssh_key_content)
if not ssh_key_content.endswith("\n"):
    _key_file.write("\n")
_key_file.close()
os.chmod(_key_file.name, 0o600)

# Detect the key type: try Ed25519, then RSA, then ECDSA
try:
    _pkey = paramiko.Ed25519Key.from_private_key_file(_key_file.name)
    print("  Key type: Ed25519")
except Exception:
    try:
        _pkey = paramiko.RSAKey.from_private_key_file(_key_file.name)
        print("  Key type: RSA")
    except Exception:
        _pkey = paramiko.ECDSAKey.from_private_key_file(_key_file.name)
        print("  Key type: ECDSA")

# HiDrive paths (keep the trailing slash: file names are appended directly)
HIDRIVE_MODELS_DIR = "/users/ronnyclouddisk/emsn-backups/vocalization-models-10types/"
HIDRIVE_RETRAINING_DIR = "/users/ronnyclouddisk/emsn-backups/vocalization-retraining/"
HIDRIVE_CHECKPOINT_DIR = "/users/ronnyclouddisk/emsn-backups/vocalization-checkpoints/"

# Local paths
LOCAL_BASE = Path("/content/EMSN-Vocalization-10Types")
LOCAL_AUDIO = LOCAL_BASE / "audio"
LOCAL_MODELS = LOCAL_BASE / "models"
LOCAL_CHECKPOINT = LOCAL_BASE / "checkpoints"
LOCAL_RETRAINING = LOCAL_BASE / "retraining"
for d in [LOCAL_AUDIO, LOCAL_MODELS, LOCAL_CHECKPOINT, LOCAL_RETRAINING]:
    d.mkdir(parents=True, exist_ok=True)


def get_sftp():
    """Open an SFTP connection to HiDrive; the caller must close both returned objects."""
    transport = paramiko.Transport((HIDRIVE_HOST, 22))
    transport.connect(username=HIDRIVE_USER, pkey=_pkey)
    return paramiko.SFTPClient.from_transport(transport), transport


def sftp_mkdir_p(sftp, remote_path):
    """mkdir -p voor SFTP."""
    parts = remote_path.strip("/").split("/")
    current = ""
    for part in parts:
        current += f"/{part}"
        try:
            sftp.stat(current)
        except FileNotFoundError:
            sftp.mkdir(current)


def upload_to_hidrive(local_path, remote_dir, remote_name=None):
    """Upload a file to HiDrive; remote_dir must end with '/' and is created if missing."""
    sftp, transport = get_sftp()
    try:
        sftp_mkdir_p(sftp, remote_dir)
        if remote_name is None:
            remote_name = Path(local_path).name
        remote_path = f"{remote_dir}{remote_name}"
        sftp.put(str(local_path), remote_path)
        size_mb = os.path.getsize(local_path) / (1024 * 1024)
        print(f"  Uploaded: {remote_name} ({size_mb:.1f} MB)")
    finally:
        sftp.close()
        transport.close()


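def download_from_hidrive(remote_path, local_path):
    """Download a file from HiDrive; return False if the remote file does not exist."""
    sftp, transport = get_sftp()
    try:
        sftp.get(remote_path, str(local_path))
        return True
    except FileNotFoundError:
        # sftp.get() opens the local file before reading the remote one,
        # so remove the empty file it may have left behind.
        Path(local_path).unlink(missing_ok=True)
        return False
    finally:
        sftp.close()
        transport.close()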


def download_retraining_data():
    """Download your own review corrections from HiDrive (if available)."""
    import stat  # for S_ISDIR on SFTP file attributes

    sftp, transport = get_sftp()
    total = 0
    try:
        try:
            sftp.stat(HIDRIVE_RETRAINING_DIR)
        except FileNotFoundError:
            print("  No retraining data on HiDrive (that's fine, Xeno-Canto data is used)")
            return 0

        # listdir_attr returns names and attributes in one round trip,
        # so no separate stat() call per entry is needed
        for entry in sftp.listdir_attr(HIDRIVE_RETRAINING_DIR):
            if entry.st_mode is None or not stat.S_ISDIR(entry.st_mode):
                continue  # only species directories are expected; skip loose files
            species_dir = entry.filename
            remote_dir = f"{HIDRIVE_RETRAINING_DIR}{species_dir}"

            local_dir = LOCAL_RETRAINING / species_dir
            local_dir.mkdir(parents=True, exist_ok=True)

            files = [f for f in sftp.listdir(remote_dir) if f.lower().endswith(('.mp3', '.wav'))]
            for fname in files:
                local_file = local_dir / fname
                if not local_file.exists():  # skip files fetched in an earlier run
                    sftp.get(f"{remote_dir}/{fname}", str(local_file))
                    total += 1

        print(f"  {total} retraining files downloaded")
    finally:
        sftp.close()
        transport.close()
    return total


# Test the connection
print("\nTesting HiDrive connection...")
sftp, transport = get_sftp()
print(f"  Connected to {HIDRIVE_HOST}")
sftp.close()
transport.close()

# Download your own review data (if available)
print("\nFetching retraining data...")
download_retraining_data()
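Each helper in cell 2 opens its own connection and closes it in a `finally` block. The same pattern can be factored into a context manager. Below is a minimal sketch: `sftp_session` is a hypothetical name, and `connect` is any callable that returns an `(sftp, transport)` pair, such as `get_sftp`:

```python
from contextlib import contextmanager


@contextmanager
def sftp_session(connect):
    """Open a connection with `connect()`, yield the SFTP client, and always
    close both the client and the transport, even if the body raises."""
    sftp, transport = connect()
    try:
        yield sftp
    finally:
        sftp.close()
        transport.close()
```

Usage in this notebook would look like `with sftp_session(get_sftp) as sftp: print(sftp.listdir(HIDRIVE_MODELS_DIR))`.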

In [None]:
# @title 3. Configuration
import time
import numpy as np
from datetime import datetime
from google.colab import userdata

VERSION = '2026_10types'

# === XENO-CANTO API KEY (via Colab Secret) ===
try:
    XC_API_KEY = userdata.get("XC_API_KEY")
    print("Xeno-Canto API key loaded from Colab Secrets")
except userdata.SecretNotFoundError:
    raise RuntimeError(
        "Secret 'XC_API_KEY' not found!\n\n"
        "Add it via the key icon in the left sidebar of Colab:\n"
        "  Name: XC_API_KEY\n"
        "  Value: your Xeno-Canto API key\n"
        "  Notebook access: On\n\n"
        "No key yet? Register at https://xeno-canto.org/explore/api"
    )

# === VOCALIZATION TYPES ===
# Mapping: EMSN type -> Xeno-Canto query -> label index
VOCALIZATION_TYPES = {
    'zang':          {'xc_query': 'song',                  'label': 0},
    'roep':          {'xc_query': 'call',                  'label': 1},
    'alarm':         {'xc_query': 'alarm call',            'label': 2},
    'vluchtroep':    {'xc_query': 'flight call',           'label': 3},
    'subzang':       {'xc_query': 'subsong',               'label': 4},
    'bedelroep':     {'xc_query': 'begging call',          'label': 5},
    'contactroep':   {'xc_query': 'social call',           'label': 6},
    'nachttrekroep': {'xc_query': 'nocturnal flight call', 'label': 7},
    'baltszang':     {'xc_query': 'flight song',           'label': 8},
    'roffel':        {'xc_query': 'drumming',              'label': 9},
}

# Training parameters, tuned for Colab Pro+
EPOCHS = 80
LEARNING_RATE = 0.001
MIN_LR = 0.000005
PATIENCE = 15
WARMUP_EPOCHS = 3
LABEL_SMOOTHING = 0.05
WEIGHT_DECAY = 0.01

# Data parameters
MAX_RECORDINGS_PER_TYPE = 80     # More data generally helps
MAX_SEGMENTS_PER_RECORDING = 5   # Up to 5 segments of 3 s per recording
MIN_SAMPLES_PER_TYPE = 15        # Minimum samples required to include a type
MIN_TYPES_FOR_TRAINING = 2       # At least 2 types needed (a classifier needs 2+ classes)
MAX_CONCURRENT_DOWNLOADS = 12    # Parallel Xeno-Canto downloads

# Audio parameters
SAMPLE_RATE = 48000
N_MELS = 128
N_FFT = 2048
HOP_LENGTH = 512
FMIN = 500
FMAX = 12000   # Higher than the earlier "ultimate" setup (8000 Hz) to capture high-pitched flight calls
SEGMENT_DURATION = 3.0

# Augmentation strength
AUG_PITCH_RANGE = 3        # +/- semitones
AUG_STRETCH_RANGE = 0.15   # +/- 15% tempo
AUG_NOISE_LEVELS = [0.002, 0.005, 0.01]  # Gaussian noise levels
AUG_PINK_NOISE_LEVEL = 0.008
AUG_SPEC_FREQ_MASK = 15    # SpecAugment frequency-mask width (mel bins)
AUG_SPEC_TIME_MASK = 20    # SpecAugment time-mask width (frames)

print("EMSN Vocalization 10-Type Training")
print(f"{'='*60}")
print(f"GPU: {GPU_TIER} | Batch: {BATCH_SIZE} | Precision: {dtype}")
print(f"Epochs: {EPOCHS} | Patience: {PATIENCE} | LR: {LEARNING_RATE}")
print(f"Recordings per type: {MAX_RECORDINGS_PER_TYPE}")
print(f"Segments per recording: {MAX_SEGMENTS_PER_RECORDING}")
print(f"Freq range: {FMIN}-{FMAX} Hz")
print(f"\nTypes ({len(VOCALIZATION_TYPES)}):")
for name, info in VOCALIZATION_TYPES.items():
    print(f"  [{info['label']}] {name} -> XC: {info['xc_query']}")
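Cell 3 defines `LEARNING_RATE`, `MIN_LR` and `WARMUP_EPOCHS`, but the scheduler that uses them is not part of this section. One common choice, assumed here, is a linear warmup followed by cosine decay. For epoch e ≥ `WARMUP_EPOCHS`, the rate is `MIN_LR + 0.5 * (LEARNING_RATE - MIN_LR) * (1 + cos(pi * p))`, where p runs from 0 at the end of warmup to 1 at the last epoch. The defaults below mirror cell 3, and `lr_at_epoch` is a hypothetical helper. It could be plugged into PyTorch via `torch.optim.lr_scheduler.LambdaLR(optimizer, lambda e: lr_at_epoch(e) / LEARNING_RATE)`, since `LambdaLR` multiplies the initial learning rate by the returned factor.

```python
import math


def lr_at_epoch(epoch, epochs=80, base_lr=0.001, min_lr=0.000005, warmup=3):
    """Learning rate for a 0-based epoch: linear warmup, then cosine decay.

    Defaults mirror EPOCHS, LEARNING_RATE, MIN_LR and WARMUP_EPOCHS in cell 3.
    Warmup ramps base_lr/warmup, 2*base_lr/warmup, ... up to base_lr; after that
    the rate follows half a cosine from base_lr down to min_lr at the last epoch.
    """
    if epoch < warmup:
        return base_lr * (epoch + 1) / warmup
    decay_steps = max(1, epochs - 1 - warmup)             # steps from epoch `warmup` to the last epoch
    progress = min(1.0, (epoch - warmup) / decay_steps)   # 0.0 after warmup, 1.0 at the last epoch
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Example: a few points of the default schedule.
for e in (0, 2, 3, 40, 79):
    print(e, f"{lr_at_epoch(e):.2e}")
```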

In [None]:
# @title 4. Species list (382 European bird species)
# Dutch breeding birds, migrants, Scandinavian, Southern and Eastern European, and pelagic species.
# Each tuple: (Dutch name, scientific name, directory name)

ALL_SPECIES = [
    # ===== DUTCH BREEDING BIRDS & REGULAR VISITORS =====
    ("Aalscholver", "Phalacrocorax carbo", "aalscholver"),
    ("Appelvink", "Coccothraustes coccothraustes", "appelvink"),
    ("Baardman", "Panurus biarmicus", "baardman"),
    ("Barmsijs", "Acanthis flammea", "barmsijs"),  # same scientific name as "Grote Barmsijs"
    ("Beflijster", "Turdus torquatus", "beflijster"),
    ("Bergeend", "Tadorna tadorna", "bergeend"),
    ("Bijeneter", "Merops apiaster", "bijeneter"),
    ("Blauwborst", "Luscinia svecica", "blauwborst"),
    ("Blauwe Kiekendief", "Circus cyaneus", "blauwe_kiekendief"),
    ("Blauwe Reiger", "Ardea cinerea", "blauwe_reiger"),
    ("Boerenzwaluw", "Hirundo rustica", "boerenzwaluw"),
    ("Bokje", "Lymnocryptes minimus", "bokje"),
    ("Bontbekplevier", "Charadrius hiaticula", "bontbekplevier"),
    ("Bonte Kraai", "Corvus cornix", "bonte_kraai"),
    ("Bonte Strandloper", "Calidris alpina", "bonte_strandloper"),
    ("Bonte Vliegenvanger", "Ficedula hypoleuca", "bonte_vliegenvanger"),
    ("Boomklever", "Sitta europaea", "boomklever"),
    ("Boomkruiper", "Certhia brachydactyla", "boomkruiper"),
    ("Boomleeuwerik", "Lullula arborea", "boomleeuwerik"),
    ("Boompieper", "Anthus trivialis", "boompieper"),
    ("Boomvalk", "Falco subbuteo", "boomvalk"),
    ("Bosrietzanger", "Acrocephalus palustris", "bosrietzanger"),
    ("Bosruiter", "Tringa glareola", "bosruiter"),
    ("Bosuil", "Strix aluco", "bosuil"),
    ("Braamsluiper", "Curruca curruca", "braamsluiper"),
    ("Brandgans", "Branta leucopsis", "brandgans"),
    ("Brilduiker", "Bucephala clangula", "brilduiker"),
    ("Bruine Kiekendief", "Circus aeruginosus", "bruine_kiekendief"),
    ("Buidelmees", "Remiz pendulinus", "buidelmees"),
    ("Buizerd", "Buteo buteo", "buizerd"),
    ("Canadese Gans", "Branta canadensis", "canadese_gans"),  # same scientific name as "Grote Canadese Gans"
    ("Cetti's Zanger", "Cettia cetti", "cettis_zanger"),
    ("Citroenkanarie", "Serinus citrinella", "citroenkanarie"),
    ("Dodaars", "Tachybaptus ruficollis", "dodaars"),
    ("Draaihals", "Jynx torquilla", "draaihals"),
    ("Drieteenstrandloper", "Calidris alba", "drieteenstrandloper"),
    ("Dwergstern", "Sternula albifrons", "dwergstern"),
    ("Eider", "Somateria mollissima", "eider"),
    ("Ekster", "Pica pica", "ekster"),
    ("Europese Kanarie", "Serinus serinus", "europese_kanarie"),
    ("Fazant", "Phasianus colchicus", "fazant"),
    ("Fitis", "Phylloscopus trochilus", "fitis"),
    ("Flamingo", "Phoenicopterus roseus", "flamingo"),
    ("Fluiter", "Phylloscopus sibilatrix", "fluiter"),
    ("Fuut", "Podiceps cristatus", "fuut"),
    ("Gaai", "Garrulus glandarius", "gaai"),
    ("Geelgors", "Emberiza citrinella", "geelgors"),
    ("Geelpootmeeuw", "Larus michahellis", "geelpootmeeuw"),
    ("Gekraagde Roodstaart", "Phoenicurus phoenicurus", "gekraagde_roodstaart"),
    ("Gele Kwikstaart", "Motacilla flava", "gele_kwikstaart"),
    ("Gierzwaluw", "Apus apus", "gierzwaluw"),
    ("Glanskop", "Poecile palustris", "glanskop"),
    ("Goudhaan", "Regulus regulus", "goudhaan"),
    ("Goudplevier", "Pluvialis apricaria", "goudplevier"),
    ("Goudvink", "Pyrrhula pyrrhula", "goudvink"),
    ("Grasmus", "Curruca communis", "grasmus"),
    ("Graspieper", "Anthus pratensis", "graspieper"),
    ("Graszanger", "Cisticola juncidis", "graszanger"),
    ("Grauwe Gans", "Anser anser", "grauwe_gans"),
    ("Grauwe Gors", "Emberiza calandra", "grauwe_gors"),
    ("Grauwe Kiekendief", "Circus pygargus", "grauwe_kiekendief"),
    ("Grauwe Klauwier", "Lanius collurio", "grauwe_klauwier"),
    ("Grauwe Vliegenvanger", "Muscicapa striata", "grauwe_vliegenvanger"),
    ("Groene Specht", "Picus viridis", "groene_specht"),
    ("Groenling", "Chloris chloris", "groenling"),
    ("Groenpootruiter", "Tringa nebularia", "groenpootruiter"),
    ("Grote Barmsijs", "Acanthis flammea", "grote_barmsijs"),
    ("Grote Bonte Specht", "Dendrocopos major", "grote_bonte_specht"),
    ("Grote Canadese Gans", "Branta canadensis", "grote_canadese_gans"),
    ("Grote Gele Kwikstaart", "Motacilla cinerea", "grote_gele_kwikstaart"),
    ("Grote Karekiet", "Acrocephalus arundinaceus", "grote_karekiet"),
    ("Grote Lijster", "Turdus viscivorus", "grote_lijster"),
    ("Grote Mantelmeeuw", "Larus marinus", "grote_mantelmeeuw"),
    ("Grote Zaagbek", "Mergus merganser", "grote_zaagbek"),
    ("Grote Zilverreiger", "Ardea alba", "grote_zilverreiger"),
    ("Grutto", "Limosa limosa", "grutto"),
    ("Haakbek", "Pinicola enucleator", "haakbek"),
    ("Halsbandparkiet", "Psittacula krameri", "halsbandparkiet"),
    ("Havik", "Accipiter gentilis", "havik"),
    ("Heggenmus", "Prunella modularis", "heggenmus"),
    ("Holenduif", "Columba oenas", "holenduif"),
    ("Hop", "Upupa epops", "hop"),
    ("Houtduif", "Columba palumbus", "houtduif"),
    ("Houtsnip", "Scolopax rusticola", "houtsnip"),
    ("Huismus", "Passer domesticus", "huismus"),
    ("Huiszwaluw", "Delichon urbicum", "huiszwaluw"),
    ("IJsvogel", "Alcedo atthis", "ijsvogel"),
    ("Kanoetstrandloper", "Calidris canutus", "kanoetstrandloper"),
    ("Kauw", "Coloeus monedula", "kauw"),
    ("Keep", "Fringilla montifringilla", "keep"),
    ("Kerkuil", "Tyto alba", "kerkuil"),
    ("Kievit", "Vanellus vanellus", "kievit"),
    ("Klapekster", "Lanius excubitor", "klapekster"),
    ("Kleine Bonte Specht", "Dryobates minor", "kleine_bonte_specht"),
    ("Kleine Karekiet", "Acrocephalus scirpaceus", "kleine_karekiet"),
    ("Kleine Mantelmeeuw", "Larus fuscus", "kleine_mantelmeeuw"),
    ("Kleine Rietgans", "Anser brachyrhynchus", "kleine_rietgans"),
    ("Kleine Strandloper", "Calidris minuta", "kleine_strandloper"),
    ("Kleine Zilverreiger", "Egretta garzetta", "kleine_zilverreiger"),
    ("Kleine Zwaan", "Cygnus columbianus", "kleine_zwaan"),
    ("Kluut", "Recurvirostra avosetta", "kluut"),
    ("Kneu", "Linaria cannabina", "kneu"),
    ("Knobbelzwaan", "Cygnus olor", "knobbelzwaan"),
    ("Koekoek", "Cuculus canorus", "koekoek"),
    ("Kokmeeuw", "Chroicocephalus ridibundus", "kokmeeuw"),
    ("Kolgans", "Anser albifrons", "kolgans"),
    ("Koolmees", "Parus major", "koolmees"),
    ("Koperwiek", "Turdus iliacus", "koperwiek"),
    ("Kraanvogel", "Grus grus", "kraanvogel"),
    ("Krakeend", "Mareca strepera", "krakeend"),
    ("Kramsvogel", "Turdus pilaris", "kramsvogel"),
    ("Kruisbek", "Loxia curvirostra", "kruisbek"),
    ("Kuifeend", "Aythya fuligula", "kuifeend"),
    ("Kuifmees", "Lophophanes cristatus", "kuifmees"),
    ("Kwak", "Nycticorax nycticorax", "kwak"),
    ("Kwartel", "Coturnix coturnix", "kwartel"),
    ("Kwartelkoning", "Crex crex", "kwartelkoning"),
    ("Mandarijneend", "Aix galericulata", "mandarijneend"),
    ("Matkop", "Poecile montanus", "matkop"),
    ("Meerkoet", "Fulica atra", "meerkoet"),
    ("Merel", "Turdus merula", "merel"),
    ("Middelste Bonte Specht", "Dendrocoptes medius", "middelste_bonte_specht"),
    ("Middelste Zaagbek", "Mergus serrator", "middelste_zaagbek"),
    ("Nachtegaal", "Luscinia megarhynchos", "nachtegaal"),
    ("Nachtzwaluw", "Caprimulgus europaeus", "nachtzwaluw"),
    ("Nijlgans", "Alopochen aegyptiaca", "nijlgans"),
    ("Nonnetje", "Mergellus albellus", "nonnetje"),
    ("Oehoe", "Bubo bubo", "oehoe"),
    ("Oeverloper", "Actitis hypoleucos", "oeverloper"),
    ("Oeverzwaluw", "Riparia riparia", "oeverzwaluw"),
    ("Ooievaar", "Ciconia ciconia", "ooievaar"),
    ("Orpheusspotvogel", "Hippolais polyglotta", "orpheusspotvogel"),
    ("Paapje", "Saxicola rubetra", "paapje"),
    ("Patrijs", "Perdix perdix", "patrijs"),
    ("Pestvogel", "Bombycilla garrulus", "pestvogel"),
    ("Pijlstaart", "Anas acuta", "pijlstaart"),
    ("Pimpelmees", "Cyanistes caeruleus", "pimpelmees"),
    ("Porseleinhoen", "Porzana porzana", "porseleinhoen"),
    ("Putter", "Carduelis carduelis", "putter"),
    ("Raaf", "Corvus corax", "raaf"),
    ("Ransuil", "Asio otus", "ransuil"),
    ("Regenwulp", "Numenius phaeopus", "regenwulp"),
    ("Rietgors", "Emberiza schoeniclus", "rietgors"),
    ("Rietzanger", "Acrocephalus schoenobaenus", "rietzanger"),
    ("Ringmus", "Passer montanus", "ringmus"),
    ("Rode Wouw", "Milvus milvus", "rode_wouw"),
    ("Roek", "Corvus frugilegus", "roek"),
    ("Roerdomp", "Botaurus stellaris", "roerdomp"),
    ("Roodborst", "Erithacus rubecula", "roodborst"),
    ("Roodborsttapuit", "Saxicola rubicola", "roodborsttapuit"),
    ("Roodhalsfuut", "Podiceps grisegena", "roodhalsfuut"),
    ("Roodkeelduiker", "Gavia stellata", "roodkeelduiker"),
    ("Roodkeelpieper", "Anthus cervinus", "roodkeelpieper"),
    ("Rosse Grutto", "Limosa lapponica", "rosse_grutto"),
    ("Rotsduif", "Columba livia", "rotsduif"),
    ("Scharrelaar", "Coracias garrulus", "scharrelaar"),
    ("Scholekster", "Haematopus ostralegus", "scholekster"),
    ("Sijs", "Spinus spinus", "sijs"),
    ("Slechtvalk", "Falco peregrinus", "slechtvalk"),
    ("Slobeend", "Spatula clypeata", "slobeend"),
    ("Smelleken", "Falco columbarius", "smelleken"),
    ("Smient", "Mareca penelope", "smient"),
    ("Sneeuwgors", "Plectrophenax nivalis", "sneeuwgors"),
    ("Snor", "Locustella luscinioides", "snor"),
    ("Sperwer", "Accipiter nisus", "sperwer"),
    ("Spotvogel", "Hippolais icterina", "spotvogel"),
    ("Spreeuw", "Sturnus vulgaris", "spreeuw"),
    ("Sprinkhaanzanger", "Locustella naevia", "sprinkhaanzanger"),
    ("Staartmees", "Aegithalos caudatus", "staartmees"),
    ("Stadsduif", "Columba livia domestica", "stadsduif"),
    ("Steenloper", "Arenaria interpres", "steenloper"),
    ("Steenuil", "Athene noctua", "steenuil"),
    ("Stormmeeuw", "Larus canus", "stormmeeuw"),
    ("Tafeleend", "Aythya ferina", "tafeleend"),
    ("Taigaboomkruiper", "Certhia familiaris", "taigaboomkruiper"),
    ("Tapuit", "Oenanthe oenanthe", "tapuit"),
    ("Tjiftjaf", "Phylloscopus collybita", "tjiftjaf"),
    ("Toendrarietgans", "Anser serrirostris", "toendrarietgans"),
    ("Torenvalk", "Falco tinnunculus", "torenvalk"),
    ("Tuinfluiter", "Sylvia borin", "tuinfluiter"),
    ("Tureluur", "Tringa totanus", "tureluur"),
    ("Turkse Tortel", "Streptopelia decaocto", "turkse_tortel"),
    ("Veldleeuwerik", "Alauda arvensis", "veldleeuwerik"),
    ("Velduil", "Asio flammeus", "velduil"),
    ("Vink", "Fringilla coelebs", "vink"),
    ("Visdief", "Sterna hirundo", "visdief"),
    ("Vuurgoudhaan", "Regulus ignicapilla", "vuurgoudhaan"),
    ("Waterhoen", "Gallinula chloropus", "waterhoen"),
    ("Waterral", "Rallus aquaticus", "waterral"),
    ("Watersnip", "Gallinago gallinago", "watersnip"),
    ("Wielewaal", "Oriolus oriolus", "wielewaal"),
    ("Wilde Eend", "Anas platyrhynchos", "wilde_eend"),
    ("Wilde Zwaan", "Cygnus cygnus", "wilde_zwaan"),
    ("Winterkoning", "Troglodytes troglodytes", "winterkoning"),
    ("Wintertaling", "Anas crecca", "wintertaling"),
    ("Witgat", "Tringa ochropus", "witgat"),
    ("Witte Kwikstaart", "Motacilla alba", "witte_kwikstaart"),
    ("Wulp", "Numenius arquata", "wulp"),
    ("Zanglijster", "Turdus philomelos", "zanglijster"),
    ("Zilvermeeuw", "Larus argentatus", "zilvermeeuw"),
    ("Zomertortel", "Streptopelia turtur", "zomertortel"),
    ("Zwarte Kraai", "Corvus corone", "zwarte_kraai"),
    ("Zwarte Mees", "Periparus ater", "zwarte_mees"),
    ("Zwarte Roodstaart", "Phoenicurus ochruros", "zwarte_roodstaart"),
    ("Zwarte Ruiter", "Tringa erythropus", "zwarte_ruiter"),
    ("Zwarte Specht", "Dryocopus martius", "zwarte_specht"),
    ("Zwartkop", "Sylvia atricapilla", "zwartkop"),

    # ===== SCANDINAVIAN / BOREAL SPECIES =====
    ("Auerhoen", "Tetrao urogallus", "auerhoen"),
    ("Korhoen", "Lyrurus tetrix", "korhoen"),
    ("Hazelhoen", "Tetrastes bonasia", "hazelhoen"),
    ("Siberische Gaai", "Perisoreus infaustus", "siberische_gaai"),
    ("Notenkraker", "Nucifraga caryocatactes", "notenkraker"),
    ("Dwerguil", "Glaucidium passerinum", "dwerguil"),
    ("Oeraluil", "Strix uralensis", "oeraluil"),
    ("Ruigpootuil", "Aegolius funereus", "ruigpootuil"),
    ("Sperweruil", "Surnia ulula", "sperweruil"),
    ("Sneeuwuil", "Bubo scandiacus", "sneeuwuil"),
    ("Witrugspecht", "Dendrocopos leucotos", "witrugspecht"),
    ("Drieteenspecht", "Picoides tridactylus", "drieteenspecht"),
    ("Grijskopspecht", "Picus canus", "grijskopspecht"),
    ("Dennenkruisbek", "Loxia pytyopsittacus", "dennenkruisbek"),
    ("Witbandkruisbek", "Loxia leucoptera", "witbandkruisbek"),
    ("Noordse Nachtegaal", "Luscinia luscinia", "noordse_nachtegaal"),
    ("Blauwstaart", "Tarsiger cyanurus", "blauwstaart"),
    ("Noordse Boszanger", "Phylloscopus borealis", "arctische_warbler"),
    ("Bladkoninkje", "Phylloscopus proregulus", "bladkoninkje"),
    ("Dwerggors", "Emberiza pusilla", "dwerggors"),
    ("Bosgors", "Emberiza rustica", "bosgors"),
    ("Dennengors", "Emberiza leucocephalos", "dennengors"),
    ("IJsgors", "Calcarius lapponicus", "ijsgors"),
    ("Siberische Heggenmus", "Prunella montanella", "siberische_heggenmus"),
    ("Azuurmees", "Cyanistes cyanus", "azuurmees"),
    ("Giervalk", "Falco rusticolus", "giervalk"),
    ("Ruigpootbuizerd", "Buteo lagopus", "ruigpootbuizerd"),
    ("Zeearend", "Haliaeetus albicilla", "zeearend"),
    ("Steppekiekendief", "Circus macrourus", "steppekiekendief"),
    ("Kleine Vliegenvanger", "Ficedula parva", "kleine_vliegenvanger"),
    ("Withalsvliegenvanger", "Ficedula albicollis", "withalsvliegenvanger"),
    ("Geelbrauwgors", "Emberiza chrysophrys", "geelbrauwgors"),
    ("Siberische Pieper", "Anthus japonicus", "siberische_pieper"),
    ("Siberische Boompieper", "Anthus hodgsoni", "siberische_boompieper"),
    ("Pechorapieper", "Anthus gustavi", "pechorapieper"),
    ("Siberische Lijster", "Geokichla sibirica", "siberische_lijster"),
    ("Bruine Boszanger", "Phylloscopus fuscatus", "bruine_boszanger"),
    ("Raddes Boszanger", "Phylloscopus schwarzi", "raddes_boszanger"),

    # ===== SOUTHERN EUROPEAN / MEDITERRANEAN SPECIES =====
    ("Orpheusgrasmus", "Curruca hortensis", "orpheusgrasmug"),
    ("Sardijnse Grasmus", "Curruca melanocephala", "sardijnse_grasmus"),
    ("Balearische Grasmus", "Curruca subalpina", "balearische_grasmus"),
    ("Provencaalse Grasmus", "Curruca undata", "provencaalse_grasmus"),
    ("Brilgrasmus", "Curruca conspicillata", "brilgrasmus"),
    ("Sperwergrasmus", "Curruca nisoria", "sperwergrasmus"),
    ("Roodkopklauwier", "Lanius senator", "roodkopklauwier"),
    ("Kleine Klauwier", "Lanius minor", "kleine_klauwier"),
    ("Maskerklauwier", "Lanius nubicus", "maskerklauwier"),
    ("Kuifleeuwerik", "Galerida cristata", "kuifleeuwerik"),
    ("Kortteenleeuwerik", "Calandrella brachydactyla", "kortteenleeuwerik"),
    ("Kalanderleeuwerik", "Melanocorypha calandra", "kalanderleeuwerik"),
    ("Kleine Trap", "Tetrax tetrax", "kleine_trap"),
    ("Grote Trap", "Otis tarda", "grote_trap"),
    ("Alpengierzwaluw", "Tachymarptis melba", "alpengierzwaluw"),
    ("Vale Gierzwaluw", "Apus pallidus", "vale_gierzwaluw"),
    ("Kleine Torenvalk", "Falco naumanni", "kleine_torenvalk"),
    ("Eleonora's Valk", "Falco eleonorae", "eleonoras_valk"),
    ("Slangenarend", "Circaetus gallicus", "slangenarend"),
    ("Dwergarend", "Hieraaetus pennatus", "dwergarend"),
    ("Steenarend", "Aquila chrysaetos", "steenarend"),
    ("Lammergier", "Gypaetus barbatus", "lammergier"),
    ("Vale Gier", "Gyps fulvus", "vale_gier"),
    ("Griel", "Burhinus oedicnemus", "griel"),
    ("Steltkluut", "Himantopus himantopus", "steltkluut"),
    ("Vorkstaartplevier", "Glareola pratincola", "vorkstaartplevier"),
    ("Dwergooruil", "Otus scops", "dwergooruil"),
    ("Rotslijster", "Monticola saxatilis", "rotslijster"),
    ("Blauwe Rotslijster", "Monticola solitarius", "blauwe_rotslijster"),
    ("Isabeltapuit", "Oenanthe isabellina", "isabeltapuit"),
    ("Blonde Tapuit", "Oenanthe hispanica", "blonde_tapuit"),
    ("Rotszwaluw", "Ptyonoprogne rupestris", "rotszwaluw"),
    ("Roodstuitzwaluw", "Cecropis daurica", "roodstuitzwaluw"),
    ("Iberische Tjiftjaf", "Phylloscopus ibericus", "iberische_tjiftjaf"),
    ("Spaanse Mus", "Passer hispaniolensis", "spaanse_mus"),
    ("Rotsmus", "Petronia petronia", "rotsmus"),
    ("Cirlgors", "Emberiza cirlus", "cirlgors"),
    ("Rotsgors", "Emberiza cia", "rotsgors"),
    ("Ortolaan", "Emberiza hortulana", "ortolaan"),
    ("Purperreiger", "Ardea purpurea", "purperreiger"),
    ("Ralreiger", "Ardeola ralloides", "ralreiger"),
    ("Koereiger", "Bubulcus ibis", "koereiger"),
    ("Woudaap", "Ixobrychus minutus", "woudaap"),
    ("Zwarte Ooievaar", "Ciconia nigra", "zwarte_ooievaar"),
    ("Lepelaar", "Platalea leucorodia", "lepelaar"),
    ("Zwarte Ibis", "Plegadis falcinellus", "zwarte_ibis"),
    ("Purperkoet", "Porphyrio porphyrio", "purperkoet"),

    # ===== EASTERN EUROPEAN SPECIES =====
    ("Kleine Schreeuwarend", "Clanga pomarina", "kleine_schreeuwarend"),
    ("Grote Schreeuwarend", "Clanga clanga", "grote_schreeuwarend"),
    ("Keizerarend", "Aquila heliaca", "keizerarend"),
    ("Sakervalk", "Falco cherrug", "sakervalk"),
    ("Roodpootvalk", "Falco vespertinus", "roodpootvalk"),
    ("Syrische Bonte Specht", "Dendrocopos syriacus", "syrische_bonte_specht"),
    ("Krekelzanger", "Locustella fluviatilis", "krekelzanger"),
    ("Struikrietzanger", "Acrocephalus dumetorum", "struikrietzanger"),
    ("Veldrietzanger", "Acrocephalus agricola", "veldrietzanger"),
    ("Waterrietzanger", "Acrocephalus paludicola", "waterrietzanger"),
    ("Snorborstzanger", "Acrocephalus melanopogon", "snorborstzanger"),
    ("Poelruiter", "Tringa stagnatilis", "poelruiter"),
    ("Terekruiter", "Xenus cinereus", "terekruiter"),
    ("Grote Snip", "Gallinago media", "grote_snip"),
    ("Roze Spreeuw", "Pastor roseus", "roze_spreeuw"),
    ("Isabelklauwier", "Lanius isabellinus", "izabelklauwier"),
    ("Bruine Klauwier", "Lanius cristatus", "bruine_klauwier"),
    ("Citroenkwikstaart", "Motacilla citreola", "citroenkwikstaart"),
    ("Zwartkeellijster", "Turdus atrogularis", "zwartkeellijster"),
    ("Bruine Lijster", "Turdus eunomus", "bruine_lijster"),
    ("Vale Lijster", "Turdus obscurus", "vale_lijster"),
    ("Groene Fitis", "Phylloscopus trochiloides", "groene_fitis"),
    ("Geelbrauwfitis", "Phylloscopus inornatus", "geelbrauwfitis"),
    ("Humes Bladfitis", "Phylloscopus humei", "humes_bladfitis"),
    ("Bonelli's Fitis", "Phylloscopus bonelli", "bonellis_fitis"),
    ("Roodmus", "Carpodacus erythrinus", "roodmus"),
    ("Frater", "Linaria flavirostris", "frater"),
    ("Siberische Roodborsttapuit", "Saxicola maurus", "siberische_roodborsttapuit"),

    # ===== MIGRANTS / PASSAGE BIRDS / NOCMIG =====
    ("Kemphaan", "Calidris pugnax", "kemphaan"),
    ("Krombekstrandloper", "Calidris ferruginea", "krombekstrandloper"),
    ("Temmincks Strandloper", "Calidris temminckii", "temmincks_strandloper"),
    ("Paarse Strandloper", "Calidris maritima", "paarse_strandloper"),
    ("Morinelplevier", "Eudromias morinellus", "morinelplevier"),
    ("Strandplevier", "Anarhynchus alexandrinus", "strandplevier"),
    ("Kleine Plevier", "Charadrius dubius", "kleine_plevier"),
    ("Zilverplevier", "Pluvialis squatarola", "zilverplevier"),
    ("Zwarte Stern", "Chlidonias niger", "zwarte_stern"),
    ("Witvleugelstern", "Chlidonias leucopterus", "witvleugelstern"),
    ("Witwangstern", "Chlidonias hybrida", "witwangstern"),
    ("Grote Stern", "Thalasseus sandvicensis", "grote_stern"),
    ("Noordse Stern", "Sterna paradisaea", "noordse_stern"),
    ("Reuzenstern", "Hydroprogne caspia", "reuzenstern"),
    ("Lachstern", "Gelochelidon nilotica", "lachstern"),
    ("Visarend", "Pandion haliaetus", "visarend"),
    ("Zwarte Wouw", "Milvus migrans", "zwarte_wouw"),
    ("Wespendief", "Pernis apivorus", "wespendief"),
    ("Grauwe Franjepoot", "Phalaropus lobatus", "grauwe_franjepoot"),
    ("Rosse Franjepoot", "Phalaropus fulicarius", "rosse_franjepoot"),
    ("Duinpieper", "Anthus campestris", "duinpieper"),
    ("Waterpieper", "Anthus spinoletta", "waterpieper"),
    ("Oeverpieper", "Anthus petrosus", "oeverpieper"),
    ("Grote Pieper", "Anthus richardi", "grote_pieper"),
    ("Klein Waterhoen", "Zapornia parva", "klein_waterhoen"),
    ("Kleinst Waterhoen", "Zapornia pusilla", "kleinst_waterhoen"),
    ("Alpenheggenmus", "Prunella collaris", "alpenheggenmus"),
    ("Waterspreeuw", "Cinclus cinclus", "waterspreeuw"),
    ("Gekuifde Koekoek", "Clamator glandarius", "gekuifde_koekoek"),

    # ===== PELAGIC / NORTH SEA / COASTAL SPECIES =====
    ("Noordse Stormvogel", "Fulmarus glacialis", "noordse_stormvogel"),
    ("Stormvogeltje", "Hydrobates pelagicus", "stormvogeltje"),
    ("Grauwe Pijlstormvogel", "Ardenna grisea", "grauwe_pijlstormvogel"),
    ("Noordse Pijlstormvogel", "Puffinus puffinus", "noordse_pijlstormvogel"),
    ("Jan-van-gent", "Morus bassanus", "jan_van_gent"),
    ("Papegaaiduiker", "Fratercula arctica", "papegaaiduiker"),
    ("Alk", "Alca torda", "alk"),
    ("Kleine Alk", "Alle alle", "kleine_alk"),
    ("Zeekoet", "Uria aalge", "zeekoet"),
    ("Zwarte Zeekoet", "Cepphus grylle", "zwarte_zeekoet"),
    ("Drieteenmeeuw", "Rissa tridactyla", "drieteenmeeuw"),
    ("Kleine Meeuw", "Hydrocoloeus minutus", "kleine_meeuw"),
    ("Zwartkopmeeuw", "Ichthyaetus melanocephalus", "zwartkopmeeuw"),
    ("Pontische Meeuw", "Larus cachinnans", "pontische_meeuw"),
    ("Grote Burgemeester", "Larus hyperboreus", "grote_burgemeester"),
    ("Kleine Burgemeester", "Larus glaucoides", "kleine_burgemeester"),
    ("Grote Jager", "Stercorarius skua", "grote_jager"),
    ("Kleine Jager", "Stercorarius parasiticus", "kleine_jager"),
    ("Kleinste Jager", "Stercorarius longicaudus", "kleinste_jager"),
    ("Middelste Jager", "Stercorarius pomarinus", "middelste_jager"),
    ("Zwarte Zee-eend", "Melanitta nigra", "zwarte_zee_eend"),
    ("Grote Zee-eend", "Melanitta fusca", "grote_zee_eend"),
    ("IJseend", "Clangula hyemalis", "ijseend"),
    ("Toppereend", "Aythya marila", "toppereend"),
    ("Krooneend", "Netta rufina", "krooneend"),
    ("Witoogeend", "Aythya nyroca", "witoogeend"),
    ("Parelduiker", "Gavia arctica", "parelduiker"),
    ("IJsduiker", "Gavia immer", "ijsduiker"),
    ("Kuifduiker", "Podiceps auritus", "kuifduiker"),
    ("Geoorde Fuut", "Podiceps nigricollis", "geoorde_fuut"),
    ("Rotgans", "Branta bernicla", "rotgans"),
    ("Zomertaling", "Spatula querquedula", "zomertaling"),
    ("Taigarietgans", "Anser fabalis", "taigarietgans"),
]

# Deduplicate on directory name (the first entry for a dirname wins)
_seen = set()
_deduped = []
for species in ALL_SPECIES:
    key = species[2]  # directory name
    if key not in _seen:
        _seen.add(key)
        _deduped.append(species)
ALL_SPECIES = sorted(_deduped, key=lambda x: x[0])  # Sort by Dutch name

print(f"To train: {len(ALL_SPECIES)} species x max {len(VOCALIZATION_TYPES)} types")
print("\nCategories:")
print("  Dutch breeding birds & regular visitors")
print("  Scandinavian / boreal species")
print("  Southern European / Mediterranean species")
print("  Eastern European species")
print("  Migrants / passage migrants / nocmig")
print("  Pelagic / North Sea / coastal species")
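
The deduplication loop silently drops any later entry that reuses a directory name. A species listed in two categories disappears this way, and so does an entry whose scientific name was edited but whose dirname was not, with no warning in either case. Below is a minimal sketch for listing those collisions before training. It assumes the same `(dutch, scientific, dirname)` tuple layout. `find_duplicate_dirnames` is a hypothetical helper, not part of the notebook.

```python
from collections import defaultdict


def find_duplicate_dirnames(species_list):
    """Return {dirname: [entries]} for every dirname that occurs more than once.

    Hypothetical helper: assumes (dutch_name, scientific_name, dirname) tuples.
    """
    by_dirname = defaultdict(list)
    for entry in species_list:
        by_dirname[entry[2]].append(entry)
    return {name: entries for name, entries in by_dirname.items() if len(entries) > 1}


example_species = [
    ("Kemphaan", "Calidris pugnax", "kemphaan"),
    ("Rotgans", "Branta bernicla", "rotgans"),
    ("Kemphaan", "Philomachus pugnax", "kemphaan"),  # older synonym, same dirname
]
for dirname, entries in find_duplicate_dirnames(example_species).items():
    print(dirname, [e[1] for e in entries])
```

Run on `ALL_SPECIES` before the deduplication loop, this lists each colliding dirname. The first entry is the one that is kept; the rest are dropped.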

In [None]:
# @title 5. Xeno-Canto API v3 + Parallel Downloads
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

import requests
from tqdm import tqdm


def search_xeno_canto(scientific_name: str, voc_type: str, max_results: int = 100) -> list:
    """Search Xeno-Canto API v3 for one species and one vocalization type.

    API v3 accepts tag queries only (gen:, sp:, type:, ...), no free-text search.
    The quality filter is applied afterwards, because the v2 `q>:C` syntax does not
    exist in v3. Only the first page of results is requested.
    """
    parts = scientific_name.split()
    if len(parts) < 2:
        return []

    genus, species = parts[0].lower(), parts[1].lower()

    # Multi-word types must be quoted; requests URL-encodes quotes and spaces via `params`
    if ' ' in voc_type:
        type_query = f'type:"{voc_type}"'
    else:
        type_query = f'type:{voc_type}'

    # API v3: tags only, no q>:C syntax
    query = f'gen:{genus} sp:{species} {type_query}'
    url = 'https://xeno-canto.org/api/3/recordings'
    params = {'query': query, 'key': XC_API_KEY}

    # Retry with a linearly increasing wait on rate limits (a loop, no recursion)
    for attempt in range(4):
        try:
            response = requests.get(url, params=params, timeout=30)
            if response.status_code == 200:
                data = response.json()
                # Check for an API error in the response body
                if 'error' in data:
                    print(f"API error: {str(data.get('message', ''))[:60]}", end=' ')
                    return []
                recordings = data.get('recordings', [])
                # Keep quality A, B and C (drop D, E and unrated), best quality first
                quality_order = {'A': 0, 'B': 1, 'C': 2}
                recordings = [r for r in recordings if r.get('q') in quality_order]
                recordings.sort(key=lambda r: quality_order[r['q']])
                return recordings[:max_results]
            elif response.status_code == 429:  # Rate limit
                wait = 5 * (attempt + 1)
                print(f"    Rate limit, waiting {wait}s...", end=' ')
                time.sleep(wait)
                continue
            else:
                return []
        except requests.exceptions.Timeout:
            time.sleep(2)
            continue
        except Exception:
            return []

    return []  # All retries failed


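def download_single(args):
    """Download one recording; return its local path, or None on failure."""
    recording, output_dir = args
    xc_id = recording['id']
    file_url = recording.get('file', '')

    if not file_url:
        return None

    # Normalise protocol-relative and site-relative URLs
    if file_url.startswith('//'):
        file_url = 'https:' + file_url
    elif not file_url.startswith('http'):
        file_url = 'https://xeno-canto.org' + file_url

    output_path = output_dir / f"XC{xc_id}.mp3"

    if output_path.exists():
        return output_path

    try:
        response = requests.get(file_url, timeout=60)
        # Treat tiny responses (< 1 kB) as failed downloads
        if response.status_code == 200 and len(response.content) > 1000:
            # Write to a temporary file first, so an interrupted run never leaves
            # a truncated .mp3 behind that the exists() check above would accept
            tmp_path = output_path.with_name(output_path.name + '.part')
            with open(tmp_path, 'wb') as f:
                f.write(response.content)
            tmp_path.replace(output_path)
            return output_path
    except Exception:
        pass
    return None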


def download_recordings_parallel(recordings, output_dir, max_workers=12):
    """Download recordings in parallel; return the paths of the successful downloads."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    downloaded = []
    args_list = [(rec, output_dir) for rec in recordings]

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(download_single, args): args[0]['id'] for args in args_list}
        for future in as_completed(futures):
            result = future.result()
            if result:
                downloaded.append(result)

    return downloaded


print("Xeno-Canto API v3 ready (key via Colab Secret)")
print("  Quality filter: A, B, C (post-processing)")
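
The query string and the quality post-processing in `search_xeno_canto` can be checked offline. The sketch below assumes that `requests` encodes `params` the same way as the standard library's `urllib.parse.urlencode`: form encoding, where a space becomes `+`, `:` becomes `%3A` and `"` becomes `%22`. `build_xc_query` and `filter_by_quality` are hypothetical stand-ins for the corresponding lines in the function above. The recordings are fake dicts that carry only `id` and `q`.

```python
from urllib.parse import urlencode


def build_xc_query(scientific_name: str, voc_type: str) -> str:
    """Build the Xeno-Canto v3 tag query the same way search_xeno_canto does."""
    genus, species = scientific_name.lower().split()[:2]
    type_query = f'type:"{voc_type}"' if ' ' in voc_type else f'type:{voc_type}'
    return f'gen:{genus} sp:{species} {type_query}'


def filter_by_quality(recordings: list, keep=('A', 'B', 'C')) -> list:
    """Keep the given quality grades, best first (stable order within a grade)."""
    order = {grade: rank for rank, grade in enumerate(keep)}
    kept = [r for r in recordings if r.get('q') in order]
    return sorted(kept, key=lambda r: order[r['q']])


query = build_xc_query("Anthus trivialis", "flight song")
print(query)                      # gen:anthus sp:trivialis type:"flight song"
print(urlencode({'query': query}))  # query=gen%3Aanthus+sp%3Atrivialis+type%3A%22flight+song%22

fake_recordings = [{'id': 1, 'q': 'C'}, {'id': 2, 'q': 'A'}, {'id': 3, 'q': 'E'},
                   {'id': 4, 'q': 'no score'}, {'id': 5, 'q': 'B'}, {'id': 6, 'q': 'A'}]
print([r['id'] for r in filter_by_quality(fake_recordings)])  # [2, 6, 5, 1]
```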

In [None]:
# @title 6. Audio Processing + Aggressive Augmentation
import librosa
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def generate_pink_noise(length: int) -> np.ndarray:
    """Generate pink (1/f) noise, more realistic than white noise.

    A random complex spectrum is scaled by 1/sqrt(k + 1), so the power falls off
    as 1/f; the result is normalised to a peak amplitude of 1.
    """
    uneven = length % 2
    X = np.random.randn(length // 2 + 1 + uneven) + 1j * np.random.randn(length // 2 + 1 + uneven)
    S = np.sqrt(np.arange(len(X)) + 1.0)
    y = np.fft.irfft(X / S)[:length]  # irfft already returns a real array
    return y / (np.abs(y).max() + 1e-8)


def augment_audio_aggressive(audio: np.ndarray, sr: int) -> list:
    """Aggressive audio augmentation for maximum variation.

    Returns up to 8 versions of a segment (a version is skipped if librosa fails):
    1. Original
    2-3. Pitch shift (random, within +/- AUG_PITCH_RANGE semitones)
    4-5. Time stretch (slower and faster by AUG_STRETCH_RANGE)
    6. Gaussian noise (level drawn from AUG_NOISE_LEVELS)
    7. Pink noise (more realistic than Gaussian noise)
    8. Combo: pitch shift (+/- 2 semitones) + light Gaussian noise
    """
    augmented = [audio.copy()]  # Always keep the original
    target_len = len(audio)

    def _pad_or_trim(a):
        """Bring a version back to the original segment length."""
        if len(a) > target_len:
            return a[:target_len]
        elif len(a) < target_len:
            return np.pad(a, (0, target_len - len(a)))
        return a

    # Pitch shift (random +/- AUG_PITCH_RANGE semitones)
    for _ in range(2):
        try:
            steps = np.random.uniform(-AUG_PITCH_RANGE, AUG_PITCH_RANGE)
            shifted = librosa.effects.pitch_shift(audio, sr=sr, n_steps=steps)
            augmented.append(_pad_or_trim(shifted))
        except Exception:
            pass

    # Time stretch (changes the length, hence pad or trim)
    for rate in [1.0 - AUG_STRETCH_RANGE, 1.0 + AUG_STRETCH_RANGE]:
        try:
            stretched = librosa.effects.time_stretch(audio, rate=rate)
            augmented.append(_pad_or_trim(stretched))
        except Exception:
            pass

    # Gaussian noise (random level)
    noise_level = np.random.choice(AUG_NOISE_LEVELS)
    noise = np.random.normal(0, noise_level, target_len)
    augmented.append(audio + noise)

    # Pink noise (more realistic: natural background sound)
    pink = generate_pink_noise(target_len) * AUG_PINK_NOISE_LEVEL
    augmented.append(audio + pink)

    # Combo: pitch shift + noise
    try:
        steps = np.random.uniform(-2, 2)
        combo = librosa.effects.pitch_shift(audio, sr=sr, n_steps=steps)
        combo = combo + np.random.normal(0, 0.003, len(combo))
        augmented.append(_pad_or_trim(combo))
    except Exception:
        pass

    return augmented


def audio_to_spectrogram(audio: np.ndarray, sr: int = SAMPLE_RATE) -> np.ndarray:
    """Convert audio to a 128x128 mel spectrogram (dB, scaled to [0, 1]).

    SpecAugment is applied separately, in apply_spec_augment.
    """
    mel_spec = librosa.feature.melspectrogram(
        y=audio, sr=sr,
        n_mels=N_MELS, n_fft=N_FFT, hop_length=HOP_LENGTH,
        fmin=FMIN, fmax=FMAX
    )
    mel_db = librosa.power_to_db(mel_spec, ref=np.max)

    # Normalise to [0, 1] (a constant spectrogram becomes all zeros)
    mel_range = mel_db.max() - mel_db.min()
    if mel_range > 0:
        mel_norm = (mel_db - mel_db.min()) / mel_range
    else:
        mel_norm = np.zeros_like(mel_db)

    # Resize to 128x128 (the number of time frames depends on segment length and HOP_LENGTH)
    if mel_norm.shape != (128, 128):
        from skimage.transform import resize
        mel_norm = resize(mel_norm, (128, 128), anti_aliasing=True)

    return mel_norm


def apply_spec_augment(spec: np.ndarray) -> np.ndarray:
    """SpecAugment: set random frequency and time bands to 0 (on a copy)."""
    spec = spec.copy()
    n_freq, n_time = spec.shape

    # Frequency masking (1-2 bands, each 1 to AUG_SPEC_FREQ_MASK - 1 bins wide)
    for _ in range(np.random.randint(1, 3)):
        f_width = np.random.randint(1, AUG_SPEC_FREQ_MASK)
        f_start = np.random.randint(0, max(1, n_freq - f_width))
        spec[f_start:f_start + f_width, :] = 0

    # Time masking (1-2 bands, each 1 to AUG_SPEC_TIME_MASK - 1 frames wide)
    for _ in range(np.random.randint(1, 3)):
        t_width = np.random.randint(1, AUG_SPEC_TIME_MASK)
        t_start = np.random.randint(0, max(1, n_time - t_width))
        spec[:, t_start:t_start + t_width] = 0

    return spec


def process_single_audio(audio_path, max_segments=5, augment=True):
    """Split one audio file into segments and convert them to (augmented) spectrograms."""
    try:
        audio, sr = librosa.load(str(audio_path), sr=SAMPLE_RATE, mono=True)
    except Exception:
        return []

    segment_samples = int(SEGMENT_DURATION * SAMPLE_RATE)
    spectrograms = []

    # Split into segments; drop a tail shorter than half a segment, zero-pad the rest
    segments = []
    for i in range(0, len(audio), segment_samples):
        segment = audio[i:i + segment_samples]
        if len(segment) < segment_samples // 2:
            continue
        if len(segment) < segment_samples:
            segment = np.pad(segment, (0, segment_samples - len(segment)))
        segments.append(segment)
        if len(segments) >= max_segments:
            break

    for segment in segments:
        if augment:
            aug_versions = augment_audio_aggressive(segment, SAMPLE_RATE)
            for aug in aug_versions:
                spec = audio_to_spectrogram(aug)
                # 50% chance of SpecAugment
                if np.random.random() > 0.5:
                    spec = apply_spec_augment(spec)
                spectrograms.append(spec)
        else:
            spec = audio_to_spectrogram(segment)
            spectrograms.append(spec)

    return spectrograms


def process_audio_files_parallel(audio_paths, max_segments=5, max_workers=4, augment=True):
    """Process several audio files in parallel (threads) and concatenate their spectrograms."""
    all_specs = []
    func = partial(process_single_audio, max_segments=max_segments, augment=augment)

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(func, audio_paths))

    for specs in results:
        all_specs.extend(specs)

    return all_specs


print("Audio processing ready")
print(f"  Augmentation: pitch({AUG_PITCH_RANGE}st), stretch({AUG_STRETCH_RANGE}), gaussian+pink noise, SpecAugment")
print("  Up to 8 augmented versions per segment")
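
`generate_pink_noise` is meant to produce a 1/f power spectrum. The quick numerical check below uses NumPy only. The function is copied from above, and the seed and the length 2**16 are arbitrary choices. `spectral_slope` is a hypothetical helper that fits log power against log(k + 1) with `np.polyfit`. For pink noise the fitted slope should be close to -1, while white noise should give a slope close to 0.

```python
import numpy as np


def generate_pink_noise(length: int) -> np.ndarray:
    """Copy of the notebook function: complex spectrum scaled by 1/sqrt(k + 1)."""
    uneven = length % 2
    X = np.random.randn(length // 2 + 1 + uneven) + 1j * np.random.randn(length // 2 + 1 + uneven)
    S = np.sqrt(np.arange(len(X)) + 1.0)
    y = np.fft.irfft(X / S)[:length]
    return y / (np.abs(y).max() + 1e-8)


def spectral_slope(signal: np.ndarray) -> float:
    """Fit log10(power) = slope * log10(k + 1) + c over all non-DC frequency bins."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    k = np.arange(len(power))
    mask = k > 0  # skip the DC bin
    slope, _ = np.polyfit(np.log10(k[mask] + 1.0), np.log10(power[mask]), 1)
    return float(slope)


np.random.seed(0)
pink = generate_pink_noise(2 ** 16)
white = np.random.randn(2 ** 16)
print(f"pink slope:  {spectral_slope(pink):.2f}")
print(f"white slope: {spectral_slope(white):.2f}")
```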

In [None]:
# @title 7. Ultimate CNN Model (4 conv blocks, compatible with existing models)
import torch.nn as nn
from torch.amp import GradScaler, autocast


class UltimateVocalizationCNN(nn.Module):
    """Ultimate 4-block CNN, compatible with the existing EMSN inference code.

    Architecture: 4 conv blocks (32->64->128->256 channels) + classifier (512->256->N)
    Input: 1x128x128 mel spectrogram
    """

    def __init__(self, input_shape=(128, 128), num_classes=10):
        super().__init__()

        self.features = nn.Sequential(
            # Block 1: 128x128 -> 64x64
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Dropout2d(0.2),

            # Block 2: 64x64 -> 32x32
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Dropout2d(0.2),

            # Block 3: 32x32 -> 16x16
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Dropout2d(0.25),

            # Block 4: 16x16 -> 8x8
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Dropout2d(0.25),
        )

        h, w = input_shape[0] // 16, input_shape[1] // 16  # 8x8
        flatten_size = 256 * h * w  # 16384

        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flatten_size, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.4),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x


print("UltimateVocalizationCNN ready")
test_model = UltimateVocalizationCNN(num_classes=10).to(device)
params = sum(p.numel() for p in test_model.parameters())
print(f"  Parameters: {params:,}")
del test_model
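
The parameter count follows directly from the layer shapes:

- A 3x3 convolution has `in * out * 9` weights plus `out` biases.
- Each `BatchNorm2d` has two learnable vectors of length `C`. Its running statistics are buffers, not parameters.
- Each `Linear` layer has `in * out + out` parameters.

The plain-Python sketch below does that arithmetic for the layer sizes of `UltimateVocalizationCNN` with a 128x128 input. It needs no PyTorch, and its total should agree with the `Parameters:` line printed above for `num_classes=10`. The helper names are hypothetical.

```python
def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    return c_in * c_out * k * k + c_out  # weights + biases


def bn_params(c: int) -> int:
    return 2 * c  # learnable scale + shift (running stats are buffers)


def linear_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out


def cnn_param_count(num_classes: int = 10, size: int = 128) -> dict:
    channels = [1, 32, 64, 128, 256]
    features = sum(conv_params(a, b) + bn_params(b) for a, b in zip(channels, channels[1:]))
    side = size // 2 ** 4  # four MaxPool2d(2) layers: 128 -> 8
    flatten = channels[-1] * side * side  # 256 * 8 * 8 = 16384
    first_fc = linear_params(flatten, 512)
    rest_fc = linear_params(512, 256) + linear_params(256, num_classes)
    return {'features': features, 'first_fc': first_fc,
            'total': features + first_fc + rest_fc}


param_counts = cnn_param_count()
print(f"total: {param_counts['total']:,}")  # total: 8,911,818
print(f"share of first Linear: {param_counts['first_fc'] / param_counts['total']:.1%}")  # 94.1%
```

Almost all capacity therefore sits in the first fully connected layer. Global average pooling instead of `Flatten` would shrink the model drastically. However, it would also break compatibility with existing checkpoints, which the class docstring requires.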

In [None]:
# @title 8. Training Pipeline per Species
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
from torch.optim.lr_scheduler import LambdaLR


def integrate_retraining_data(dirname: str, X_all: list, y_all: list,
                              available_types: dict) -> tuple:
    """Add own review data from HiDrive to the training dataset.

    Retraining files are expected in one subfolder per type:
    retraining/{dirname}/{type_name}/*.mp3 (or *.wav)
    """
    retraining_dir = LOCAL_RETRAINING / dirname
    if not retraining_dir.exists():
        return X_all, y_all, available_types

    added = 0
    for type_name, type_info in VOCALIZATION_TYPES.items():
        type_dir = retraining_dir / type_name
        if not type_dir.exists():
            continue

        label = type_info['label']
        audio_files = list(type_dir.glob('*.mp3')) + list(type_dir.glob('*.wav'))
        if not audio_files:
            continue

        # Process without augmentation (own data is already verified)
        specs = process_audio_files_parallel(
            audio_files, max_segments=MAX_SEGMENTS_PER_RECORDING,
            max_workers=NUM_WORKERS, augment=False,
        )

        # Register a type only if it gets samples: an empty class causes a division
        # by zero in the class weights, and a new class below MIN_SAMPLES_PER_TYPE
        # can break the stratified train/val split
        if not specs:
            continue
        if label not in available_types and len(specs) < MIN_SAMPLES_PER_TYPE:
            print(f"  Review data {type_name}: {len(specs)} specs (< {MIN_SAMPLES_PER_TYPE}, skipped)")
            continue

        X_all.extend(specs)
        y_all.extend([label] * len(specs))
        added += len(specs)
        available_types.setdefault(label, type_name)

    if added > 0:
        print(f"  +{added} own review specs integrated")

    return X_all, y_all, available_types


def train_species(dutch_name: str, scientific_name: str, dirname: str) -> tuple:
    """Train a vocalization-type model (up to 10 types) for one species.

    Per species, only types with enough data are trained;
    not every species has all 10 types.
    Returns (best validation accuracy or None, status string).
    """
    print(f"\n{'='*60}")
    print(f"  {dutch_name} ({scientific_name})")
    print(f"{'='*60}")

    start_time = time.time()
    audio_dir = LOCAL_AUDIO / dirname

    X_all, y_all = [], []
    available_types = {}  # label -> type_name

    # Download and process each vocalization type
    for type_name, type_info in VOCALIZATION_TYPES.items():
        xc_query = type_info['xc_query']
        label = type_info['label']

        print(f"  [{label}] {type_name} (XC: {xc_query})...", end=' ')

        # Search Xeno-Canto
        recordings = search_xeno_canto(
            scientific_name, xc_query, max_results=MAX_RECORDINGS_PER_TYPE
        )

        if not recordings:
            print("0 found")
            continue

        # Download audio
        type_dir = audio_dir / xc_query.replace(' ', '_')
        audio_files = download_recordings_parallel(
            recordings[:MAX_RECORDINGS_PER_TYPE],
            type_dir,
            max_workers=MAX_CONCURRENT_DOWNLOADS,
        )
        print(f"{len(audio_files)} files", end=' ')

        if not audio_files:
            print()
            continue

        # Generate spectrograms with augmentation
        specs = process_audio_files_parallel(
            audio_files,
            max_segments=MAX_SEGMENTS_PER_RECORDING,
            max_workers=NUM_WORKERS,
            augment=True,
        )

        if len(specs) >= MIN_SAMPLES_PER_TYPE:
            for spec in specs:
                X_all.append(spec)
                y_all.append(label)
            available_types[label] = type_name
            print(f"-> {len(specs)} specs")
        else:
            print(f"-> {len(specs)} specs (< {MIN_SAMPLES_PER_TYPE}, skipped)")

    # Add own review data (if available)
    X_all, y_all, available_types = integrate_retraining_data(
        dirname, X_all, y_all, available_types
    )

    # Check whether there are enough types
    if len(available_types) < MIN_TYPES_FOR_TRAINING:
        print(f"  Too few types ({len(available_types)} < {MIN_TYPES_FOR_TRAINING})")
        return None, 'insufficient_types'

    X = np.array(X_all)
    y = np.array(y_all)

    # Remap labels to 0..N-1
    unique_labels = sorted(available_types.keys())
    num_classes = len(unique_labels)
    label_map = {old: new for new, old in enumerate(unique_labels)}
    reverse_map = {new: old for old, new in label_map.items()}
    y_remapped = np.array([label_map[l] for l in y])
    class_names = [available_types[unique_labels[i]] for i in range(num_classes)]

    # Show class distribution
    unique, counts = np.unique(y_remapped, return_counts=True)
    print(f"\n  Types: {num_classes} | Total: {len(X)} specs")
    for u, c in zip(unique, counts):
        print(f"    [{u}] {class_names[u]}: {c}")

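    # Train/val split (stratified). Note: the augmented copies of a segment, and the
    # segments of one recording, can land on both sides of this split, so the
    # validation accuracy is likely optimistic.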
    X_train, X_val, y_train, y_val = train_test_split(
        X, y_remapped, test_size=0.2, random_state=42, stratify=y_remapped
    )

    # Class weights against imbalance: n_train / (num_classes * n_in_class)
    from collections import Counter
    class_counts = Counter(y_train)
    total_train = len(y_train)
    class_weights = torch.tensor(
        [total_train / (num_classes * class_counts[i]) for i in range(num_classes)],
        dtype=torch.float32
    ).to(device)

    # DataLoaders (drop_last only when the dataset is larger than one batch)
    train_dataset = TensorDataset(
        torch.FloatTensor(X_train).unsqueeze(1),
        torch.LongTensor(y_train),
    )
    train_loader = DataLoader(
        train_dataset,
        batch_size=BATCH_SIZE, shuffle=True,
        num_workers=NUM_WORKERS, pin_memory=True,
        drop_last=(len(train_dataset) > BATCH_SIZE),
    )
    val_loader = DataLoader(
        TensorDataset(
            torch.FloatTensor(X_val).unsqueeze(1),
            torch.LongTensor(y_val),
        ),
        batch_size=BATCH_SIZE,
        num_workers=NUM_WORKERS, pin_memory=True,
    )

    # Model
    model = UltimateVocalizationCNN(num_classes=num_classes).to(device)
    criterion = nn.CrossEntropyLoss(
        weight=class_weights, label_smoothing=LABEL_SMOOTHING
    )
    optimizer = torch.optim.AdamW(
        model.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY
    )

    # Linear warmup, then cosine annealing with a floor at MIN_LR
    def lr_lambda(epoch):
        if epoch < WARMUP_EPOCHS:
            return (epoch + 1) / WARMUP_EPOCHS
        progress = (epoch - WARMUP_EPOCHS) / max(1, EPOCHS - WARMUP_EPOCHS)
        return max(MIN_LR / LEARNING_RATE, 0.5 * (1 + np.cos(np.pi * progress)))

    scheduler = LambdaLR(optimizer, lr_lambda)
    scaler = GradScaler("cuda")

    # Training loop
    best_acc = 0
    best_state = None
    patience_counter = 0

    for epoch in range(EPOCHS):
        model.train()
        train_loss = 0

        for X_batch, y_batch in train_loader:
            X_batch = X_batch.to(device, non_blocking=True)
            y_batch = y_batch.to(device, non_blocking=True)

            optimizer.zero_grad(set_to_none=True)

            with autocast("cuda", dtype=dtype):
                outputs = model(X_batch)
                loss = criterion(outputs, y_batch)

            scaler.scale(loss).backward()
            scaler.unscale_(optimizer)
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            scaler.step(optimizer)
            scaler.update()

            train_loss += loss.item()

        scheduler.step()

        # Validation
        model.eval()
        val_correct = 0
        with torch.no_grad():
            for X_batch, y_batch in val_loader:
                X_batch = X_batch.to(device, non_blocking=True)
                y_batch = y_batch.to(device, non_blocking=True)
                with autocast("cuda", dtype=dtype):
                    outputs = model(X_batch)
                val_correct += (outputs.argmax(1) == y_batch).sum().item()

        val_acc = val_correct / len(y_val)

        if val_acc > best_acc:
            best_acc = val_acc
            best_state = {k: v.cpu().clone() for k, v in model.state_dict().items()}
            patience_counter = 0
        else:
            patience_counter += 1

        if patience_counter >= PATIENCE:
            print(f"  Early stop @ epoch {epoch+1}")
            break

    if best_state is None:
        del model, train_loader, val_loader
        torch.cuda.empty_cache()
        gc.collect()
        return None, 'training_failed'

    # Save model
    model_path = LOCAL_MODELS / f"{dirname}_cnn_{VERSION}.pt"
    torch.save({
        'model_state_dict': best_state,
        'num_classes': num_classes,
        'class_names': class_names,
        'label_map': label_map,
        'reverse_map': reverse_map,
        'accuracy': best_acc,
        'species_name': dutch_name,
        'scientific_name': scientific_name,
        'version': VERSION,
        'architecture': 'UltimateVocalizationCNN',
        'vocalization_types': VOCALIZATION_TYPES,
        'available_types': available_types,
    }, model_path)

    del model, train_loader, val_loader
    torch.cuda.empty_cache()
    gc.collect()

    elapsed = time.time() - start_time
    print(f"\n  {model_path.name} | {num_classes} types | Acc: {best_acc:.1%} | {elapsed:.0f}s")

    return best_acc, 'success'


print("Training pipeline ready")
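
Augmentation happens before `train_test_split`, so near-identical spectrograms can end up in both train and validation. Below is a minimal NumPy sketch of a group-aware split. It assumes each spectrogram is tagged with the Xeno-Canto ID of its source recording; the notebook does not track this yet, and `group_split` and the `groups` array are hypothetical. scikit-learn's `GroupShuffleSplit` and `StratifiedGroupKFold` implement the same idea with more options.

```python
import numpy as np


def group_split(groups, val_fraction: float = 0.2, seed: int = 42):
    """Split sample indices so that every group lands entirely in train or in val.

    groups: one group ID per sample (e.g. the XC recording ID of its source).
    Returns (train_idx, val_idx) as sorted NumPy index arrays.
    """
    groups = np.asarray(groups)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.unique(groups))
    n_val = max(1, int(round(len(shuffled) * val_fraction)))
    is_val = np.isin(groups, shuffled[:n_val])
    return np.flatnonzero(~is_val), np.flatnonzero(is_val)


# 5 recordings, each expanded into 8 augmented spectrograms
groups = np.repeat([101, 102, 103, 104, 105], 8)
train_idx, val_idx = group_split(groups)
print(len(train_idx), len(val_idx))                   # 32 8
print(set(groups[train_idx]) & set(groups[val_idx]))  # set()
```

Unlike the current call, this split is not stratified. With few recordings per type, a class can therefore be missing from validation. `StratifiedGroupKFold` addresses that, at the cost of a fold-based interface.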

In [None]:
# @title 9. START TRAINING (all species, with auto-resume)
import pandas as pd
import io
import paramiko

# ── Resume detection ─────────────────────────────────────
# Check which models already exist (locally or downloaded earlier from HiDrive)
existing_models = {f.stem.replace(f'_cnn_{VERSION}', '') for f in LOCAL_MODELS.glob('*.pt')}

# Also try to load the progress CSV (results of an earlier run)
progress_csv = LOCAL_CHECKPOINT / 'training_progress.csv'
if progress_csv.exists():
    prev_results = pd.read_csv(progress_csv).to_dict('records')
    prev_dirnames = {r['dirname'] for r in prev_results}
    print(f"Previous progress found: {len(prev_results)} species")
else:
    prev_results = []
    prev_dirnames = set()

# Download previously trained models from HiDrive (if they are not available locally)
try:
    with paramiko.SSHClient() as ssh_client:
        ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        pkey = paramiko.Ed25519Key.from_private_key(io.StringIO(HIDRIVE_SSH_KEY))
        ssh_client.connect('sftp.hidrive.strato.com', username='ronnyclouddisk', pkey=pkey)
        with ssh_client.open_sftp() as sftp:
            try:
                remote_models = sftp.listdir(HIDRIVE_MODELS_DIR)
                remote_pt = [f for f in remote_models if f.endswith('.pt') and VERSION in f]
                for fname in remote_pt:
                    local_file = LOCAL_MODELS / fname
                    if not local_file.exists():
                        sftp.get(f"{HIDRIVE_MODELS_DIR}/{fname}", str(local_file))
                existing_models = {f.stem.replace(f'_cnn_{VERSION}', '') for f in LOCAL_MODELS.glob('*.pt')}
                print(f"HiDrive models synced: {len(existing_models)} total")
            except FileNotFoundError:
                print("No previous models on HiDrive")
            # Use the HiDrive progress CSV if it has more entries than the local one
            try:
                remote_csv = f"{HIDRIVE_CHECKPOINT_DIR}/training_progress.csv"
                local_tmp = LOCAL_CHECKPOINT / 'hidrive_progress.csv'
                sftp.get(remote_csv, str(local_tmp))
                hd_results = pd.read_csv(local_tmp).to_dict('records')
                if len(hd_results) > len(prev_results):
                    prev_results = hd_results
                    prev_dirnames = {r['dirname'] for r in prev_results}
                    print(f"HiDrive progress loaded: {len(prev_results)} species")
                local_tmp.unlink(missing_ok=True)
            except Exception:
                pass
except Exception as e:
    print(f"HiDrive check skipped: {e}")

# ── Start training ───────────────────────────────────────
results = list(prev_results)  # Build on earlier results
start_all = time.time()

# Count earlier results
skipped = 0
successful = sum(1 for r in prev_results if r.get('status') == 'success')
failed = sum(1 for r in prev_results if r.get('status') in ('error', 'training_failed', 'insufficient_types'))

print(f"\n{'='*60}")
print("EMSN VOCALIZATION 10-TYPES TRAINING")
print(f"{'='*60}")
print(f"Start: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Species: {len(ALL_SPECIES)} total")
if existing_models:
    print(f"Already trained: {len(existing_models)} models (will be skipped)")
print(f"GPU: {GPU_TIER} ({GPU_NAME})")
print(f"{'='*60}")

for i, (dutch, scientific, dirname) in enumerate(ALL_SPECIES):
    # Skip if the model already exists
    if dirname in existing_models:
        skipped += 1
        continue

    # Skip if already in earlier results (e.g. insufficient_types)
    if dirname in prev_dirnames:
        skipped += 1
        continue

    try:
        acc, status = train_species(dutch, scientific, dirname)
        results.append({
            'species': dutch,
            'scientific': scientific,
            'dirname': dirname,
            'accuracy': acc,
            'status': status,
        })

        if status == 'success':
            successful += 1
        else:
            failed += 1

    except Exception as e:
        print(f"  Error: {str(e)[:80]}")
        results.append({
            'species': dutch,
            'scientific': scientific,
            'dirname': dirname,
            'accuracy': None,
            'status': 'error',
        })
        failed += 1
        torch.cuda.empty_cache()
        gc.collect()

    # Checkpoint every 15 NEW species (processed in this run)
    prev_counted = sum(1 for r in prev_results if r.get('status') in ('success', 'error', 'training_failed', 'insufficient_types'))
    new_count = successful + failed - prev_counted
    if new_count > 0 and new_count % 15 == 0:
        df = pd.DataFrame(results)
        csv_path = LOCAL_CHECKPOINT / 'training_progress.csv'
        df.to_csv(csv_path, index=False)

        elapsed = time.time() - start_all
        # Remaining = all species - those skipped by the resume check - those done in this run.
        # (skipped, successful and failed overlap for earlier results, so they cannot simply be subtracted.)
        done_before = sum(1 for s in ALL_SPECIES if s[2] in existing_models or s[2] in prev_dirnames)
        remaining = len(ALL_SPECIES) - done_before - new_count
        eta = (elapsed / new_count) * remaining
        print(f"\n  [{successful + failed}/{len(ALL_SPECIES)}] {successful} OK / {failed} failed / {skipped} skipped | ETA: {eta/60:.0f}min")

        # Upload intermediate models to HiDrive
        try:
            for model_file in LOCAL_MODELS.glob('*.pt'):
                upload_to_hidrive(model_file, HIDRIVE_MODELS_DIR)
            upload_to_hidrive(csv_path, HIDRIVE_CHECKPOINT_DIR)
            print("  HiDrive checkpoint uploaded")
        except Exception as e:
            print(f"  HiDrive error (not fatal): {e}")

# Save the final checkpoint
df = pd.DataFrame(results)
csv_path = LOCAL_CHECKPOINT / 'training_progress.csv'
df.to_csv(csv_path, index=False)

elapsed_all = time.time() - start_all
print(f"\n{'='*60}")
print("TRAINING COMPLETE")
print(f"{'='*60}")
print(f"Time: {elapsed_all/3600:.1f} hours")
print(f"Successful: {successful}/{len(ALL_SPECIES)}")
print(f"Failed: {failed}/{len(ALL_SPECIES)}")
print(f"Skipped (model exists or processed in an earlier run): {skipped}")
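
The resume logic reduces to a set difference over directory names, and the ETA to "average time per species in this run times the species still left". The sketch below isolates both, assuming `(dutch, scientific, dirname)` tuples and the two skip rules used above (model file present, or dirname in the previous progress CSV). `plan_run` and `eta_seconds` are hypothetical helpers; the cell above does not call them.

```python
import math


def plan_run(all_species, existing_models, prev_dirnames):
    """Split the species list into (todo, already_done) using the notebook's two skip rules."""
    done = set(existing_models) | set(prev_dirnames)
    todo = [s for s in all_species if s[2] not in done]
    already_done = [s for s in all_species if s[2] in done]
    return todo, already_done


def eta_seconds(elapsed_s: float, n_done_this_run: int, n_todo: int) -> float:
    """Average time per species in this run times the species still left (NaN before the first)."""
    if n_done_this_run == 0:
        return math.nan
    return elapsed_s / n_done_this_run * (n_todo - n_done_this_run)


species = [("Alk", "Alca torda", "alk"), ("Rotgans", "Branta bernicla", "rotgans"),
           ("Zeekoet", "Uria aalge", "zeekoet"), ("IJseend", "Clangula hyemalis", "ijseend")]
todo, already_done = plan_run(species, existing_models={"alk"}, prev_dirnames={"rotgans"})
print([s[2] for s in todo])               # ['zeekoet', 'ijseend']
print(eta_seconds(600.0, 1, len(todo)))   # 600.0
```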

In [None]:
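# @title 10. Results & Analysis
import pandas as pd
import matplotlib.pyplot as plt

# Fixed columns, so an empty run gives an empty table instead of a KeyError
df = pd.DataFrame(results, columns=['species', 'scientific', 'dirname', 'accuracy', 'status'])
df.to_csv(LOCAL_CHECKPOINT / 'results_10types.csv', index=False)

ok = df[df['status'] == 'success']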

print(f"\n{'='*60}")
print("RESULTS")
print(f"{'='*60}")
print(f"Trained: {len(ok)}/{len(df)}")

if len(ok) > 0:
    print("\nValidation accuracy:")
    print(f"  Mean:   {ok['accuracy'].mean():.1%}")
    print(f"  Median: {ok['accuracy'].median():.1%}")
    print(f"  Min:    {ok['accuracy'].min():.1%}")
    print(f"  Max:    {ok['accuracy'].max():.1%}")

    # Top and bottom 10
    print("\nTop 10:")
    for _, row in ok.nlargest(10, 'accuracy').iterrows():
        print(f"  {row['accuracy']:.1%} - {row['species']}")

    print("\nBottom 10:")
    for _, row in ok.nsmallest(10, 'accuracy').iterrows():
        print(f"  {row['accuracy']:.1%} - {row['species']}")

    # Histogram
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.hist(ok['accuracy'] * 100, bins=20, edgecolor='black', alpha=0.7, color='#7e57c2')
    ax.axvline(ok['accuracy'].mean() * 100, color='red', linestyle='--', label=f"Mean: {ok['accuracy'].mean():.1%}")
    ax.set_xlabel('Validation accuracy (%)')
    ax.set_ylabel('Number of species')
    ax.set_title('EMSN Vocalization 10-Types - Accuracy Distribution')
    ax.legend()
    plt.tight_layout()
    hist_path = LOCAL_CHECKPOINT / 'accuracy_histogram.png'
    plt.savefig(hist_path, dpi=150)
    plt.show()

# Failed species
nok = df[df['status'] != 'success']
if len(nok) > 0:
    print(f"\nFailed ({len(nok)}):")
    for _, row in nok.iterrows():
        print(f"  {row['species']}: {row['status']}")

In [None]:
# @title 11. Upload ALL models to HiDrive

models = sorted(LOCAL_MODELS.glob('*.pt'))

print(f"{'='*60}")
print("FINAL UPLOAD TO HIDRIVE")
print(f"{'='*60}")
print(f"Models: {len(models)}")

if models:
    total_size = sum(m.stat().st_size for m in models) / 1e6
    print(f"Total size: {total_size:.0f} MB")

    failed_uploads = 0
    for model_file in tqdm(models, desc="Uploading"):
        try:
            upload_to_hidrive(model_file, HIDRIVE_MODELS_DIR)
        except Exception as e:
            failed_uploads += 1
            print(f"  Error for {model_file.name}: {e}")

    # Upload the results CSV, also to the checkpoint folder that the resume logic in cell 9 reads
    csv_path = LOCAL_CHECKPOINT / 'training_progress.csv'
    if csv_path.exists():
        upload_to_hidrive(csv_path, HIDRIVE_MODELS_DIR)
        upload_to_hidrive(csv_path, HIDRIVE_CHECKPOINT_DIR)

    print(f"\nUploaded to HiDrive: {HIDRIVE_MODELS_DIR} ({failed_uploads} model uploads failed)")

print(f"""
{'='*60}
INSTALLATION ON THE PI
{'='*60}

1. Back up the current models first (rclone sync overwrites the target folder
   and deletes files that are not on HiDrive):

   cp -r /mnt/nas-docker/emsn-vocalization/data/models/ \\
     /mnt/nas-birdnet-archive/getrainde_modellen_EMSN/backup_$(date +%Y%m%d)/

2. Download the models from HiDrive:

   rclone sync hidrive:users/ronnyclouddisk/emsn-backups/vocalization-models-10types/ \\
     /mnt/nas-docker/emsn-vocalization/data/models/ --progress

3. Restart the services:

   sudo systemctl restart vocalization-enricher.service
   sudo systemctl restart emsn-reports-api.service

{'='*60}
""")

## Summary

### Preparation (one-time)

**1. Set up the Colab Secrets:**
- Click the key icon on the left in Colab
- Create the secret `HIDRIVE_SSH_KEY` and paste the contents of `~/.ssh/id_ed25519_hidrive`
- Create the secret `XC_API_KEY` with your Xeno-Canto API key
- Enable "Notebook access" for both secrets

**2. (Optional) Upload review data to HiDrive:**
```bash
rclone sync /mnt/nas-birdnet-archive/vocalization/retraining/ \
  hidrive:users/ronnyclouddisk/emsn-backups/vocalization-retraining/ --progress
```

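### After training: install the models
```bash
# Back up the current models first (rclone sync deletes files that are not on HiDrive)
cp -r /mnt/nas-docker/emsn-vocalization/data/models/ \
  /mnt/nas-birdnet-archive/getrainde_modellen_EMSN/backup_$(date +%Y%m%d)/

# Download from HiDrive
rclone sync hidrive:users/ronnyclouddisk/emsn-backups/vocalization-models-10types/ \
  /mnt/nas-docker/emsn-vocalization/data/models/ --progress

# Restart the services
sudo systemctl restart vocalization-enricher.service
sudo systemctl restart emsn-reports-api.service
```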
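
Before restarting the services, it can help to sanity-check a downloaded checkpoint. The sketch below assumes the dict layout written by `train_species` via `torch.save({...})`; `check_checkpoint` is a hypothetical helper. On the Pi the dict would come from something like `torch.load(path, map_location='cpu')`. That call is left out so the example runs without PyTorch, and a small fake checkpoint is used instead.

```python
REQUIRED_KEYS = {'model_state_dict', 'num_classes', 'class_names', 'label_map',
                 'reverse_map', 'accuracy', 'version', 'architecture'}


def check_checkpoint(ckpt: dict) -> list:
    """Return a list of problems found in a checkpoint dict (empty list = OK)."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(ckpt))]
    if problems:
        return problems
    n = ckpt['num_classes']
    if len(ckpt['class_names']) != n:
        problems.append(f"{len(ckpt['class_names'])} class names for {n} classes")
    if sorted(ckpt['label_map'].values()) != list(range(n)):
        problems.append("label_map does not map onto 0..num_classes-1")
    if any(ckpt['reverse_map'].get(new) != old for old, new in ckpt['label_map'].items()):
        problems.append("reverse_map is not the inverse of label_map")
    if ckpt['architecture'] != 'UltimateVocalizationCNN':
        problems.append(f"unexpected architecture: {ckpt['architecture']}")
    return problems


# Fake checkpoint for a species with 3 types (original labels 0, 1 and 3)
example_label_map = {0: 0, 1: 1, 3: 2}
example_ckpt = {
    'model_state_dict': {}, 'num_classes': 3,
    'class_names': ['zang', 'roep', 'vluchtroep'],
    'label_map': example_label_map,
    'reverse_map': {v: k for k, v in example_label_map.items()},
    'accuracy': 0.9, 'version': 'vX', 'architecture': 'UltimateVocalizationCNN',
}
print(check_checkpoint(example_ckpt))  # []
```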