# 🐦 EMSN 2.0 - Vocalization Classifier Training

**Train vocalization classifiers for all 232 Dutch bird species on a GPU.**

This notebook:
1. Downloads audio from Xeno-canto (per species)
2. Generates mel spectrograms
3. Trains a CNN model on the GPU
4. Saves the models to Google Drive

**Estimated time:** ~2-3 minutes per species = ~10-12 hours total
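
The estimate can be checked with a little arithmetic. The sketch below uses only the two figures quoted in this notebook (232 species, 2-3 minutes each) and assumes every species takes the same time. On those numbers alone the range is a little under 8 to a little under 12 hours, so the ~10-12 hour figure sits at the upper end, presumably leaving room for slow downloads.

```python
# Back-of-the-envelope runtime estimate from the figures quoted in this notebook.
NUM_SPECIES = 232                 # species count stated in the introduction
MINUTES_PER_SPECIES = (2, 3)      # stated per-species range


def total_hours(num_species: int, minutes_per_species: float) -> float:
    """Total wall-clock hours if every species takes the same time."""
    return num_species * minutes_per_species / 60


low, high = (total_hours(NUM_SPECIES, m) for m in MINUTES_PER_SPECIES)
print(f"Estimated total: {low:.1f} to {high:.1f} hours")
```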

---
© 2025 Ronny Hullegie - EMSN 2.0 Ecologisch Monitoring Systeem Nijverdal

## 1️⃣ Setup & GPU Check

In [None]:
# GPU Check
!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv

import torch
print(f"\nPyTorch: {torch.__version__}")
print(f"CUDA: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("⚠️ No GPU! Go to Runtime > Change runtime type > GPU")

In [None]:
# Install dependencies
!pip install -q librosa soundfile scikit-learn requests tqdm matplotlib seaborn

import os
import json
import time
import requests
import numpy as np
import librosa
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
from pathlib import Path
from tqdm.auto import tqdm
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

print("✅ Dependencies installed")

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Working directories
DRIVE_BASE = Path('/content/drive/MyDrive/EMSN-Vocalization')
MODELS_DIR = DRIVE_BASE / 'models'
LOGS_DIR = DRIVE_BASE / 'logs'
TEMP_DIR = Path('/content/temp_audio')

for d in [MODELS_DIR, LOGS_DIR, TEMP_DIR]:
    d.mkdir(parents=True, exist_ok=True)

print(f"✅ Drive mounted")
print(f"   Models: {MODELS_DIR}")
print(f"   Logs: {LOGS_DIR}")

## 2️⃣ Configuration

In [None]:
# === CONFIGURATION ===

# Xeno-canto API key (v3 requires authentication)
XENOCANTO_API_KEY = '14258afd1c8a8e055387d012f2620e20f59ef3a2'

# Training parameters
VERSION = '2025'           # Model version (year)
EPOCHS = 25                # Training epochs
BATCH_SIZE = 64            # Batch size (higher = faster on GPU)
LEARNING_RATE = 0.001      # Learning rate
MIN_SAMPLES = 50           # Minimum spectrograms per species (lowered)
MAX_RECORDINGS = 200       # Max recordings per species from Xeno-canto

# Spectrogram parameters
SAMPLE_RATE = 22050
N_MELS = 128
HOP_LENGTH = 512
N_FFT = 2048
SPEC_WIDTH = 128           # Spectrogram width (time steps)

# Device
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device: {DEVICE}")
print(f"API Key: {XENOCANTO_API_KEY[:8]}...")
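
The spectrogram parameters determine how much audio each training sample actually covers. The sketch below works this out from the values in the configuration cell. It assumes librosa's default centered framing, under which a signal of `n` samples yields `1 + n // hop_length` frames; the helper names are illustrative and not part of the notebook.

```python
# Values copied from the configuration cell.
SAMPLE_RATE = 22050
HOP_LENGTH = 512
SPEC_WIDTH = 128


def frames_to_seconds(n_frames: int, hop_length: int, sample_rate: int) -> float:
    """Audio duration spanned by n_frames hops."""
    return n_frames * hop_length / sample_rate


def frames_for_duration(seconds: float, hop_length: int, sample_rate: int) -> int:
    """Frame count for a clip, assuming centered framing (librosa's default)."""
    return 1 + int(seconds * sample_rate) // hop_length


covered = frames_to_seconds(SPEC_WIDTH, HOP_LENGTH, SAMPLE_RATE)
full_clip = frames_for_duration(10, HOP_LENGTH, SAMPLE_RATE)
print(f"{SPEC_WIDTH} frames cover about {covered:.2f} s of audio")
print(f"A 10 s clip produces {full_clip} frames before cropping")
```

With these settings a 128-frame spectrogram spans roughly three seconds, so when a 10-second clip is cropped to `SPEC_WIDTH` only its opening seconds reach the model.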

In [None]:
# Dutch bird species (232 species)
# Scientific name -> Dutch name

DUTCH_SPECIES = {
    # Most common species first (priority)
    "Parus major": "Koolmees",
    "Cyanistes caeruleus": "Pimpelmees",
    "Turdus merula": "Merel",
    "Erithacus rubecula": "Roodborst",
    "Fringilla coelebs": "Vink",
    "Troglodytes troglodytes": "Winterkoning",
    "Pica pica": "Ekster",
    "Columba palumbus": "Houtduif",
    "Turdus iliacus": "Koperwiek",
    "Corvus corone": "Zwarte kraai",
    "Passer domesticus": "Huismus",
    "Sturnus vulgaris": "Spreeuw",
    "Garrulus glandarius": "Gaai",
    "Dendrocopos major": "Grote bonte specht",
    
    # Tits
    "Periparus ater": "Zwarte mees",
    "Lophophanes cristatus": "Kuifmees",
    "Poecile palustris": "Glanskop",
    "Poecile montanus": "Matkop",
    "Aegithalos caudatus": "Staartmees",
    
    # Thrushes
    "Turdus philomelos": "Zanglijster",
    "Turdus viscivorus": "Grote lijster",
    "Turdus pilaris": "Kramsvogel",
    
    # Finches
    "Chloris chloris": "Groenling",
    "Carduelis carduelis": "Putter",
    "Spinus spinus": "Sijs",
    "Linaria cannabina": "Kneu",
    "Pyrrhula pyrrhula": "Goudvink",
    "Coccothraustes coccothraustes": "Appelvink",
    
    # Warblers
    "Phylloscopus collybita": "Tjiftjaf",
    "Phylloscopus trochilus": "Fitis",
    "Sylvia atricapilla": "Zwartkop",
    "Sylvia borin": "Tuinfluiter",
    "Sylvia communis": "Grasmus",
    "Sylvia curruca": "Braamsluiper",
    "Acrocephalus scirpaceus": "Kleine karekiet",
    "Acrocephalus schoenobaenus": "Rietzanger",
    "Hippolais icterina": "Spotvogel",
    "Locustella naevia": "Sprinkhaanzanger",
    "Regulus regulus": "Goudhaan",
    "Regulus ignicapilla": "Vuurgoudhaan",
    
    # Woodpeckers
    "Dryobates minor": "Kleine bonte specht",
    "Picus viridis": "Groene specht",
    "Dryocopus martius": "Zwarte specht",
    
    # Pigeons and doves
    "Streptopelia decaocto": "Turkse tortel",
    "Streptopelia turtur": "Zomertortel",
    "Columba livia": "Stadsduif",
    "Columba oenas": "Holenduif",
    
    # Crows
    "Corvus frugilegus": "Roek",
    "Corvus monedula": "Kauw",
    "Corvus corax": "Raaf",
    
    # Herons and waterbirds
    "Ardea cinerea": "Blauwe reiger",
    "Egretta garzetta": "Kleine zilverreiger",
    "Ardea alba": "Grote zilverreiger",
    "Nycticorax nycticorax": "Kwak",
    "Botaurus stellaris": "Roerdomp",
    
    # Ducks
    "Anas platyrhynchos": "Wilde eend",
    "Anas crecca": "Wintertaling",
    "Anas strepera": "Krakeend",
    "Spatula clypeata": "Slobeend",
    "Mareca penelope": "Smient",
    "Aythya fuligula": "Kuifeend",
    "Aythya ferina": "Tafeleend",
    
    # Geese
    "Anser anser": "Grauwe gans",
    "Anser albifrons": "Kolgans",
    "Branta canadensis": "Canadese gans",
    "Branta leucopsis": "Brandgans",
    "Anser fabalis": "Rietgans",
    
    # Swans
    "Cygnus olor": "Knobbelzwaan",
    "Cygnus cygnus": "Wilde zwaan",
    "Cygnus columbianus": "Kleine zwaan",
    
    # Birds of prey
    "Buteo buteo": "Buizerd",
    "Accipiter nisus": "Sperwer",
    "Accipiter gentilis": "Havik",
    "Milvus milvus": "Rode wouw",
    "Circus aeruginosus": "Bruine kiekendief",
    "Falco tinnunculus": "Torenvalk",
    "Falco peregrinus": "Slechtvalk",
    
    # Owls
    "Athene noctua": "Steenuil",
    "Strix aluco": "Bosuil",
    "Tyto alba": "Kerkuil",
    "Asio otus": "Ransuil",
    "Bubo bubo": "Oehoe",
    
    # Waders
    "Vanellus vanellus": "Kievit",
    "Limosa limosa": "Grutto",
    "Numenius arquata": "Wulp",
    "Tringa totanus": "Tureluur",
    "Gallinago gallinago": "Watersnip",
    "Haematopus ostralegus": "Scholekster",
    "Recurvirostra avosetta": "Kluut",
    "Charadrius hiaticula": "Bontbekplevier",
    "Pluvialis apricaria": "Goudplevier",
    
    # Gulls and terns
    "Larus ridibundus": "Kokmeeuw",
    "Larus canus": "Stormmeeuw",
    "Larus argentatus": "Zilvermeeuw",
    "Larus fuscus": "Kleine mantelmeeuw",
    "Larus marinus": "Grote mantelmeeuw",
    "Sterna hirundo": "Visdief",
    "Sternula albifrons": "Dwergstern",
    
    # Rails
    "Fulica atra": "Meerkoet",
    "Gallinula chloropus": "Waterhoen",
    "Rallus aquaticus": "Waterral",
    "Porzana porzana": "Porseleinhoen",
    
    # Other
    "Alcedo atthis": "IJsvogel",
    "Upupa epops": "Hop",
    "Cuculus canorus": "Koekoek",
    "Apus apus": "Gierzwaluw",
    "Hirundo rustica": "Boerenzwaluw",
    "Delichon urbicum": "Huiszwaluw",
    "Riparia riparia": "Oeverzwaluw",
    "Motacilla alba": "Witte kwikstaart",
    "Motacilla flava": "Gele kwikstaart",
    "Anthus pratensis": "Graspieper",
    "Anthus trivialis": "Boompieper",
    "Alauda arvensis": "Veldleeuwerik",
    "Lullula arborea": "Boomleeuwerik",
    "Emberiza citrinella": "Geelgors",
    "Emberiza schoeniclus": "Rietgors",
    "Prunella modularis": "Heggenmus",
    "Sitta europaea": "Boomklever",
    "Certhia brachydactyla": "Boomkruiper",
    "Oriolus oriolus": "Wielewaal",
    "Lanius collurio": "Grauwe klauwier",
    "Lanius excubitor": "Klapekster",
    "Muscicapa striata": "Grauwe vliegenvanger",
    "Ficedula hypoleuca": "Bonte vliegenvanger",
    "Phoenicurus ochruros": "Zwarte roodstaart",
    "Phoenicurus phoenicurus": "Gekraagde roodstaart",
    "Saxicola rubicola": "Roodborsttapuit",
    "Oenanthe oenanthe": "Tapuit",
    "Luscinia megarhynchos": "Nachtegaal",
    "Luscinia svecica": "Blauwborst",
    
    # Less common
    "Phasianus colchicus": "Fazant",
    "Perdix perdix": "Patrijs",
    "Coturnix coturnix": "Kwartel",
    "Scolopax rusticola": "Houtsnip",
    "Caprimulgus europaeus": "Nachtzwaluw",
    "Jynx torquilla": "Draaihals",
    "Merops apiaster": "Bijeneter",
    "Coracias garrulus": "Scharrelaar",
    "Remiz pendulinus": "Buidelmees",
    "Panurus biarmicus": "Baardman",
    "Acrocephalus arundinaceus": "Grote karekiet",
    "Locustella luscinioides": "Snor",
    "Cettia cetti": "Cetti's zanger",
    "Cisticola juncidis": "Graszanger",
    "Carduelis citrinella": "Citroenkanarie",
    "Loxia curvirostra": "Kruisbek",
    "Pinicola enucleator": "Haakbek",
    
    # Rarer
    "Ciconia ciconia": "Ooievaar",
    "Ciconia nigra": "Zwarte ooievaar",
    "Platalea leucorodia": "Lepelaar",
    "Phalacrocorax carbo": "Aalscholver",
    "Podiceps cristatus": "Fuut",
    "Tachybaptus ruficollis": "Dodaars",
    "Grus grus": "Kraanvogel",
    "Crex crex": "Kwartelkoning",
}

print(f"Total: {len(DUTCH_SPECIES)} species loaded")

## 3️⃣ Model Architecture

In [None]:
class VocalizationCNN(nn.Module):
    """CNN for vocalization classification (song/call/alarm)."""
    
    def __init__(self, input_shape=(N_MELS, SPEC_WIDTH), num_classes=3):
        super().__init__()
        
        self.features = nn.Sequential(
            # Block 1
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout2d(0.25),
            
            # Block 2
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout2d(0.25),
            
            # Block 3
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout2d(0.25),
        )
        
        # Calculate flatten size
        h, w = input_shape[0] // 8, input_shape[1] // 8
        flatten_size = 128 * h * w
        
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flatten_size, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes)
        )
    
    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

# Test
model = VocalizationCNN().to(DEVICE)
test_input = torch.randn(1, 1, N_MELS, SPEC_WIDTH).to(DEVICE)
test_output = model(test_input)
print(f"✅ Model OK - Input: {test_input.shape} -> Output: {test_output.shape}")
print(f"   Parameters: {sum(p.numel() for p in model.parameters()):,}")
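
The flatten size and the parameter count printed by the test cell can be reproduced by hand. The sketch below redoes the arithmetic in plain Python for the default 128 x 128 input and three classes. It counts only learnable parameters (convolution and linear weights and biases, plus the BatchNorm scale and shift), which is what `model.parameters()` iterates over; the helper names are illustrative.

```python
N_MELS, SPEC_WIDTH = 128, 128     # input shape from the configuration cell
NUM_CLASSES = 3                   # song / call / alarm


def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    return c_in * c_out * k * k + c_out    # kernel weights + biases


def bn_params(channels: int) -> int:
    return 2 * channels                    # learnable scale and shift


def linear_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out            # weights + biases


# Three MaxPool2d(2) layers halve height and width three times, hence // 8.
h, w = N_MELS // 8, SPEC_WIDTH // 8
flatten_size = 128 * h * w

channels = [1, 32, 64, 128]
feature_total = sum(
    conv_params(c_in, c_out) + bn_params(c_out)
    for c_in, c_out in zip(channels, channels[1:])
)
classifier_total = linear_params(flatten_size, 256) + linear_params(256, NUM_CLASSES)
total = feature_total + classifier_total

print(f"flatten_size = {flatten_size:,}")
print(f"features = {feature_total:,}, classifier = {classifier_total:,}, total = {total:,}")
```

Nearly all of the parameters sit in the first `Linear` layer, because it connects 32,768 flattened features to 256 units.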

## 4️⃣ Xeno-canto Data Functions

In [None]:
def search_xenocanto(scientific_name, max_results=MAX_RECORDINGS):
    """Search for recordings via the Xeno-canto API v3."""
    # API v3 query format with species tag
    query = f'sp:"{scientific_name}"'
    url = f"https://xeno-canto.org/api/3/recordings?query={requests.utils.quote(query)}&key={XENOCANTO_API_KEY}&per_page=500"
    
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        data = response.json()
        
        # Check for errors
        if 'error' in data:
            print(f"  ⚠️ API error: {data.get('message', 'Unknown error')}")
            return {'song': [], 'call': [], 'alarm': []}
        
        recordings = data.get('recordings', [])
        
        if not recordings:
            print(f"  ⚠️ No recordings found")
            return {'song': [], 'call': [], 'alarm': []}
        
        print(f"  📡 API response: {len(recordings)} recordings")
        
        # Filter and categorize by type
        categorized = {'song': [], 'call': [], 'alarm': []}
        
        for rec in recordings[:max_results * 2]:
            rec_type = rec.get('type', '').lower()
            
            # Categorize based on the type field
            if any(word in rec_type for word in ['song', 'zang', 'singing']):
                categorized['song'].append(rec)
            elif any(word in rec_type for word in ['alarm', 'warning', 'distress']):
                categorized['alarm'].append(rec)
            elif any(word in rec_type for word in ['call', 'roep', 'flight']):
                categorized['call'].append(rec)
            elif rec_type == '' or rec_type == 'unknown':
                # Unknown type -> assign to song (most common)
                categorized['song'].append(rec)
        
        return categorized
    except requests.exceptions.RequestException as e:
        print(f"  ⚠️ Request error: {e}")
        return {'song': [], 'call': [], 'alarm': []}
    except Exception as e:
        print(f"  ⚠️ Error: {e}")
        return {'song': [], 'call': [], 'alarm': []}


def download_audio(url, filepath):
    """Download an audio file."""
    try:
        # Fix URL format
        if url.startswith('//'):
            url = 'https:' + url
        url = url.replace('http://', 'https://')
        
        response = requests.get(url, timeout=60, stream=True)
        response.raise_for_status()
        
        with open(filepath, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        return True
    except Exception as e:
        return False


def audio_to_spectrogram(audio_path, target_shape=(N_MELS, SPEC_WIDTH)):
    """Convert audio to a mel spectrogram."""
    try:
        # Load audio
        y, sr = librosa.load(audio_path, sr=SAMPLE_RATE, duration=10)
        
        if len(y) < sr:  # Less than 1 second
            return None
        
        # Mel spectrogram
        mel_spec = librosa.feature.melspectrogram(
            y=y, sr=sr, n_mels=N_MELS, 
            hop_length=HOP_LENGTH, n_fft=N_FFT
        )
        
        # Log scale
        mel_spec_db = librosa.power_to_db(mel_spec, ref=np.max)
        
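        # Pad or crop to the target width
        if mel_spec_db.shape[1] < target_shape[1]:
            pad_width = target_shape[1] - mel_spec_db.shape[1]
            # Pad with the quietest value: with ref=np.max the loudest bin is 0 dB,
            # so zero-padding would mark the padded region as maximally loud.
            mel_spec_db = np.pad(mel_spec_db, ((0, 0), (0, pad_width)),
                                 mode='constant', constant_values=mel_spec_db.min())
        else:
            mel_spec_db = mel_spec_db[:, :target_shape[1]]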
        
        # Normalize
        mel_spec_db = (mel_spec_db - mel_spec_db.min()) / (mel_spec_db.max() - mel_spec_db.min() + 1e-8)
        
        return mel_spec_db
    except Exception as e:
        return None


print("✅ Xeno-canto API v3 functions loaded")
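
The categorization rules in `search_xenocanto` and the URL clean-up in `download_audio` are easier to test once they are pulled out of the network code. The sketch below restates both as standalone helpers; `categorize_type` and `normalize_url` are hypothetical names introduced for illustration, while the keyword lists and their order are copied from the functions in this section.

```python
from typing import Optional

# Checked in this order, so a type string that mentions several categories
# (for example "call, song") lands in the first one that matches.
KEYWORDS = (
    ("song", ("song", "zang", "singing")),
    ("alarm", ("alarm", "warning", "distress")),
    ("call", ("call", "roep", "flight")),
)


def categorize_type(rec_type: str) -> Optional[str]:
    """Map a free-text recording type to song/alarm/call, or None to skip it."""
    text = rec_type.lower()
    for label, words in KEYWORDS:
        if any(word in text for word in words):
            return label
    if text in ("", "unknown"):
        return "song"              # unknown types default to song
    return None                    # e.g. drumming: not used for training


def normalize_url(url: str) -> str:
    """Turn protocol-relative or http URLs into https URLs."""
    if url.startswith("//"):
        url = "https:" + url
    return url.replace("http://", "https://")


for example in ["Song", "alarm call", "flight call", "", "drumming"]:
    print(repr(example), "->", categorize_type(example))
```

Because `alarm` is tested before `call`, a type such as "alarm call" is counted as an alarm rather than a call.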

## 5️⃣ Training Function

In [None]:
def prepare_data_for_species(scientific_name, dutch_name):
    """Download and prepare data for one species."""
    print(f"  📥 Searching Xeno-canto...")
    recordings = search_xenocanto(scientific_name)
    
    total_recs = sum(len(v) for v in recordings.values())
    if total_recs == 0:
        return None, None, "no_recordings"
    
    print(f"  📊 Found: song={len(recordings['song'])}, call={len(recordings['call'])}, alarm={len(recordings['alarm'])}")
    
    X, y = [], []
    label_map = {'song': 0, 'call': 1, 'alarm': 2}
    
    # Download and process per category
    for label_name, label_idx in label_map.items():
        recs = recordings[label_name][:50]  # Max 50 per category
        
        for rec in recs:
            audio_url = rec.get('file')
            if not audio_url:
                continue
            
            # Download
            audio_path = TEMP_DIR / f"temp_{rec['id']}.mp3"
            if not download_audio(audio_url, audio_path):
                continue
            
            # Convert to spectrogram
            spec = audio_to_spectrogram(audio_path)
            
            # Cleanup
            if audio_path.exists():
                audio_path.unlink()
            
            if spec is not None:
                X.append(spec)
                y.append(label_idx)
    
    if len(X) < MIN_SAMPLES:
        return None, None, f"insufficient_data ({len(X)})"
    
    # Require at least 2 classes
    unique_classes = len(set(y))
    if unique_classes < 2:
        return None, None, "single_class"
    
    return np.array(X), np.array(y), "ok"


def train_model(X, y, num_classes):
    """Train the model."""
    # Split
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    
    # Convert to tensors
    X_train = torch.FloatTensor(X_train).unsqueeze(1).to(DEVICE)
    X_val = torch.FloatTensor(X_val).unsqueeze(1).to(DEVICE)
    y_train = torch.LongTensor(y_train).to(DEVICE)
    y_val = torch.LongTensor(y_val).to(DEVICE)
    
    # DataLoaders
    train_loader = DataLoader(
        TensorDataset(X_train, y_train),
        batch_size=BATCH_SIZE, shuffle=True
    )
    val_loader = DataLoader(
        TensorDataset(X_val, y_val),
        batch_size=BATCH_SIZE
    )
    
    # Model
    model = VocalizationCNN(num_classes=num_classes).to(DEVICE)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)
    
    # Training
    best_acc = 0
    best_model_state = None
    history = {'loss': [], 'val_loss': [], 'accuracy': [], 'val_accuracy': []}
    
    for epoch in range(EPOCHS):
        # Train
        model.train()
        train_loss, train_correct = 0, 0
        
        for X_batch, y_batch in train_loader:
            optimizer.zero_grad()
            outputs = model(X_batch)
            loss = criterion(outputs, y_batch)
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item()
            train_correct += (outputs.argmax(1) == y_batch).sum().item()
        
        # Validate
        model.eval()
        val_loss, val_correct = 0, 0
        
        with torch.no_grad():
            for X_batch, y_batch in val_loader:
                outputs = model(X_batch)
                val_loss += criterion(outputs, y_batch).item()
                val_correct += (outputs.argmax(1) == y_batch).sum().item()
        
        # Metrics
        train_acc = train_correct / len(X_train)
        val_acc = val_correct / len(X_val)
        
        history['loss'].append(train_loss / len(train_loader))
        history['val_loss'].append(val_loss / len(val_loader))
        history['accuracy'].append(train_acc)
        history['val_accuracy'].append(val_acc)
        
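        if val_acc > best_acc:
            best_acc = val_acc
            # Clone the tensors: state_dict().copy() is shallow, so later
            # epochs would overwrite the saved "best" weights in place.
            best_model_state = {k: v.detach().clone() for k, v in model.state_dict().items()}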
    
    return best_model_state, best_acc, history


def train_species(scientific_name, dutch_name):
    """Complete training pipeline for one species."""
    dirname = dutch_name.lower().replace(' ', '_')
    model_filename = f"{dirname}_cnn_{VERSION}.pt"
    model_path = MODELS_DIR / model_filename
    
    # Skip if the model already exists
    if model_path.exists():
        print(f"  ⏭️ Model already exists")
        return None, 'exists'
    
    start_time = time.time()
    
    # Prepare data
    X, y, status = prepare_data_for_species(scientific_name, dutch_name)
    
    if X is None:
        return None, status
    
    print(f"  🎯 Training on {len(X)} samples...")
    
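    # Train. Size the output layer by the highest label index rather than the
    # number of distinct labels: a species with only song (0) and alarm (2)
    # recordings still needs three outputs for CrossEntropyLoss.
    num_classes = int(y.max()) + 1
    model_state, accuracy, history = train_model(X, y, num_classes)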
    
    # Save
    torch.save({
        'model_state_dict': model_state,
        'num_classes': num_classes,
        'accuracy': accuracy,
        'history': history,
        'species_name': dutch_name,
        'scientific_name': scientific_name,
        'version': VERSION,
        'trained_at': datetime.now().isoformat(),
        'samples': len(X)
    }, model_path)
    
    elapsed = time.time() - start_time
    print(f"  ✅ Done! Accuracy: {accuracy:.1%}, Time: {elapsed:.0f}s")
    
    return accuracy, 'success'


print("✅ Training functions loaded")
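
Two properties of the label vector decide whether training can run at all: a stratified split needs at least two samples in every class, and `CrossEntropyLoss` needs every label to be smaller than the number of model outputs. The sketch below checks both in plain Python. `check_labels` is a hypothetical helper, not part of the notebook, and the label indices follow the `label_map` in `prepare_data_for_species`.

```python
from collections import Counter

LABEL_MAP = {"song": 0, "call": 1, "alarm": 2}   # as in prepare_data_for_species


def check_labels(y) -> dict:
    """Summarize a label vector before splitting and training."""
    counts = Counter(int(label) for label in y)
    return {
        "counts": dict(counts),
        # Stratified splitting fails if any class has a single sample.
        "can_stratify": len(counts) >= 2 and min(counts.values()) >= 2,
        # Labels must lie in [0, num_outputs), so size by the largest index.
        "min_outputs": max(counts, default=-1) + 1,
    }


# A species with song and alarm recordings but no calls.
report = check_labels([0] * 40 + [2] * 12)
print(report)
```

In this example there are two distinct classes, yet the model still needs three outputs because label 2 is present.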

## 6️⃣ Start Training

**Note:** This takes ~10-12 hours for all 232 species. You can stop partway and resume later; existing models are skipped.
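
Resuming works because `train_species` derives a file name from the Dutch species name and returns early when that file already exists. The sketch below reproduces the naming rule and lists which species would still need training; `model_path_for` and `pending_species` are hypothetical helpers, and a temporary directory stands in for the Drive models folder.

```python
from pathlib import Path
import tempfile

VERSION = "2025"


def model_path_for(dutch_name: str, models_dir: Path, version: str = VERSION) -> Path:
    """Same naming rule as train_species: lowercase, spaces to underscores."""
    dirname = dutch_name.lower().replace(" ", "_")
    return models_dir / f"{dirname}_cnn_{version}.pt"


def pending_species(species: dict, models_dir: Path) -> list:
    """Scientific names whose model file does not exist yet."""
    return [
        scientific
        for scientific, dutch in species.items()
        if not model_path_for(dutch, models_dir).exists()
    ]


with tempfile.TemporaryDirectory() as tmp:
    models_dir = Path(tmp)
    species = {"Parus major": "Koolmees", "Dendrocopos major": "Grote bonte specht"}
    model_path_for("Koolmees", models_dir).touch()   # pretend this one is trained
    remaining = pending_species(species, models_dir)
    print(remaining)
```

Since the version is part of the file name, changing `VERSION` makes every species count as untrained again.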

In [None]:
# Start training
results = []
start_all = time.time()

print(f"{'='*60}")
print(f"EMSN Vocalization Training")
print(f"{'='*60}")
print(f"Start: {datetime.now()}")
print(f"Species: {len(DUTCH_SPECIES)}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU'}")
print(f"{'='*60}\n")

for i, (scientific, dutch) in enumerate(DUTCH_SPECIES.items()):
    print(f"\n[{i+1}/{len(DUTCH_SPECIES)}] {dutch} ({scientific})")
    
    try:
        acc, status = train_species(scientific, dutch)
        results.append({
            'dutch_name': dutch,
            'scientific_name': scientific,
            'accuracy': acc,
            'status': status
        })
    except Exception as e:
        print(f"  ❌ Error: {e}")
        results.append({
            'dutch_name': dutch,
            'scientific_name': scientific,
            'accuracy': None,
            'status': f'error: {str(e)[:50]}'
        })
    
    # Save progress periodically
    if (i + 1) % 10 == 0:
        import pandas as pd
        pd.DataFrame(results).to_csv(LOGS_DIR / f'progress_{VERSION}.csv', index=False)
        print(f"  💾 Progress saved ({i+1} species)")

# Final result
elapsed_all = time.time() - start_all
print(f"\n{'='*60}")
print(f"DONE!")
print(f"{'='*60}")
print(f"Total time: {elapsed_all/3600:.1f} hours")
print(f"Successful: {sum(1 for r in results if r['status'] == 'success')}")
print(f"Skipped: {sum(1 for r in results if r['status'] == 'exists')}")
print(f"Failed: {sum(1 for r in results if r['status'] not in ['success', 'exists'])}")

In [None]:
# Save and analyze results
import pandas as pd

df = pd.DataFrame(results)
df.to_csv(LOGS_DIR / f'training_results_{VERSION}.csv', index=False)

# Statistics
successful = df[df['status'] == 'success']

print(f"\n📊 SUMMARY")
print(f"{'='*40}")
print(f"Total species: {len(df)}")
print(f"Successfully trained: {len(successful)}")
print(f"Already existing: {len(df[df['status'] == 'exists'])}")
print(f"Failed: {len(df[~df['status'].isin(['success', 'exists'])])}")

if len(successful) > 0:
    print(f"\n📈 ACCURACY STATISTICS")
    print(f"{'='*40}")
    print(f"Mean: {successful['accuracy'].mean():.1%}")
    print(f"Median: {successful['accuracy'].median():.1%}")
    print(f"Min: {successful['accuracy'].min():.1%}")
    print(f"Max: {successful['accuracy'].max():.1%}")
    
    print(f"\n🏆 TOP 10 BEST MODELS")
    print(successful.nlargest(10, 'accuracy')[['dutch_name', 'accuracy']].to_string(index=False))
    
    print(f"\n⚠️ BOTTOM 5 (lowest accuracy)")
    print(successful.nsmallest(5, 'accuracy')[['dutch_name', 'accuracy']].to_string(index=False))

## 7️⃣ Download Models

The models are now in Google Drive. You can download them via:
1. The Google Drive web interface
2. Or create a ZIP:

In [None]:
# Create a ZIP of all models
import shutil

zip_path = DRIVE_BASE / f'models_{VERSION}'
shutil.make_archive(str(zip_path), 'zip', MODELS_DIR)

print(f"✅ ZIP created: {zip_path}.zip")
print(f"   Size: {(zip_path.with_suffix('.zip')).stat().st_size / 1e6:.1f} MB")
print(f"\nDownload via Google Drive or:")
print(f"   from google.colab import files")
print(f"   files.download('{zip_path}.zip')")

---

## After training

1. **Download** `models_2025.zip` from Google Drive
2. **Unzip** it to your NAS: `/volume1/docker/emsn-vocalization/data/models/`
3. The EMSN system detects the models automatically

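### Model format
Each `.pt` file contains:
- `model_state_dict`: PyTorch model weights
- `num_classes`: Number of output classes
- `accuracy`: Best validation accuracy
- `history`: Per-epoch loss and accuracy (training and validation)
- `species_name`: Dutch name
- `scientific_name`: Scientific name
- `version`: Model version (2025)
- `trained_at`: Training timestamp (ISO 8601)
- `samples`: Number of spectrograms used for training and validation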
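
Before using a downloaded model it can help to confirm that the checkpoint carries the fields written by `torch.save` in `train_species`. The sketch below is a minimal check under that assumption; `missing_keys` is a hypothetical helper, and a hand-built dictionary with made-up values stands in for the result of `torch.load` so the example runs without PyTorch.

```python
# Keys written by torch.save in train_species.
EXPECTED_KEYS = {
    "model_state_dict", "num_classes", "accuracy", "history",
    "species_name", "scientific_name", "version", "trained_at", "samples",
}


def missing_keys(checkpoint: dict) -> set:
    """Return the expected checkpoint keys that are absent."""
    return EXPECTED_KEYS - set(checkpoint)


# In practice the dict would come from torch.load(path, map_location="cpu").
# The values below are placeholders for illustration, not real results.
example = {
    "model_state_dict": {},
    "num_classes": 3,
    "accuracy": 0.0,
    "species_name": "Koolmees",
    "version": "2025",
}

print(sorted(missing_keys(example)))
```

An empty result means the file has the full layout; anything listed points to a checkpoint written by a different version of the notebook.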

---
*EMSN 2.0 - Ecologisch Monitoring Systeem Nijverdal*