# **BirdCLEF 2025 Training Notebook**

This is a baseline training pipeline for BirdCLEF 2025 using EfficientNet with PyTorch and timm (for the pretrained backbone). The original baseline used EfficientNet-B0; the configuration below selects `efficientnet_b1`. The inference and preprocessing notebooks are linked below:

- [EfficientNet B0 Pytorch [Inference] | BirdCLEF'25](https://www.kaggle.com/code/kadircandrisolu/efficientnet-b0-pytorch-inference-birdclef-25)
- [Transforming Audio-to-Mel Spec. | BirdCLEF'25](https://www.kaggle.com/code/kadircandrisolu/transforming-audio-to-mel-spec-birdclef-25)

Note that in debug mode (`debug = True`) this notebook trains for only 2 epochs on 2 folds. The configuration below sets `debug = False` with 20 epochs, while the [weight](https://www.kaggle.com/datasets/kadircandrisolu/birdclef25-effnetb0-starter-weight) I used in the inference notebook was obtained after 10 epochs of training.

**Features**
* Implemented with PyTorch and timm
* Flexible audio processing with both pre-computed and on-the-fly mel spectrograms
* Stratified 5-fold cross-validation with ensemble capability
* Mixup training for improved generalization
* Spectrogram augmentations (time/frequency masking, brightness adjustment)
* AdamW optimizer with Cosine Annealing LR scheduling
* Debug mode for quick experimentation with smaller datasets

**Pre-computed Spectrograms**

For faster training, you can use pre-computed mel spectrograms from [this dataset](https://www.kaggle.com/datasets/kadircandrisolu/birdclef25-mel-spectrograms) by setting `LOAD_DATA = True`.
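
As a rough sketch of what such a pre-computed file looks like, the snippet below writes and reads back a dictionary of spectrograms in the same way `np.load(..., allow_pickle=True).item()` is used later in this notebook. The sample names and the temporary file are made up for illustration and are not part of the real dataset.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample names; the real keys follow the notebook's `samplename` column.
specs = {
    "species_a-XC0001": np.random.rand(256, 256).astype(np.float32),
    "species_b-XC0002": np.random.rand(256, 256).astype(np.float32),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "melspec_demo.npy")
    # Saving a dict stores it as a pickled 0-d object array.
    np.save(path, specs, allow_pickle=True)
    # .item() unwraps the 0-d array back into the original dict.
    loaded = np.load(path, allow_pickle=True).item()

print(len(loaded), loaded["species_a-XC0001"].shape)
```

Because the file is pickled, it should only be loaded from a source you trust.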

## Libraries

In [1]:
# Basic imports
import numpy as np, pandas as pd, os, random, warnings, json, datetime, time
from tqdm.auto import tqdm


# Specific imports
import logging
from metric_logger import MetricLogger

# PyTorch imports
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.utils.data import Dataset, DataLoader
from torch.amp import autocast, GradScaler

# Other ML imports
from sklearn.model_selection import StratifiedKFold
import timm

# Custom imports
from processing import process_audio_file
from utilities import set_seed, collate_fn
from training_utilities import get_scheduler, get_criterion, clean_gpu_memory, compile_model, calculate_soft_label_metrics

# Suppress warnings and set logging level
warnings.filterwarnings("ignore")
logging.basicConfig(level=logging.ERROR)

## Configuration

In [2]:
class CFG:
    
    seed = 42
    debug = False 
    LOAD_DATA = True
    
    # Paths and directories
    OUTPUT_DIR = 'output/'
    plots_dir = 'output/plots/'
    metrics_dir = 'output/metrics/'
    configs_dir = 'output/configs/'
    models_dir = 'output/models/'
    train_datadir = 'birdclef-2025/train_audio'
    train_csv = 'birdclef-2025/train.csv'
    train_soundscapes = 'birdclef-2025/train_soundscapes'
    test_soundscapes = 'birdclef-2025/test_soundscapes'
    submission_csv = 'birdclef-2025/sample_submission.csv'
    taxonomy_csv = 'birdclef-2025/taxonomy.csv'
    spectrogram_npy = 'archive/train_melspec_5_256_256.npy'
    train_soundscapes_spectrograms = 'archive/train_soundscapes_melspec_12x5_256_256/'
    
    # Label processing
    secondary_weight = 0.7 
    use_external_pseudolabels = True
    use_soft_labels = False
    normalize_labels = False
    external_pseudolabels = 'pseudolabels_even_better.csv'
    pseudolabel_confidence_threshold = 0.04  # Only use predictions above this threshold, only if using hard labels
    max_pseudolabels = 25000  # Maximum number of pseudolabeled samples to use
    stratified_pseudolabels = True  # Use stratified sampling for pseudolabels (currently only for hard labels)
 
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")

    # Training settings
    epochs = 20
    n_fold = 5
    use_early_stopping = True
    early_stopping_epochs = 3 

    # Mel spectrogram parameters
    FS = 32000
    TARGET_DURATION = 5.0
    TARGET_SHAPE = (256, 256)
    N_FFT = 1024
    HOP_LENGTH = 512
    N_MELS = 128
    FMIN = 50
    FMAX = 14000
    
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    print(f"Using device: {device}")
    
    # Loss parameters
    criterion = 'CombinedLoss'  # Options: 'BCEWithLogitsLoss', 'FocalLoss', 'CombinedLoss'
    focal_alpha = 1.0
    focal_gamma = 3.0
    bce_weight = 0.5
    focal_weight = 0.5

    # optimizer parameters
    lr = 0.5*1e-3 
    weight_decay = 1e-5

    #scheduler parameters
    scheduler = 'CosineAnnealingLR'
    min_lr = 1e-6
    use_lr_warmup = True
    warmup_epochs = 3

    # augmentation options
    aug_prob = 0.5
    spec_augment = True
    spec_augment_params = {
        'time_mask_param': 30,
        'freq_mask_param': 20,
        'num_masks': 2,
    }
    mixup_alpha = 0.5
    cutmix_alpha = 1.0
    use_cutmix = True
    
    # Model architecture options
    model_name = 'efficientnet_b1'  # Options: 'efficientnetv2_s', 'convnext_tiny', 'efficientnet_b0' 
    pretrained = True
    in_channels = 1
    dropout_rate = 0.2
    drop_path_rate = 0.2

    # Memory and speed optimizations
    gradient_accumulation_steps = 4  # Increase effective batch size without more memory
    use_amp = True                   # Use automatic mixed precision
    pin_memory = True                # Faster data transfer to GPU
    persistent_workers = True        # Keep workers alive between epochs
    num_workers = 0                  # 0 loads data in the main process; raise toward the CPU core count to parallelize
    prefetch_factor = 4              # Number of batches to prefetch (default is 2)
    batch_size = 32                  # Effective batch size will be batch_size * gradient_accumulation_steps
    
    # Compiler settings
    compile_backend = "inductor"        # Options: "eager", "torchscript", "onnx", "inductor"
    compile_mode = "default"         # Options: "default", "reduce-overhead", "max-autotune"

    def update_debug_settings(self):
        if self.debug:
            self.n_fold = 2
            self.epochs = 2

    def save_config(self):
        """Save configuration to file unless in debug mode"""
        if not self.debug:    
            config_dict = {attr: getattr(self, attr) for attr in dir(self) if not attr.startswith('__') and not callable(getattr(self, attr))}
            filename = f"config_{self.timestamp}_{self.model_name}.json"
            os.makedirs(self.configs_dir, exist_ok=True)  # make sure the target directory exists
            filepath = os.path.join(self.configs_dir, filename)
            
            with open(filepath, 'w') as f:
                json.dump(config_dict, f, indent=4, default=str)
            print(f"Config saved to {filepath}")

cfg = CFG()
set_seed(cfg.seed)
cfg.update_debug_settings()

Using device: cuda
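
A few quantities follow directly from the configuration above and are worth checking by hand. The sketch below derives them with plain arithmetic. The frame-count formula assumes a centered STFT (the librosa default); `process_audio_file` is not shown here, so treat that part as an assumption.

```python
# Values copied from CFG above.
FS = 32000
TARGET_DURATION = 5.0
HOP_LENGTH = 512
batch_size = 32
gradient_accumulation_steps = 4

samples_per_clip = int(FS * TARGET_DURATION)
# Assumes a centered STFT, which yields 1 + n // hop frames.
frames_per_clip = 1 + samples_per_clip // HOP_LENGTH
effective_batch_size = batch_size * gradient_accumulation_steps

print(samples_per_clip, frames_per_clip, effective_batch_size)  # 160000 313 128
```

Under that assumption, a raw 5-second mel spectrogram would have about 313 time frames before being resized to `TARGET_SHAPE`.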


## Pre-processing
Audio is turned into mel spectrograms for model input, with the source controlled by the `LOAD_DATA` parameter. With `LOAD_DATA=True`, pre-computed spectrograms are loaded from this [dataset](https://www.kaggle.com/datasets/kadircandrisolu/birdclef25-mel-spectrograms); with `LOAD_DATA=False`, they are generated on the fly by `process_audio_file`.
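
The lookup order can be sketched in isolation as follows. This is a simplified illustration, not the notebook's implementation: `get_spectrogram` and `fake_compute` are hypothetical names, and `fake_compute` merely stands in for real audio processing.

```python
import numpy as np

def get_spectrogram(samplename, cache, load_data, compute_fn, target_shape=(256, 256)):
    """Return a cached spectrogram, compute one, or fall back to zeros."""
    if cache and samplename in cache:
        return cache[samplename]
    if not load_data:
        return compute_fn(samplename)
    # Nothing cached and on-the-fly generation disabled: use a silent placeholder.
    return np.zeros(target_shape, dtype=np.float32)

def fake_compute(name):
    # Stand-in for real audio processing.
    return np.full((256, 256), 0.5, dtype=np.float32)

cache = {"clip_1": np.ones((256, 256), dtype=np.float32)}

hit = get_spectrogram("clip_1", cache, load_data=True, compute_fn=fake_compute)
computed = get_spectrogram("clip_2", None, load_data=False, compute_fn=fake_compute)
missing = get_spectrogram("clip_2", cache, load_data=True, compute_fn=fake_compute)
```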

## Dataset Preparation and Data Augmentations
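Each sample is a mel spectrogram, and in training mode a block of random spectrogram augmentations is applied with probability `aug_prob`. Within that block, time masking, frequency masking, and brightness/contrast adjustment each fire with 50% probability, while Gaussian noise and small circular time/frequency shifts each fire with 30% probability. This randomized approach creates diverse training samples from the same audio files.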
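
To make the masking step concrete, here is a minimal NumPy sketch of zeroing one random band along the time or frequency axis. `mask_spectrogram` is a hypothetical helper written for this illustration; the dataset class below does the equivalent on PyTorch tensors.

```python
import numpy as np

def mask_spectrogram(spec, rng, max_width=20, axis=1):
    """Zero out one random band along `axis` (1 = time, 0 = frequency)."""
    out = spec.copy()  # copy so the original array is left untouched
    width = int(rng.integers(5, max_width + 1))
    start = int(rng.integers(0, spec.shape[axis] - width + 1))
    if axis == 1:
        out[:, start:start + width] = 0.0
    else:
        out[start:start + width, :] = 0.0
    return out

rng = np.random.default_rng(42)
spec = np.ones((256, 256), dtype=np.float32)
time_masked = mask_spectrogram(spec, rng, axis=1)
freq_masked = mask_spectrogram(spec, rng, axis=0)
```

Copying before masking matters whenever the same array is reused across epochs, as it is with a spectrogram cache.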

In [3]:
class BirdCLEFDatasetFromNPY(Dataset):
    
    def __init__(self, df, cfg, spectrograms=None, mode="train"):
        self.df = df
        self.cfg = cfg
        self.mode = mode
        self.spectrograms = spectrograms
        
        taxonomy_df = pd.read_csv(self.cfg.taxonomy_csv)
        self.species_ids = taxonomy_df['primary_label'].tolist()
        self.num_classes = len(self.species_ids)
        self.label_to_idx = {label: idx for idx, label in enumerate(self.species_ids)}
        
        if cfg.debug:
            self.df = self.df.sample(min(1000, len(self.df)), random_state=cfg.seed).reset_index(drop=True)
    
    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, idx):
        
        row = self.df.iloc[idx]
        samplename = row['samplename']

        if self.spectrograms and samplename in self.spectrograms:
            spec = self.spectrograms[samplename]
        elif not self.cfg.LOAD_DATA:
            spec = process_audio_file(row['filepath'], self.cfg)
        else: 
            spec = np.zeros(self.cfg.TARGET_SHAPE, dtype=np.float32)
            print(f"Warning: Spectrogram for {samplename} not found and could not be generated")    

        spec = torch.from_numpy(spec).float().unsqueeze(0)  # Add channel dimension

        if self.mode == "train" and random.random() < self.cfg.aug_prob:
            spec = self.apply_spec_augmentations(spec)
        
        if row["primary_label"] == 'soft':  # soft pseudolabel
            target = row['secondary_labels']  # full probability vector; already normalized if normalize_labels is True
        else:
            target = self.encode_label(row['primary_label'], 1.0)
            if 'secondary_labels' in row and row['secondary_labels'] not in [[''], None, np.nan, []]:
                if isinstance(row['secondary_labels'], str):
                    # train.csv stores the list as a string; each secondary label gets the full weight
                    secondary_labels = eval(row['secondary_labels'])
                    share_weight = False
                else:
                    # Pseudolabel rows carry a real list; the weight is shared between its labels
                    secondary_labels = row['secondary_labels']
                    share_weight = True

                # Resolve indices first so labels missing from the taxonomy can never cause a zero divisor
                label_idxs = [self.label_to_idx[l] for l in secondary_labels if l in self.label_to_idx]
                divisor = len(label_idxs) if share_weight else 1
                for label_idx in label_idxs:
                    target[label_idx] = self.cfg.secondary_weight / divisor
                if self.cfg.normalize_labels:
                    target /= np.sum(target)

            
        item = {
            'melspec': spec, 
            'target': torch.from_numpy(target).float(),
        }
                
        return item
    
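    def apply_spec_augmentations(self, spec):
        """Apply augmentations to spectrogram"""
        # Work on a copy: the tensor can share memory with the cached NumPy array,
        # so in-place masking would otherwise alter the stored spectrogram.
        spec = spec.clone()
        
        # Time masking (dim 2), then frequency masking (dim 1)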
        if random.random() < 0.5:
            for _ in range(random.randint(1, 3)):
                width = random.randint(5, 20)
                start = random.randint(0, spec.shape[2] - width)
                spec[0, :, start:start+width] = 0
        
        if random.random() < 0.5:
            for _ in range(random.randint(1, 3)):
                height = random.randint(5, 20)
                start = random.randint(0, spec.shape[1] - height)
                spec[0, start:start+height, :] = 0
        
        # Random brightness/contrast adjustment
        if random.random() < 0.5:
            gain = random.uniform(0.8, 1.2)
            bias = random.uniform(-0.1, 0.1)
            spec = spec * gain + bias
            spec = torch.clamp(spec, 0, 1)
        
        # Gaussian noise for robustness
        if random.random() < 0.3:
            noise = torch.randn_like(spec) * random.uniform(0.001, 0.005)
            spec = spec + noise
            spec = torch.clamp(spec, 0, 1)
            
        # Random time/frequency shifts
        if random.random() < 0.3:
            shift_x = random.randint(-4, 4)
            shift_y = random.randint(-4, 4)
            spec = torch.roll(spec, shifts=(shift_y, shift_x), dims=(1, 2))
        
        return spec
    
    def encode_label(self, label, weight):
        """Encode label to one-hot vector"""
        target = np.zeros(self.num_classes)
        idx = self.label_to_idx.get(label)
        if idx is not None:
            target[idx] = weight
        return target
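
The target construction can be illustrated without the dataset class. The sketch below builds a multi-label vector with 1.0 for the primary label and a reduced weight for secondary labels. `encode_target` and the species codes are placeholders for illustration, and the sketch gives every secondary label the full `secondary_weight`.

```python
import numpy as np

def encode_target(primary, secondaries, species, secondary_weight=0.7):
    """Multi-label target: 1.0 for the primary label, a lower weight for secondaries."""
    label_to_idx = {label: i for i, label in enumerate(species)}
    target = np.zeros(len(species), dtype=np.float32)
    if primary in label_to_idx:
        target[label_to_idx[primary]] = 1.0
    for label in secondaries:
        idx = label_to_idx.get(label)
        if idx is not None:  # labels outside the taxonomy are skipped
            target[idx] = secondary_weight
    return target

species = ["sp_a", "sp_b", "sp_c", "sp_d"]  # placeholder taxonomy
target = encode_target("sp_b", ["sp_c", "not_in_taxonomy"], species)
print(target)
```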

## Model Definition

In [4]:
class BirdCLEFModel(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.cfg = cfg
        taxonomy_df = pd.read_csv(cfg.taxonomy_csv)
        cfg.num_classes = len(taxonomy_df)
        
        # Support for different model architectures
        self.backbone = timm.create_model(
            cfg.model_name,
            pretrained=cfg.pretrained,
            in_chans=cfg.in_channels,
            drop_rate=cfg.dropout_rate,
            drop_path_rate=cfg.drop_path_rate if hasattr(cfg, 'drop_path_rate') else 0.2
        )
        
        # Extract feature dimension based on model type
        if 'efficientnet' in cfg.model_name:
            backbone_out = self.backbone.classifier.in_features
            self.backbone.classifier = nn.Identity()
        elif 'convnext' in cfg.model_name:
            backbone_out = self.backbone.head.fc.in_features
            self.backbone.head.fc = nn.Identity()
        elif 'resnet' in cfg.model_name:
            backbone_out = self.backbone.fc.in_features
            self.backbone.fc = nn.Identity()
        else:
            backbone_out = self.backbone.get_classifier().in_features
            self.backbone.reset_classifier(0, '')
        
        self.pooling = nn.AdaptiveAvgPool2d(1)
        self.feat_dim = backbone_out
        
        # Add an additional projection layer for better feature representation
        if hasattr(cfg, 'projection_dim') and cfg.projection_dim > 0:
            self.projection = nn.Sequential(
                nn.Linear(backbone_out, cfg.projection_dim),
                nn.BatchNorm1d(cfg.projection_dim),
                nn.ReLU(inplace=True),
                nn.Dropout(0.3),
                nn.Linear(cfg.projection_dim, cfg.num_classes)
            )
            self.classifier = self.projection
        else:
            self.classifier = nn.Linear(backbone_out, cfg.num_classes)
        
        # Mixup and CutMix support
        self.mixup_enabled = hasattr(cfg, 'mixup_alpha') and cfg.mixup_alpha > 0
        self.cutmix_enabled = hasattr(cfg, 'use_cutmix') and cfg.use_cutmix and hasattr(cfg, 'cutmix_alpha') and cfg.cutmix_alpha > 0
        
        if self.mixup_enabled:
            self.mixup_alpha = cfg.mixup_alpha
        if self.cutmix_enabled:
            self.cutmix_alpha = cfg.cutmix_alpha
    
    def forward(self, x, targets=None):
    
        if self.training and self.mixup_enabled and targets is not None:
            mixed_x, targets_a, targets_b, lam = self.mixup_data(x, targets)
            x = mixed_x
        else:
            targets_a, targets_b, lam = None, None, None
        
        features = self.backbone(x)
        
        if isinstance(features, dict):
            features = features['features']
            
        if len(features.shape) == 4:
            features = self.pooling(features)
            features = features.view(features.size(0), -1)
        
        logits = self.classifier(features)
        
        if self.training and self.mixup_enabled and targets is not None:
            loss = self.mixup_criterion(F.binary_cross_entropy_with_logits, logits, targets_a, targets_b, lam)
            return logits, loss
            
        return logits
    
    def mixup_data(self, x, targets):
        """Applies mixup to the data batch"""
        batch_size = x.size(0)
        lam = np.random.beta(self.mixup_alpha, self.mixup_alpha)
        indices = torch.randperm(batch_size).to(x.device, non_blocking=True)
        mixed_x = lam * x + (1 - lam) * x[indices]
        
        return mixed_x, targets, targets[indices], lam
    
    def mixup_criterion(self, criterion, pred, y_a, y_b, lam):
        """Applies mixup to the loss function"""
        return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
    
    def cutmix_data(self, x, targets):
        batch_size = x.size(0)
        lam = np.random.beta(self.cutmix_alpha, self.cutmix_alpha)
        
        # Get random indices for mixing
        indices = torch.randperm(batch_size).to(x.device)
        
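        # Get random box coordinates
        # Note: for (batch, channel, freq, time) input, "W" is the frequency axis and "H" the time axis here
        W, H = x.size(2), x.size(3)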
        cut_ratio = np.sqrt(1. - lam)
        cut_w = np.int_(W * cut_ratio)
        cut_h = np.int_(H * cut_ratio)
        
        cx = np.random.randint(W)
        cy = np.random.randint(H)
        
        bbx1 = np.clip(cx - cut_w // 2, 0, W)
        bby1 = np.clip(cy - cut_h // 2, 0, H)
        bbx2 = np.clip(cx + cut_w // 2, 0, W)
        bby2 = np.clip(cy + cut_h // 2, 0, H)
        
        # Apply cutmix
        x_mixed = x.clone()
        x_mixed[:, :, bbx1:bbx2, bby1:bby2] = x[indices, :, bbx1:bbx2, bby1:bby2]
        
        # Adjust lambda to actual area ratio
        lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (W * H))
        
        return x_mixed, targets, targets[indices], lam
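
The mixup idea used above can be sketched in NumPy, independent of the model class. The random logits are a stand-in for a model's output, and `bce_with_logits` is a small helper written for this example rather than the PyTorch function.

```python
import numpy as np

def mixup(x, y, alpha, rng):
    """Blend each sample with a randomly chosen partner from the same batch."""
    lam = float(rng.beta(alpha, alpha))
    perm = rng.permutation(len(x))
    mixed_x = lam * x + (1.0 - lam) * x[perm]
    return mixed_x, y, y[perm], lam

def bce_with_logits(logits, targets):
    """Numerically stable mean binary cross-entropy on raw logits."""
    per_element = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return float(np.mean(per_element))

rng = np.random.default_rng(0)
x = rng.random((8, 1, 16, 16)).astype(np.float32)
y = (rng.random((8, 5)) > 0.7).astype(np.float32)
mixed_x, y_a, y_b, lam = mixup(x, y, alpha=0.5, rng=rng)

logits = rng.normal(size=(8, 5))  # stand-in for model(mixed_x)
loss = lam * bce_with_logits(logits, y_a) + (1.0 - lam) * bce_with_logits(logits, y_b)
```

Since binary cross-entropy is linear in the target, this weighted sum equals the loss against the blended target `lam * y_a + (1 - lam) * y_b`.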

## Training Utilities
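We are configuring our optimization strategy with the AdamW optimizer, cosine annealing scheduling with optional warmup, and the loss selected by `cfg.criterion` (`CombinedLoss` in the configuration above, with `BCEWithLogitsLoss` and `FocalLoss` as alternatives).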
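
For intuition about the schedule, the sketch below computes a learning rate per epoch using linear warmup followed by the standard cosine annealing formula. The helper `get_scheduler` is not shown in this notebook, so the warmup shape and the function `lr_at_epoch` are assumptions for illustration only.

```python
import math

def lr_at_epoch(epoch, base_lr=5e-4, min_lr=1e-6, total_epochs=20, warmup_epochs=3):
    """Linear warmup followed by cosine annealing (assumed schedule shape)."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at_epoch(e) for e in range(20)]
for epoch in (0, 2, 3, 10, 19):
    print(epoch, f"{schedule[epoch]:.2e}")
```

The defaults mirror `lr`, `min_lr`, `epochs`, and `warmup_epochs` from the configuration.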

## Training Loop

In [5]:
def train_one_epoch(model, loader, optimizer, criterion, device, scheduler=None, use_amp=True, grad_accum_steps=1):
    model.train()
    scaler = GradScaler(enabled=use_amp)
    
    # Use lists to accumulate batches, but don't keep all outputs in memory
    batch_count = 0
    num_batches = len(loader)
    running_loss = 0
    outputs_for_metrics = []
    targets_for_metrics = []
    metric_collection_interval = min(100, len(loader) // 10 or 1)

    optimizer.zero_grad(set_to_none=True)
    pbar = tqdm(enumerate(loader), total=len(loader), desc="Training")
    
    # Track additional metrics
    total_samples = 0
    epoch_start_time = time.time()
    
    for step, batch in pbar:
        # Move to device with non_blocking for potential speedup
        inputs = batch['melspec'].to(device, non_blocking=True)
        targets = batch['target'].to(device, non_blocking=True)
        
        with autocast(enabled=use_amp, device_type=device):
            # Handle model outputs with mixup/cutmix
            if (model.mixup_enabled or model.cutmix_enabled) and model.training:
                if model.mixup_enabled and model.cutmix_enabled:
                    # Randomly choose between mixup and cutmix
                    if random.random() < 0.5:
                        mixed_x, targets_a, targets_b, lam = model.mixup_data(inputs, targets)
                    else:
                        mixed_x, targets_a, targets_b, lam = model.cutmix_data(inputs, targets)
                elif model.mixup_enabled:
                    mixed_x, targets_a, targets_b, lam = model.mixup_data(inputs, targets)
                else:
                    mixed_x, targets_a, targets_b, lam = model.cutmix_data(inputs, targets)
                    
                outputs = model(mixed_x)
                loss = lam * criterion(outputs, targets_a) + (1 - lam) * criterion(outputs, targets_b)
            else:
                outputs = model(inputs)
                loss = criterion(outputs, targets)
        
        # Normalize loss for gradient accumulation
        loss = loss / grad_accum_steps
        scaler.scale(loss).backward()
        
        batch_count += 1
        running_loss += loss.item() * grad_accum_steps
        total_samples += inputs.size(0)
        
        # Only collect some batches for metrics to save memory
        if step % metric_collection_interval == 0:
            outputs_for_metrics.append(outputs.detach().cpu())
            targets_for_metrics.append(targets.detach().cpu())
        
        # Step optimizer after accumulating gradients
        if batch_count % grad_accum_steps == 0 or step == len(loader) - 1:
            # Unscale before possible gradient clipping
            scaler.unscale_(optimizer)
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad(set_to_none=True)
            
            if scheduler and isinstance(scheduler, lr_scheduler.OneCycleLR):
                scheduler.step()

        # Update progress bar with running loss
        pbar.set_postfix({
            'it/s/bs' : pbar.n / pbar.format_dict['elapsed'] / cfg.batch_size,
            'train_loss': running_loss / (step + 1),
            'lr': optimizer.param_groups[0]['lr'],
        })
        
        # Free memory explicitly
        del inputs, outputs
        if step % 10 == 0:  # Periodically clear CUDA cache
            torch.cuda.empty_cache()

    # Calculate comprehensive metrics for soft labels
    epoch_time = time.time() - epoch_start_time
    
    all_outputs = torch.cat(outputs_for_metrics)
    all_targets = torch.cat(targets_for_metrics)
    soft_metrics = calculate_soft_label_metrics(all_targets.numpy(), torch.sigmoid(all_outputs).numpy())
    auc = soft_metrics['macro_auc']
        
    avg_loss = running_loss / len(loader)
    
    # Return comprehensive metrics
    metrics = {
        'train_loss': avg_loss,
        'train_auc': auc,
        'learning_rate': optimizer.param_groups[0]['lr'],
        'epoch_time_minutes': epoch_time / 60,
        'samples_per_second': total_samples / epoch_time,
        'total_samples': total_samples
    }
    
    del outputs_for_metrics, targets_for_metrics
    torch.cuda.empty_cache()
    
    return metrics

# Validation pass: accumulates predictions over all batches before computing metrics
def validate(model, loader, criterion, device, use_amp=True):
    model.eval()
    total_loss = 0.0
    
    # Store all predictions and targets for final metrics
    all_probs_accumulated = []
    all_targets_accumulated = []
    total_samples = 0
    val_start_time = time.time()
    
    with torch.no_grad():
        for batch_idx, batch in enumerate(tqdm(loader, desc="Validation")):
            inputs = batch['melspec'].to(device, non_blocking=True)
            targets = batch['target'].to(device, non_blocking=True)

            with autocast(enabled=use_amp, device_type=device):
                outputs = model(inputs)
                loss = criterion(outputs, targets)

            total_loss += loss.item()
            total_samples += inputs.size(0)
            
            probs = torch.sigmoid(outputs).cpu().numpy()
            targets_np = targets.cpu().numpy()
            
            # Accumulate all predictions and targets
            all_probs_accumulated.append(probs)
            all_targets_accumulated.append(targets_np)
            
            # Clear memory periodically but keep accumulating
            del inputs, outputs, targets, probs, targets_np
            if batch_idx % 10 == 0:
                torch.cuda.empty_cache()
    
    # Combine all accumulated predictions and targets
    probs_array = np.vstack(all_probs_accumulated)
    targets_array = np.vstack(all_targets_accumulated)
    
    val_time = time.time() - val_start_time
    
    # Use macro-averaged AUC as primary metric
    soft_metrics = calculate_soft_label_metrics(targets_array, probs_array)
    auc = soft_metrics['macro_auc']
    valid_classes = soft_metrics['valid_classes']
    
    avg_loss = total_loss / len(loader)
    
    # Return comprehensive metrics focused on macro-averaged AUC
    metrics = {
        'val_loss': avg_loss,
        'val_auc': auc,  # This is macro-averaged AUC
        'val_valid_classes': valid_classes,
        'val_time_minutes': val_time / 60,
        'val_samples_per_second': total_samples / val_time if val_time > 0 else 0,
        'val_total_samples': total_samples
    }
    
    # Return ROC data
    roc_data = (targets_array, probs_array)
    
    # Clean up memory
    del all_probs_accumulated, all_targets_accumulated
    torch.cuda.empty_cache()
    
    return metrics, roc_data
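
The gradient accumulation condition in `train_one_epoch` is easier to see in isolation. This small sketch lists the batch indices after which an optimizer step would happen; `optimizer_step_indices` is a hypothetical helper that only reproduces the condition, not the training itself.

```python
def optimizer_step_indices(num_batches, grad_accum_steps):
    """Return the batch indices after which the optimizer would step."""
    steps = []
    for step in range(num_batches):
        batch_count = step + 1
        # Step after every full group, and once more at the end for a partial group.
        if batch_count % grad_accum_steps == 0 or step == num_batches - 1:
            steps.append(step)
    return steps

print(optimizer_step_indices(10, 4))  # [3, 7, 9]
```

Note that the final group here contains only two batches, yet each loss was still divided by `grad_accum_steps`, so that last update is proportionally smaller.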

In [6]:
def load_soft_pseudolabels(df, cfg, seed=None):
    """
    Load soft pseudolabels.
    """
    # Can use different seeds e.g. for different folds
    random_seed = seed if seed is not None else cfg.seed
    np.random.seed(random_seed)

    print(f"Found {len(df)} samples in the pseudolabels")
    if cfg.debug:
        df = df.sample(min(1000, len(df)), random_state=random_seed).reset_index(drop=True)
    
    label_col = df.columns[1:]
    
    # Normalize probabilities to sum to 1 for each row
    if cfg.normalize_labels:
        df[label_col] = df[label_col].div(df[label_col].sum(axis=1), axis=0)
        
    # Sample to restrict to max_pseudolabels
    df = df.sample(min(cfg.max_pseudolabels, len(df)), random_state=random_seed).reset_index(drop=True)
    row_ids = df["row_id"]

    # Convert to training-compatible DataFrame format
    label_df = pd.DataFrame({
        'samplename': row_ids,
        'filename': [f"{'_'.join(x.split('_')[:3])}.ogg" for x in row_ids],
        'timestamp': [int(x.split('_')[3]) for x in row_ids],
        'primary_label' : ["soft"] * len(row_ids),
        'secondary_labels': list(df[label_col].to_numpy(dtype=np.float32))  # one probability vector per row, in row_ids order
    })
    label_df['filepath'] = label_df['filename'].apply(lambda x: os.path.join(cfg.train_soundscapes, x))
    print(f"Loaded {len(label_df)} pseudolabels")

    return label_df
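
The row ID parsing used above splits on underscores: the first three parts form the file name and the fourth is an integer timestamp. A minimal sketch follows; the row ID is a made-up example of that pattern and `parse_row_id` is a hypothetical helper.

```python
def parse_row_id(row_id):
    """Split a soundscape row ID into its file name and trailing integer timestamp."""
    parts = row_id.split("_")
    filename = "_".join(parts[:3]) + ".ogg"
    timestamp = int(parts[3])
    return filename, timestamp

print(parse_row_id("H02_20230420_074000_15"))  # ('H02_20230420_074000.ogg', 15)
```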

In [7]:
def load_pseudolabels(df, cfg, seed=None):
    """
    Load pseudolabels and sample to ensure balanced class representation.
    Returns a maximum of cfg.max_pseudolabels samples with balanced label distribution.
    """
    random_seed = seed if seed is not None else cfg.seed
    np.random.seed(random_seed)
    
    if cfg.debug:
        df = df.sample(min(1000, len(df)), random_state=random_seed).reset_index(drop=True)
    
    label_cols = df.columns[1:]
    
    # Step 1: Find samples where at least one prediction passes the confidence threshold
    # Calculate row sums first to filter out rows with no valid predictions
    df_filtered = df.copy()
    row_sums = df_filtered[label_cols].sum(axis=1)
    df_filtered = df_filtered[row_sums > 0].reset_index(drop=True)
    
    # Normalize probabilities to sum to 1 for each row
    if cfg.normalize_labels:
        df_filtered[label_cols] = df_filtered[label_cols].div(df_filtered[label_cols].sum(axis=1), axis=0)
    
    # Create mask for values that pass threshold
    mask = df_filtered[label_cols] >= cfg.pseudolabel_confidence_threshold
    
    # Get indices and values of samples with predictions above threshold using the mask
    valid_samples = {}  # {row_id: [labels sorted by confidence]}
    
    for idx, row in df_filtered.iterrows():
        row_id = row['row_id']
        # Use the mask to get valid labels for this row
        valid_mask = mask.iloc[idx]
        valid_label_cols = label_cols[valid_mask]
        
        if len(valid_label_cols) > 0:
            # Get confidences for valid labels and sort them
            confidences = row[valid_label_cols]
            # Create (label, confidence) pairs and sort by confidence, highest first
            label_conf_pairs = [(label, conf) for label, conf in zip(valid_label_cols, confidences)]
            label_conf_pairs.sort(key=lambda x: x[1], reverse=True)
            # Extract just the sorted labels
            valid_samples[row_id] = [label for label, _ in label_conf_pairs]
    
    print(f"Found {len(valid_samples)} samples with confidence >= {cfg.pseudolabel_confidence_threshold}")
    
    # Step 2: Apply stratified sampling if enabled
    if cfg.stratified_pseudolabels and len(valid_samples) > cfg.max_pseudolabels:
        print("Stratified sampling of pseudolabels")
        
        # Create label-to-samples mapping
        label_to_samples = {}
        for sample_id, labels in valid_samples.items():
            for label in labels:
                if label not in label_to_samples:
                    label_to_samples[label] = []
                label_to_samples[label].append(sample_id)
        
        # Calculate target samples per label for balanced distribution
        num_unique_labels = len(label_to_samples)
        target_per_label = max(1, int(cfg.max_pseudolabels / num_unique_labels))
        
        print(f"Found {num_unique_labels} unique bird species in pseudolabels")
        print(f"Target ~{target_per_label} samples per species to stay under {cfg.max_pseudolabels} total")
        
        # Shuffle labels for randomness across folds
        shuffled_labels = list(label_to_samples.keys())
        random.seed(random_seed)
        random.shuffle(shuffled_labels)
        
        # Sample entries for each label
        selected_samples = set()
        
        for label in shuffled_labels:
            samples = label_to_samples[label]
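            # Shuffle samples with a deterministic but different seed per label.
            # The built-in hash() of a string changes between Python processes, so derive the offset from the bytes instead.
            samples_seed = random_seed + int.from_bytes(str(label).encode(), 'little') % 10000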
            random.seed(samples_seed)
            random.shuffle(samples)
            
            # Take at most target_per_label samples
            selected = samples[:target_per_label]
            selected_samples.update(selected)
            
            # Stop if we've reached the max pseudolabels
            if len(selected_samples) >= cfg.max_pseudolabels:
                break
        
        # Filter valid_samples to only include selected samples
        valid_samples = {k: v for k, v in valid_samples.items() if k in selected_samples}
        print(f"Sampled {len(valid_samples)} pseudolabeled samples")
    else:
        print(f"Using randomly selected {cfg.max_pseudolabels} pseudolabels that pass confidence threshold")
    
    # Step 3: Convert to training-compatible DataFrame format
    # First, create a dataframe just with the row_ids we need
    rows_to_keep = random.sample(list(valid_samples.keys()), min(cfg.max_pseudolabels, len(valid_samples)))

    label_df = pd.DataFrame({
        'samplename': rows_to_keep,
        'filename': [f"{'_'.join(x.split('_')[:3])}.ogg" for x in rows_to_keep],
        'timestamp': [int(x.split('_')[3]) for x in rows_to_keep],
        'primary_label': [valid_samples[x][0] for x in rows_to_keep],
        'secondary_labels': [valid_samples[x][1:] for x in rows_to_keep]
    })
    
    # Add filepath column 
    label_df['filepath'] = label_df['filename'].apply(lambda x: os.path.join(cfg.train_soundscapes, x))
    
    return label_df
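
The per-label capping idea can be sketched on its own. The example below picks up to a fixed number of samples per label and derives each label's shuffle seed from `zlib.crc32`, which is stable across Python processes (unlike the built-in `hash()` on strings). The labels, sample IDs, and the function `sample_per_label` are placeholders; it omits the early stop at `max_pseudolabels`.

```python
import random
import zlib

def sample_per_label(label_to_samples, target_per_label, seed=42):
    """Pick up to `target_per_label` sample IDs per label, reproducibly."""
    selected = set()
    for label in sorted(label_to_samples):
        # A private Random instance avoids reseeding the global generator.
        rng = random.Random(seed + zlib.crc32(label.encode("utf-8")) % 10000)
        samples = list(label_to_samples[label])
        rng.shuffle(samples)
        selected.update(samples[:target_per_label])
    return selected

label_to_samples = {
    "sp_a": ["s1", "s2", "s3", "s4"],
    "sp_b": ["s3", "s5"],
    "sp_c": ["s6"],
}
picked = sample_per_label(label_to_samples, target_per_label=2)
print(sorted(picked))
```

Because a sample can carry several labels, the selected set may be smaller than the number of labels times the per-label quota.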

In [8]:
def run_training(df, cfg, soundscape_df=None):
    taxonomy_df = pd.read_csv(cfg.taxonomy_csv)
    species_ids = taxonomy_df['primary_label'].tolist()
    cfg.num_classes = len(species_ids)

    if cfg.LOAD_DATA:
        spectrograms = np.load(cfg.spectrogram_npy, allow_pickle=True).item()
        print(f"Loaded {len(spectrograms)} pre-computed mel spectrograms for labeled data")
    else:   
        spectrograms = None
        print("Will generate spectrograms on-the-fly during training.")

    df['filepath'] = cfg.train_datadir + '/' + df.filename
    df['samplename'] = df.filename.map(lambda x: x.split('/')[0] + '-' + x.split('/')[-1].split('.')[0])

    if cfg.use_external_pseudolabels and soundscape_df is not None:
        print(f"Loading external pseudolabels (soft == {cfg.use_soft_labels})")
        if cfg.use_soft_labels:
            pseudolabel_df = load_soft_pseudolabels(soundscape_df, cfg)
        else:
            pseudolabel_df = load_pseudolabels(soundscape_df, cfg)
        df = pd.concat([df, pseudolabel_df], ignore_index=True)
        df = df.sample(frac=1, random_state=cfg.seed).reset_index(drop=True)

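        # With LOAD_DATA=False no spectrogram dict exists yet, so create one for the soundscape spectrograms
        if spectrograms is None:
            spectrograms = {}
        for _, row in tqdm(pseudolabel_df.iterrows(), desc="Loading soundscape spectrograms", total=len(pseudolabel_df)):
            spectrograms[row['samplename']] = np.load(f"{cfg.train_soundscapes_spectrograms}{row['samplename']}.npy", allow_pickle=True)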

    if cfg.n_fold > 1:
        skf = StratifiedKFold(n_splits=cfg.n_fold, shuffle=True, random_state=cfg.seed)
        folds = skf.split(df, df['primary_label'])
    else:
        folds = [(np.arange(len(df)), np.arange(len(df)))]

    best_scores = []
    all_fold_metrics = []
    
    # Create one shared metric logger for all folds
    metric_logger = MetricLogger(cfg)
            
    for fold, (train_idx, val_idx) in enumerate(folds):
        print(f'\n{"="*30} Fold {fold} {"="*30}')
        
        train_df = df.iloc[train_idx].reset_index(drop=True)
        val_df = df.iloc[val_idx].reset_index(drop=True)
        
        print(f'Training set: {len(train_df)} samples')
        print(f'Validation set: {len(val_df)} samples')

        # Prepare datasets
        train_dataset = BirdCLEFDatasetFromNPY(train_df, cfg, spectrograms=spectrograms, mode='train')
        val_dataset = BirdCLEFDatasetFromNPY(val_df, cfg, spectrograms=spectrograms, mode='valid')
        
        # Prepare data loaders
        train_loader = DataLoader(
            train_dataset, 
            batch_size=cfg.batch_size, 
            shuffle=True, 
            num_workers=cfg.num_workers,
            pin_memory=cfg.pin_memory,
            persistent_workers=cfg.persistent_workers if cfg.num_workers > 0 else False,
            prefetch_factor=cfg.prefetch_factor if cfg.num_workers > 0 else None,
            collate_fn=collate_fn,
            drop_last=True
        )

        val_loader = DataLoader(
            val_dataset, 
            batch_size=cfg.batch_size * 2,
            shuffle=False, 
            num_workers=cfg.num_workers,
            pin_memory=cfg.pin_memory,
            persistent_workers=cfg.persistent_workers if cfg.num_workers > 0 else False,
            prefetch_factor=cfg.prefetch_factor if cfg.num_workers > 0 else None,
            collate_fn=collate_fn
        )

        print(f"\n{'-'*20} Training Model {'-'*20}")
        model = BirdCLEFModel(cfg).to(cfg.device, non_blocking=True)
        model = compile_model(model, cfg)
        optimizer = optim.AdamW(
            model.parameters(),
            lr=cfg.lr,
            weight_decay=cfg.weight_decay
        )
        criterion = get_criterion(cfg)

        if cfg.scheduler == 'CosineAnnealingLR':
            cfg.T_max = cfg.epochs
        scheduler = get_scheduler(optimizer, cfg, len(train_loader))
        
        best_auc, best_epoch = 0, 0
        fold_start_time = time.time()
        
        for epoch in range(cfg.epochs):
            print(f"\n{'='*50}")
            print(f"Epoch {epoch+1}/{cfg.epochs} | Fold {fold}")
            print(f"{'='*50}")
            
            # Get comprehensive training metrics
            train_metrics = train_one_epoch(
                model, train_loader, optimizer, criterion, cfg.device,
                scheduler if isinstance(scheduler, lr_scheduler.OneCycleLR) else None,
                use_amp=cfg.use_amp,
                grad_accum_steps=cfg.gradient_accumulation_steps
            )
            
            # Get comprehensive validation metrics
            val_metrics, roc_data = validate(model, val_loader, criterion, cfg.device, 
                                use_amp=cfg.use_amp)
            
            # Log metrics to our metrics dataframe
            metric_logger.log_metrics(epoch, fold, train_metrics, val_metrics, roc_data)
            
            if scheduler is not None and not isinstance(scheduler, lr_scheduler.OneCycleLR):
                if isinstance(scheduler, lr_scheduler.ReduceLROnPlateau):
                    scheduler.step(val_metrics['val_loss'])
                else:
                    scheduler.step()
            
            # Print epoch summary
            print(f"\n--- Epoch {epoch+1} Summary ---")
            print(f"Train Loss: {train_metrics['train_loss']:.4f}, Train AUC: {train_metrics['train_auc']:.4f}")
            print(f"Val Loss: {val_metrics['val_loss']:.4f}, Val AUC: {val_metrics['val_auc']:.4f}")
            print(f"LR: {train_metrics['learning_rate']:.6f}, Epoch Time: {train_metrics['epoch_time_minutes']:.2f}m")
            print(f"Valid Classes: {val_metrics['val_valid_classes']}")
            
            if val_metrics['val_auc'] > best_auc:
                best_auc = val_metrics['val_auc']
                best_epoch = epoch + 1
                print(f"New best AUC: {best_auc:.4f} at epoch {best_epoch}")
                
                # Only save model if not in debug mode
                if not cfg.debug:
                    model_path = f"{cfg.models_dir}/model_{cfg.timestamp}_{cfg.model_name}_fold{fold}.pth"
                    torch.save({
                        'model_state_dict': model.state_dict(),
                        'optimizer_state_dict': optimizer.state_dict(),
                        'scheduler_state_dict': scheduler.state_dict() if scheduler else None,
                        'epoch': epoch,
                        'best_auc': best_auc,
                        'cfg': cfg
                    }, model_path)
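            # Stop once cfg.early_stopping_epochs epochs have passed without improvement.
            # best_epoch is 1-indexed, so it is compared against epoch + 1. The check runs
            # every epoch; inside the improvement branch it could never trigger.
            if cfg.use_early_stopping and (epoch + 1) - best_epoch >= cfg.early_stopping_epochs:
                print(f"Early stopping at epoch {epoch+1}")
                break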
        
        # Calculate fold completion time
        fold_time = time.time() - fold_start_time
        fold_metrics = {
            'fold_time_minutes': fold_time / 60,
            'best_epoch': best_epoch,
            'best_val_auc': best_auc
        }
        all_fold_metrics.append(fold_metrics)
        
        best_scores.append(best_auc)
        print(f"\n*** FOLD {fold} COMPLETE ***")
        print(f"Best AUC: {best_auc:.4f} at epoch {best_epoch}")
        print(f"Fold training time: {fold_time/60:.1f} minutes")
        
        del model, optimizer, scheduler
        del train_loader, val_loader
        del train_dataset, val_dataset
        clean_gpu_memory()
    
    # Final comprehensive reporting
    print("\n" + "="*60)
    print("CROSS-VALIDATION RESULTS:")
    print("="*60)
    for fold, score in enumerate(best_scores):
        print(f"Fold {fold}: {score:.4f} (Best epoch: {all_fold_metrics[fold]['best_epoch']})")
    
    print(f"\nMean AUC: {np.mean(best_scores):.4f} ± {np.std(best_scores):.4f}")
    print(f"Min AUC: {np.min(best_scores):.4f}")
    print(f"Max AUC: {np.max(best_scores):.4f}")
    print("="*60)
    
    # Generate and save plots
    if not cfg.debug:
        metric_logger.plot_metrics()
        
    return best_scores
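
The early-stopping check in `run_training` compares the current epoch with the last epoch that improved validation AUC. Below is a minimal, framework-free sketch of that patience logic. The class name `EarlyStopper` and its `patience` argument are assumptions made for illustration (the notebook reads `cfg.early_stopping_epochs` instead), and the scores are the fold 0 validation AUCs for epochs 12 to 20 copied from the log.

```python
class EarlyStopper:
    """Track the best validation score and signal when training should stop.

    Hypothetical helper for illustration; `patience` plays the role of
    cfg.early_stopping_epochs in the notebook.
    """

    def __init__(self, patience: int):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = 0  # 1-indexed epoch of the best score, 0 = none yet

    def update(self, epoch: int, score: float) -> bool:
        """Record the score for a 1-indexed epoch; return True to stop."""
        if score > self.best_score:
            self.best_score = score
            self.best_epoch = epoch
            return False
        # Stop once `patience` epochs have passed without improvement.
        return epoch - self.best_epoch >= self.patience


# Validation AUCs for epochs 12-20 of fold 0, copied from the training log
aucs = [0.9270, 0.9247, 0.9268, 0.9290, 0.9273, 0.9284, 0.9267, 0.9258, 0.9259]

stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, auc in enumerate(aucs, start=12):
    if stopper.update(epoch, auc):
        stopped_at = epoch
        break

print(stopper.best_epoch, stopper.best_score, stopped_at)
```

With a patience of 3, this loop keeps epoch 15 as the best epoch and stops at epoch 18. The key design point is that the stop condition is evaluated on every epoch, not only on epochs that improve the score.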

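The cross-validation summary reports `np.mean(best_scores) ± np.std(best_scores)`. `np.std` defaults to `ddof=0`, the population standard deviation, which understates the spread when only a handful of folds are available. The sketch below uses the standard library's `statistics` module to compare both estimators on the best AUCs of folds 0 to 2 as printed in the log. The helper name `summarize_folds` is an assumption made for illustration, not part of the notebook.

```python
import statistics


def summarize_folds(scores):
    """Return mean, population std (ddof=0) and sample std (ddof=1)."""
    mean = statistics.fmean(scores)
    pop_std = statistics.pstdev(scores)    # same estimator as np.std(scores)
    sample_std = statistics.stdev(scores)  # same estimator as np.std(scores, ddof=1)
    return mean, pop_std, sample_std


# Best validation AUC of folds 0-2, copied from the training log
best_scores = [0.9290, 0.9047, 0.8773]
mean, pop_std, sample_std = summarize_folds(best_scores)
print(f"Mean AUC: {mean:.4f} ± {pop_std:.4f} (population) / ± {sample_std:.4f} (sample)")
```

If the sample estimate is preferred in the notebook itself, passing `ddof=1` to `np.std` should be enough.
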
In [9]:
if __name__ == "__main__":
    print("\nLoading training data...")
    train_df = pd.read_csv(cfg.train_csv)

    soundscape_df = None
    if cfg.use_external_pseudolabels:
        print(f"Will use external pseudolabels from: {cfg.external_pseudolabels}")
        soundscape_df = pd.read_csv(cfg.external_pseudolabels)

    print("\nStarting training...")
    print(f"LOAD_DATA is set to {cfg.LOAD_DATA}")

    # Run training and get best scores
    best_scores = run_training(train_df, cfg, soundscape_df=soundscape_df)
    
    print("\nTraining complete!")
    cfg.save_config()
    
    # Additional analysis
    print("\nFinal Results Summary:")
    print(f"Best scores per fold: {[f'{score:.4f}' for score in best_scores]}")
    print(f"Cross-validation mean: {np.mean(best_scores):.4f} ± {np.std(best_scores):.4f}")


Loading training data...
Will use external pseudolabels from: pseudolabels_even_better.csv

Starting training...
LOAD_DATA is set to True
Loaded 28579 pre-computed mel spectrograms for labeled data
Loading external pseudolabels (soft == False)
Found 116712 samples with confidence > 0.04
Stratified sampling of pseudolabels
Found 206 unique bird species in pseudolabels
Target ~121 samples per species to stay under 25000 total
Sampled 20505 pseudolabeled samples


Loading soundscape spectrograms:   0%|          | 0/20505 [00:00<?, ?it/s]


Training set: 39267 samples
Validation set: 9817 samples

-------------------- Training Model --------------------
Model compiled successfully with backend 'inductor', mode 'default'

Epoch 1/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

Initial best epoch for fold 0: epoch 1, AUC: 0.7199

--- Epoch 1 Summary ---
Train Loss: 0.0350, Train AUC: 0.5279
Val Loss: 0.0254, Val AUC: 0.7199
LR: 0.000500, Epoch Time: 3.62m
Valid Classes: 206
New best AUC: 0.7199 at epoch 1

Epoch 2/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 0: epoch 2, AUC: 0.8363

--- Epoch 2 Summary ---
Train Loss: 0.0254, Train AUC: 0.6380
Val Loss: 0.0212, Val AUC: 0.8363
LR: 0.000500, Epoch Time: 3.30m
Valid Classes: 206
New best AUC: 0.8363 at epoch 2

Epoch 3/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 0: epoch 3, AUC: 0.8725

--- Epoch 3 Summary ---
Train Loss: 0.0234, Train AUC: 0.7164
Val Loss: 0.0190, Val AUC: 0.8725
LR: 0.000500, Epoch Time: 3.30m
Valid Classes: 206
New best AUC: 0.8725 at epoch 3

Epoch 4/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 0: epoch 4, AUC: 0.8874

--- Epoch 4 Summary ---
Train Loss: 0.0224, Train AUC: 0.7296
Val Loss: 0.0180, Val AUC: 0.8874
LR: 0.000500, Epoch Time: 3.30m
Valid Classes: 206
New best AUC: 0.8874 at epoch 4

Epoch 5/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 0: epoch 5, AUC: 0.8988

--- Epoch 5 Summary ---
Train Loss: 0.0215, Train AUC: 0.7898
Val Loss: 0.0168, Val AUC: 0.8988
LR: 0.000500, Epoch Time: 3.30m
Valid Classes: 206
New best AUC: 0.8988 at epoch 5

Epoch 6/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 0: epoch 6, AUC: 0.9073

--- Epoch 6 Summary ---
Train Loss: 0.0210, Train AUC: 0.8043
Val Loss: 0.0160, Val AUC: 0.9073
LR: 0.000497, Epoch Time: 3.30m
Valid Classes: 206
New best AUC: 0.9073 at epoch 6

Epoch 7/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 0: epoch 7, AUC: 0.9129

--- Epoch 7 Summary ---
Train Loss: 0.0205, Train AUC: 0.7662
Val Loss: 0.0156, Val AUC: 0.9129
LR: 0.000488, Epoch Time: 3.32m
Valid Classes: 206
New best AUC: 0.9129 at epoch 7

Epoch 8/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]


--- Epoch 8 Summary ---
Train Loss: 0.0203, Train AUC: 0.7203
Val Loss: 0.0154, Val AUC: 0.9119
LR: 0.000473, Epoch Time: 3.30m
Valid Classes: 206

Epoch 9/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 0: epoch 9, AUC: 0.9202

--- Epoch 9 Summary ---
Train Loss: 0.0200, Train AUC: 0.7489
Val Loss: 0.0149, Val AUC: 0.9202
LR: 0.000452, Epoch Time: 3.45m
Valid Classes: 206
New best AUC: 0.9202 at epoch 9

Epoch 10/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]


--- Epoch 10 Summary ---
Train Loss: 0.0196, Train AUC: 0.7796
Val Loss: 0.0147, Val AUC: 0.9201
LR: 0.000427, Epoch Time: 3.40m
Valid Classes: 206

Epoch 11/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 0: epoch 11, AUC: 0.9227

--- Epoch 11 Summary ---
Train Loss: 0.0193, Train AUC: 0.7489
Val Loss: 0.0145, Val AUC: 0.9227
LR: 0.000397, Epoch Time: 3.34m
Valid Classes: 206
New best AUC: 0.9227 at epoch 11

Epoch 12/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 0: epoch 12, AUC: 0.9270

--- Epoch 12 Summary ---
Train Loss: 0.0193, Train AUC: 0.8047
Val Loss: 0.0144, Val AUC: 0.9270
LR: 0.000364, Epoch Time: 3.37m
Valid Classes: 206
New best AUC: 0.9270 at epoch 12

Epoch 13/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]


--- Epoch 13 Summary ---
Train Loss: 0.0189, Train AUC: 0.8369
Val Loss: 0.0145, Val AUC: 0.9247
LR: 0.000328, Epoch Time: 3.34m
Valid Classes: 206

Epoch 14/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]


--- Epoch 14 Summary ---
Train Loss: 0.0187, Train AUC: 0.8131
Val Loss: 0.0141, Val AUC: 0.9268
LR: 0.000290, Epoch Time: 3.34m
Valid Classes: 206

Epoch 15/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 0: epoch 15, AUC: 0.9290

--- Epoch 15 Summary ---
Train Loss: 0.0185, Train AUC: 0.7705
Val Loss: 0.0156, Val AUC: 0.9290
LR: 0.000251, Epoch Time: 3.34m
Valid Classes: 206
New best AUC: 0.9290 at epoch 15

Epoch 16/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]


--- Epoch 16 Summary ---
Train Loss: 0.0184, Train AUC: 0.6628
Val Loss: 0.0141, Val AUC: 0.9273
LR: 0.000211, Epoch Time: 3.34m
Valid Classes: 206

Epoch 17/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]


--- Epoch 17 Summary ---
Train Loss: 0.0181, Train AUC: 0.7987
Val Loss: nan, Val AUC: 0.9284
LR: 0.000173, Epoch Time: 3.35m
Valid Classes: 206

Epoch 18/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]


--- Epoch 18 Summary ---
Train Loss: 0.0179, Train AUC: 0.7270
Val Loss: nan, Val AUC: 0.9267
LR: 0.000137, Epoch Time: 3.30m
Valid Classes: 206

Epoch 19/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]


--- Epoch 19 Summary ---
Train Loss: 0.0175, Train AUC: 0.8221
Val Loss: nan, Val AUC: 0.9258
LR: 0.000104, Epoch Time: 3.29m
Valid Classes: 206

Epoch 20/20 | Fold 0


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]


--- Epoch 20 Summary ---
Train Loss: 0.0175, Train AUC: 0.7991
Val Loss: nan, Val AUC: 0.9259
LR: 0.000074, Epoch Time: 3.29m
Valid Classes: 206

*** FOLD 0 COMPLETE ***
Best AUC: 0.9290 at epoch 15
Fold training time: 72.4 minutes

Training set: 39267 samples
Validation set: 9817 samples

-------------------- Training Model --------------------
Model compiled successfully with backend 'inductor', mode 'default'

Epoch 1/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

Initial best epoch for fold 1: epoch 1, AUC: 0.5360

--- Epoch 1 Summary ---
Train Loss: 0.0355, Train AUC: 0.5054
Val Loss: 0.0285, Val AUC: 0.5360
LR: 0.000500, Epoch Time: 3.30m
Valid Classes: 206
New best AUC: 0.5360 at epoch 1

Epoch 2/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 1: epoch 2, AUC: 0.6468

--- Epoch 2 Summary ---
Train Loss: 0.0280, Train AUC: 0.5487
Val Loss: 0.0267, Val AUC: 0.6468
LR: 0.000500, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.6468 at epoch 2

Epoch 3/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 1: epoch 3, AUC: 0.7549

--- Epoch 3 Summary ---
Train Loss: 0.0271, Train AUC: 0.6068
Val Loss: 0.0247, Val AUC: 0.7549
LR: 0.000500, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7549 at epoch 3

Epoch 4/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 1: epoch 4, AUC: 0.7864

--- Epoch 4 Summary ---
Train Loss: 0.0259, Train AUC: 0.6603
Val Loss: 0.0242, Val AUC: 0.7864
LR: 0.000500, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7864 at epoch 4

Epoch 5/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 1: epoch 5, AUC: 0.8416

--- Epoch 5 Summary ---
Train Loss: 0.0249, Train AUC: 0.6320
Val Loss: 0.0216, Val AUC: 0.8416
LR: 0.000500, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8416 at epoch 5

Epoch 6/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 1: epoch 6, AUC: 0.8555

--- Epoch 6 Summary ---
Train Loss: 0.0242, Train AUC: 0.7377
Val Loss: 0.0206, Val AUC: 0.8555
LR: 0.000497, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8555 at epoch 6

Epoch 7/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 1: epoch 7, AUC: 0.8720

--- Epoch 7 Summary ---
Train Loss: 0.0237, Train AUC: 0.7369
Val Loss: 0.0199, Val AUC: 0.8720
LR: 0.000488, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8720 at epoch 7

Epoch 8/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]


--- Epoch 8 Summary ---
Train Loss: 0.0232, Train AUC: 0.7129
Val Loss: 0.0197, Val AUC: 0.8665
LR: 0.000473, Epoch Time: 3.29m
Valid Classes: 206

Epoch 9/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 1: epoch 9, AUC: 0.8758

--- Epoch 9 Summary ---
Train Loss: 0.0229, Train AUC: 0.7039
Val Loss: 0.0191, Val AUC: 0.8758
LR: 0.000452, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8758 at epoch 9

Epoch 10/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 1: epoch 10, AUC: 0.8872

--- Epoch 10 Summary ---
Train Loss: 0.0227, Train AUC: 0.6322
Val Loss: 0.0187, Val AUC: 0.8872
LR: 0.000427, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8872 at epoch 10

Epoch 11/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]


--- Epoch 11 Summary ---
Train Loss: 0.0222, Train AUC: 0.7190
Val Loss: 0.0184, Val AUC: 0.8869
LR: 0.000397, Epoch Time: 3.29m
Valid Classes: 206

Epoch 12/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 1: epoch 12, AUC: 0.8873

--- Epoch 12 Summary ---
Train Loss: 0.0221, Train AUC: 0.7454
Val Loss: 0.0185, Val AUC: 0.8873
LR: 0.000364, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8873 at epoch 12

Epoch 13/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 1: epoch 13, AUC: 0.8975

--- Epoch 13 Summary ---
Train Loss: 0.0217, Train AUC: 0.8016
Val Loss: 0.0178, Val AUC: 0.8975
LR: 0.000328, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8975 at epoch 13

Epoch 14/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 1: epoch 14, AUC: 0.8981

--- Epoch 14 Summary ---
Train Loss: 0.0217, Train AUC: 0.7530
Val Loss: 0.0175, Val AUC: 0.8981
LR: 0.000290, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8981 at epoch 14

Epoch 15/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]


--- Epoch 15 Summary ---
Train Loss: 0.0215, Train AUC: 0.7740
Val Loss: 0.0175, Val AUC: 0.8961
LR: 0.000251, Epoch Time: 3.29m
Valid Classes: 206

Epoch 16/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 1: epoch 16, AUC: 0.9013

--- Epoch 16 Summary ---
Train Loss: 0.0212, Train AUC: 0.7089
Val Loss: 0.0172, Val AUC: 0.9013
LR: 0.000211, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.9013 at epoch 16

Epoch 17/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 1: epoch 17, AUC: 0.9032

--- Epoch 17 Summary ---
Train Loss: 0.0210, Train AUC: 0.7604
Val Loss: 0.0169, Val AUC: 0.9032
LR: 0.000173, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.9032 at epoch 17

Epoch 18/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 1: epoch 18, AUC: 0.9039

--- Epoch 18 Summary ---
Train Loss: 0.0208, Train AUC: 0.7512
Val Loss: 0.0168, Val AUC: 0.9039
LR: 0.000137, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.9039 at epoch 18

Epoch 19/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 1: epoch 19, AUC: 0.9043

--- Epoch 19 Summary ---
Train Loss: 0.0206, Train AUC: 0.7742
Val Loss: 0.0167, Val AUC: 0.9043
LR: 0.000104, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.9043 at epoch 19

Epoch 20/20 | Fold 1


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 1: epoch 20, AUC: 0.9047

--- Epoch 20 Summary ---
Train Loss: 0.0205, Train AUC: 0.7856
Val Loss: 0.0167, Val AUC: 0.9047
LR: 0.000074, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.9047 at epoch 20

*** FOLD 1 COMPLETE ***
Best AUC: 0.9047 at epoch 20
Fold training time: 71.0 minutes

Training set: 39267 samples
Validation set: 9817 samples

-------------------- Training Model --------------------
Model compiled successfully with backend 'inductor', mode 'default'

Epoch 1/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

Initial best epoch for fold 2: epoch 1, AUC: 0.5247

--- Epoch 1 Summary ---
Train Loss: 0.0355, Train AUC: 0.4963
Val Loss: 0.0285, Val AUC: 0.5247
LR: 0.000500, Epoch Time: 3.30m
Valid Classes: 206
New best AUC: 0.5247 at epoch 1

Epoch 2/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 2: epoch 2, AUC: 0.6032

--- Epoch 2 Summary ---
Train Loss: 0.0284, Train AUC: 0.5151
Val Loss: 0.0277, Val AUC: 0.6032
LR: 0.000500, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.6032 at epoch 2

Epoch 3/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 2: epoch 3, AUC: 0.6958

--- Epoch 3 Summary ---
Train Loss: 0.0276, Train AUC: 0.5855
Val Loss: 0.0261, Val AUC: 0.6958
LR: 0.000500, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.6958 at epoch 3

Epoch 4/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 2: epoch 4, AUC: 0.7685

--- Epoch 4 Summary ---
Train Loss: 0.0267, Train AUC: 0.5948
Val Loss: 0.0247, Val AUC: 0.7685
LR: 0.000500, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7685 at epoch 4

Epoch 5/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 2: epoch 5, AUC: 0.7886

--- Epoch 5 Summary ---
Train Loss: 0.0259, Train AUC: 0.6635
Val Loss: 0.0238, Val AUC: 0.7886
LR: 0.000500, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7886 at epoch 5

Epoch 6/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 2: epoch 6, AUC: 0.7936

--- Epoch 6 Summary ---
Train Loss: 0.0253, Train AUC: 0.6409
Val Loss: 0.0235, Val AUC: 0.7936
LR: 0.000497, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7936 at epoch 6

Epoch 7/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 2: epoch 7, AUC: 0.8236

--- Epoch 7 Summary ---
Train Loss: 0.0249, Train AUC: 0.6736
Val Loss: 0.0223, Val AUC: 0.8236
LR: 0.000488, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8236 at epoch 7

Epoch 8/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 2: epoch 8, AUC: 0.8266

--- Epoch 8 Summary ---
Train Loss: 0.0245, Train AUC: 0.6918
Val Loss: 0.0223, Val AUC: 0.8266
LR: 0.000473, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8266 at epoch 8

Epoch 9/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 2: epoch 9, AUC: 0.8410

--- Epoch 9 Summary ---
Train Loss: 0.0243, Train AUC: 0.7307
Val Loss: 0.0214, Val AUC: 0.8410
LR: 0.000452, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8410 at epoch 9

Epoch 10/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 2: epoch 10, AUC: 0.8519

--- Epoch 10 Summary ---
Train Loss: 0.0239, Train AUC: 0.6865
Val Loss: 0.0211, Val AUC: 0.8519
LR: 0.000427, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8519 at epoch 10

Epoch 11/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]


--- Epoch 11 Summary ---
Train Loss: 0.0237, Train AUC: 0.6620
Val Loss: 0.0208, Val AUC: 0.8516
LR: 0.000397, Epoch Time: 3.29m
Valid Classes: 206

Epoch 12/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 2: epoch 12, AUC: 0.8590

--- Epoch 12 Summary ---
Train Loss: 0.0235, Train AUC: 0.7235
Val Loss: 0.0205, Val AUC: 0.8590
LR: 0.000364, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8590 at epoch 12

Epoch 13/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 2: epoch 13, AUC: 0.8619

--- Epoch 13 Summary ---
Train Loss: 0.0233, Train AUC: 0.7324
Val Loss: 0.0204, Val AUC: 0.8619
LR: 0.000328, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8619 at epoch 13

Epoch 14/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 2: epoch 14, AUC: 0.8659

--- Epoch 14 Summary ---
Train Loss: 0.0231, Train AUC: 0.7150
Val Loss: 0.0201, Val AUC: 0.8659
LR: 0.000290, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8659 at epoch 14

Epoch 15/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 2: epoch 15, AUC: 0.8678

--- Epoch 15 Summary ---
Train Loss: 0.0229, Train AUC: 0.6996
Val Loss: 0.0200, Val AUC: 0.8678
LR: 0.000251, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8678 at epoch 15

Epoch 16/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 2: epoch 16, AUC: 0.8692

--- Epoch 16 Summary ---
Train Loss: 0.0227, Train AUC: 0.7256
Val Loss: 0.0199, Val AUC: 0.8692
LR: 0.000211, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8692 at epoch 16

Epoch 17/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 2: epoch 17, AUC: 0.8746

--- Epoch 17 Summary ---
Train Loss: 0.0223, Train AUC: 0.7154
Val Loss: 0.0195, Val AUC: 0.8746
LR: 0.000173, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8746 at epoch 17

Epoch 18/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 2: epoch 18, AUC: 0.8773

--- Epoch 18 Summary ---
Train Loss: 0.0223, Train AUC: 0.6616
Val Loss: 0.0194, Val AUC: 0.8773
LR: 0.000137, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8773 at epoch 18

Epoch 19/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]


--- Epoch 19 Summary ---
Train Loss: 0.0221, Train AUC: 0.7388
Val Loss: 0.0194, Val AUC: 0.8761
LR: 0.000104, Epoch Time: 3.29m
Valid Classes: 206

Epoch 20/20 | Fold 2


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]


--- Epoch 20 Summary ---
Train Loss: 0.0220, Train AUC: 0.7486
Val Loss: 0.0193, Val AUC: 0.8768
LR: 0.000074, Epoch Time: 3.29m
Valid Classes: 206

*** FOLD 2 COMPLETE ***
Best AUC: 0.8773 at epoch 18
Fold training time: 71.0 minutes

Training set: 39267 samples
Validation set: 9817 samples

-------------------- Training Model --------------------
Model compiled successfully with backend 'inductor', mode 'default'

Epoch 1/20 | Fold 3


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

Initial best epoch for fold 3: epoch 1, AUC: 0.5255

--- Epoch 1 Summary ---
Train Loss: 0.0356, Train AUC: 0.5209
Val Loss: 0.0286, Val AUC: 0.5255
LR: 0.000500, Epoch Time: 3.30m
Valid Classes: 206
New best AUC: 0.5255 at epoch 1

Epoch 2/20 | Fold 3


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 3: epoch 2, AUC: 0.5859

--- Epoch 2 Summary ---
Train Loss: 0.0284, Train AUC: 0.5130
Val Loss: 0.0280, Val AUC: 0.5859
LR: 0.000500, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.5859 at epoch 2

Epoch 3/20 | Fold 3


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 3: epoch 3, AUC: 0.6375

--- Epoch 3 Summary ---
Train Loss: 0.0279, Train AUC: 0.5803
Val Loss: 0.0271, Val AUC: 0.6375
LR: 0.000500, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.6375 at epoch 3

Epoch 4/20 | Fold 3


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 3: epoch 4, AUC: 0.6872

--- Epoch 4 Summary ---
Train Loss: 0.0275, Train AUC: 0.5796
Val Loss: 0.0263, Val AUC: 0.6872
LR: 0.000500, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.6872 at epoch 4

Epoch 5/20 | Fold 3


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 3: epoch 5, AUC: 0.7191

--- Epoch 5 Summary ---
Train Loss: 0.0268, Train AUC: 0.6435
Val Loss: 0.0257, Val AUC: 0.7191
LR: 0.000500, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7191 at epoch 5

Epoch 6/20 | Fold 3


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 3: epoch 6, AUC: 0.7558

--- Epoch 6 Summary ---
Train Loss: 0.0264, Train AUC: 0.6520
Val Loss: nan, Val AUC: 0.7558
LR: 0.000497, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7558 at epoch 6

Epoch 7/20 | Fold 3


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 3: epoch 7, AUC: 0.7717

--- Epoch 7 Summary ---
Train Loss: 0.0261, Train AUC: 0.5992
Val Loss: 0.0245, Val AUC: 0.7717
LR: 0.000488, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7717 at epoch 7

Epoch 8/20 | Fold 3


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 3: epoch 8, AUC: 0.7839

--- Epoch 8 Summary ---
Train Loss: 0.0258, Train AUC: 0.6046
Val Loss: 0.0240, Val AUC: 0.7839
LR: 0.000473, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7839 at epoch 8

Epoch 9/20 | Fold 3


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 3: epoch 9, AUC: 0.7921

--- Epoch 9 Summary ---
Train Loss: 0.0256, Train AUC: 0.6647
Val Loss: 0.0238, Val AUC: 0.7921
LR: 0.000452, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7921 at epoch 9

Epoch 10/20 | Fold 3


Training:   0%|          | 0/1227 [00:00<?, ?it/s]

Validation:   0%|          | 0/154 [00:00<?, ?it/s]

New best epoch for fold 3: epoch 10, AUC: 0.7930

--- Epoch 10 Summary ---
Train Loss: 0.0254, Train AUC: 0.6512
Val Loss: nan, Val AUC: 0.7930
LR: 0.000427, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7930 at epoch 10

Epoch 11/20 | Fold 3


New best epoch for fold 3: epoch 11, AUC: 0.8070

--- Epoch 11 Summary ---
Train Loss: 0.0252, Train AUC: 0.6224
Val Loss: 0.0232, Val AUC: 0.8070
LR: 0.000397, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8070 at epoch 11

Epoch 12/20 | Fold 3



--- Epoch 12 Summary ---
Train Loss: 0.0250, Train AUC: 0.6836
Val Loss: 0.0231, Val AUC: 0.8063
LR: 0.000364, Epoch Time: 3.29m
Valid Classes: 206

Epoch 13/20 | Fold 3


New best epoch for fold 3: epoch 13, AUC: 0.8144

--- Epoch 13 Summary ---
Train Loss: 0.0248, Train AUC: 0.6557
Val Loss: 0.0230, Val AUC: 0.8144
LR: 0.000328, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8144 at epoch 13

Epoch 14/20 | Fold 3


New best epoch for fold 3: epoch 14, AUC: 0.8211

--- Epoch 14 Summary ---
Train Loss: 0.0247, Train AUC: 0.6591
Val Loss: 0.0226, Val AUC: 0.8211
LR: 0.000290, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8211 at epoch 14

Epoch 15/20 | Fold 3


New best epoch for fold 3: epoch 15, AUC: 0.8258

--- Epoch 15 Summary ---
Train Loss: 0.0246, Train AUC: 0.6808
Val Loss: 0.0224, Val AUC: 0.8258
LR: 0.000251, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8258 at epoch 15

Epoch 16/20 | Fold 3



--- Epoch 16 Summary ---
Train Loss: 0.0244, Train AUC: 0.6767
Val Loss: 0.0223, Val AUC: 0.8248
LR: 0.000211, Epoch Time: 3.30m
Valid Classes: 206

Epoch 17/20 | Fold 3


New best epoch for fold 3: epoch 17, AUC: 0.8270

--- Epoch 17 Summary ---
Train Loss: 0.0242, Train AUC: 0.7192
Val Loss: 0.0222, Val AUC: 0.8270
LR: 0.000173, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8270 at epoch 17

Epoch 18/20 | Fold 3


New best epoch for fold 3: epoch 18, AUC: 0.8301

--- Epoch 18 Summary ---
Train Loss: 0.0240, Train AUC: 0.7235
Val Loss: 0.0219, Val AUC: 0.8301
LR: 0.000137, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8301 at epoch 18

Epoch 19/20 | Fold 3


New best epoch for fold 3: epoch 19, AUC: 0.8330

--- Epoch 19 Summary ---
Train Loss: 0.0239, Train AUC: 0.7048
Val Loss: 0.0219, Val AUC: 0.8330
LR: 0.000104, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8330 at epoch 19

Epoch 20/20 | Fold 3


New best epoch for fold 3: epoch 20, AUC: 0.8334

--- Epoch 20 Summary ---
Train Loss: 0.0238, Train AUC: 0.6778
Val Loss: 0.0218, Val AUC: 0.8334
LR: 0.000074, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.8334 at epoch 20

*** FOLD 3 COMPLETE ***
Best AUC: 0.8334 at epoch 20
Fold training time: 71.0 minutes

Training set: 39268 samples
Validation set: 9816 samples

-------------------- Training Model --------------------
Model compiled successfully with backend 'inductor', mode 'default'

Epoch 1/20 | Fold 4


Initial best epoch for fold 4: epoch 1, AUC: 0.5203

--- Epoch 1 Summary ---
Train Loss: 0.0353, Train AUC: 0.4886
Val Loss: 0.0285, Val AUC: 0.5203
LR: 0.000500, Epoch Time: 3.30m
Valid Classes: 206
New best AUC: 0.5203 at epoch 1

Epoch 2/20 | Fold 4


New best epoch for fold 4: epoch 2, AUC: 0.5286

--- Epoch 2 Summary ---
Train Loss: 0.0286, Train AUC: 0.5104
Val Loss: 0.0285, Val AUC: 0.5286
LR: 0.000500, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.5286 at epoch 2

Epoch 3/20 | Fold 4


New best epoch for fold 4: epoch 3, AUC: 0.6079

--- Epoch 3 Summary ---
Train Loss: 0.0282, Train AUC: 0.5193
Val Loss: 0.0274, Val AUC: 0.6079
LR: 0.000500, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.6079 at epoch 3

Epoch 4/20 | Fold 4


New best epoch for fold 4: epoch 4, AUC: 0.6474

--- Epoch 4 Summary ---
Train Loss: 0.0279, Train AUC: 0.5743
Val Loss: 0.0271, Val AUC: 0.6474
LR: 0.000500, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.6474 at epoch 4

Epoch 5/20 | Fold 4


New best epoch for fold 4: epoch 5, AUC: 0.6714

--- Epoch 5 Summary ---
Train Loss: 0.0276, Train AUC: 0.5559
Val Loss: 0.0268, Val AUC: 0.6714
LR: 0.000500, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.6714 at epoch 5

Epoch 6/20 | Fold 4


New best epoch for fold 4: epoch 6, AUC: 0.7085

--- Epoch 6 Summary ---
Train Loss: 0.0272, Train AUC: 0.6003
Val Loss: 0.0260, Val AUC: 0.7085
LR: 0.000497, Epoch Time: 3.31m
Valid Classes: 206
New best AUC: 0.7085 at epoch 6

Epoch 7/20 | Fold 4


New best epoch for fold 4: epoch 7, AUC: 0.7289

--- Epoch 7 Summary ---
Train Loss: 0.0269, Train AUC: 0.6270
Val Loss: 0.0257, Val AUC: 0.7289
LR: 0.000488, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7289 at epoch 7

Epoch 8/20 | Fold 4


New best epoch for fold 4: epoch 8, AUC: 0.7424

--- Epoch 8 Summary ---
Train Loss: 0.0266, Train AUC: 0.6069
Val Loss: 0.0255, Val AUC: 0.7424
LR: 0.000473, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7424 at epoch 8

Epoch 9/20 | Fold 4


New best epoch for fold 4: epoch 9, AUC: 0.7518

--- Epoch 9 Summary ---
Train Loss: 0.0264, Train AUC: 0.6612
Val Loss: 0.0253, Val AUC: 0.7518
LR: 0.000452, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7518 at epoch 9

Epoch 10/20 | Fold 4



--- Epoch 10 Summary ---
Train Loss: 0.0262, Train AUC: 0.6723
Val Loss: 0.0251, Val AUC: 0.7501
LR: 0.000427, Epoch Time: 3.29m
Valid Classes: 206

Epoch 11/20 | Fold 4


New best epoch for fold 4: epoch 11, AUC: 0.7594

--- Epoch 11 Summary ---
Train Loss: 0.0261, Train AUC: 0.6689
Val Loss: 0.0249, Val AUC: 0.7594
LR: 0.000397, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7594 at epoch 11

Epoch 12/20 | Fold 4


New best epoch for fold 4: epoch 12, AUC: 0.7682

--- Epoch 12 Summary ---
Train Loss: 0.0260, Train AUC: 0.6412
Val Loss: 0.0247, Val AUC: 0.7682
LR: 0.000364, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7682 at epoch 12

Epoch 13/20 | Fold 4


New best epoch for fold 4: epoch 13, AUC: 0.7767

--- Epoch 13 Summary ---
Train Loss: 0.0258, Train AUC: 0.6765
Val Loss: 0.0244, Val AUC: 0.7767
LR: 0.000328, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7767 at epoch 13

Epoch 14/20 | Fold 4



--- Epoch 14 Summary ---
Train Loss: 0.0257, Train AUC: 0.6326
Val Loss: 0.0244, Val AUC: 0.7750
LR: 0.000290, Epoch Time: 3.29m
Valid Classes: 206

Epoch 15/20 | Fold 4


New best epoch for fold 4: epoch 15, AUC: 0.7826

--- Epoch 15 Summary ---
Train Loss: 0.0256, Train AUC: 0.6619
Val Loss: 0.0243, Val AUC: 0.7826
LR: 0.000251, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7826 at epoch 15

Epoch 16/20 | Fold 4


New best epoch for fold 4: epoch 16, AUC: 0.7833

--- Epoch 16 Summary ---
Train Loss: 0.0255, Train AUC: 0.6569
Val Loss: 0.0240, Val AUC: 0.7833
LR: 0.000211, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7833 at epoch 16

Epoch 17/20 | Fold 4


New best epoch for fold 4: epoch 17, AUC: 0.7876

--- Epoch 17 Summary ---
Train Loss: 0.0254, Train AUC: 0.6731
Val Loss: 0.0239, Val AUC: 0.7876
LR: 0.000173, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7876 at epoch 17

Epoch 18/20 | Fold 4


New best epoch for fold 4: epoch 18, AUC: 0.7899

--- Epoch 18 Summary ---
Train Loss: 0.0252, Train AUC: 0.6809
Val Loss: 0.0239, Val AUC: 0.7899
LR: 0.000137, Epoch Time: 3.29m
Valid Classes: 206
New best AUC: 0.7899 at epoch 18

Epoch 19/20 | Fold 4



--- Epoch 19 Summary ---
Train Loss: 0.0252, Train AUC: 0.6080
Val Loss: 0.0238, Val AUC: 0.7890
LR: 0.000104, Epoch Time: 3.30m
Valid Classes: 206

Epoch 20/20 | Fold 4



--- Epoch 20 Summary ---
Train Loss: 0.0251, Train AUC: 0.6877
Val Loss: 0.0237, Val AUC: 0.7899
LR: 0.000074, Epoch Time: 3.30m
Valid Classes: 206

*** FOLD 4 COMPLETE ***
Best AUC: 0.7899 at epoch 18
Fold training time: 71.0 minutes

CROSS-VALIDATION RESULTS:
Fold 0: 0.9290 (Best epoch: 15)
Fold 1: 0.9047 (Best epoch: 20)
Fold 2: 0.8773 (Best epoch: 18)
Fold 3: 0.8334 (Best epoch: 20)
Fold 4: 0.7899 (Best epoch: 18)

Mean AUC: 0.8669 ± 0.0499
Min AUC: 0.7899
Max AUC: 0.9290
Plotting ROC curves for folds: [0, 1, 2, 3, 4]
Metrics saved to output/metrics//metrics_20250527_020112.csv

Training complete!
Config saved to output/configs/config_20250527_020112_efficientnet_b1.json

Final Results Summary:
Best scores per fold: ['0.9290', '0.9047', '0.8773', '0.8334', '0.7899']
Cross-validation mean: 0.8669 ± 0.0499
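
As a sanity check, the cross-validation summary above can be reproduced from the per-fold best scores. This is a minimal sketch, not the notebook's own reporting code: `fold_scores` is a name introduced here for illustration, and it assumes the reported ± value is the population standard deviation (divide by N, as NumPy's `std` does by default).

```python
import statistics

# Best validation AUC per fold, copied from the log above.
fold_scores = [0.9290, 0.9047, 0.8773, 0.8334, 0.7899]

mean_auc = statistics.fmean(fold_scores)
# Population standard deviation (divide by N). This is an assumption about
# how the summary was computed; it is the variant that matches the log.
std_auc = statistics.pstdev(fold_scores)

print(f"Mean AUC: {mean_auc:.4f} ± {std_auc:.4f}")
print(f"Min AUC: {min(fold_scores):.4f}")
print(f"Max AUC: {max(fold_scores):.4f}")
```

With only five folds, the sample standard deviation (`statistics.stdev`, which divides by N - 1) would be noticeably larger, roughly 0.056, so it is worth stating which variant is being reported when comparing runs.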


In [10]:
timm.list_models(pretrained=True)

['aimv2_1b_patch14_224.apple_pt',
 'aimv2_1b_patch14_336.apple_pt',
 'aimv2_1b_patch14_448.apple_pt',
 'aimv2_3b_patch14_224.apple_pt',
 'aimv2_3b_patch14_336.apple_pt',
 'aimv2_3b_patch14_448.apple_pt',
 'aimv2_huge_patch14_224.apple_pt',
 'aimv2_huge_patch14_336.apple_pt',
 'aimv2_huge_patch14_448.apple_pt',
 'aimv2_large_patch14_224.apple_pt',
 'aimv2_large_patch14_224.apple_pt_dist',
 'aimv2_large_patch14_336.apple_pt',
 'aimv2_large_patch14_336.apple_pt_dist',
 'aimv2_large_patch14_448.apple_pt',
 'bat_resnext26ts.ch_in1k',
 'beit_base_patch16_224.in22k_ft_in22k',
 'beit_base_patch16_224.in22k_ft_in22k_in1k',
 'beit_base_patch16_384.in22k_ft_in22k_in1k',
 'beit_large_patch16_224.in22k_ft_in22k',
 'beit_large_patch16_224.in22k_ft_in22k_in1k',
 'beit_large_patch16_384.in22k_ft_in22k_in1k',
 'beit_large_patch16_512.in22k_ft_in22k_in1k',
 'beitv2_base_patch16_224.in1k_ft_in1k',
 'beitv2_base_patch16_224.in1k_ft_in22k',
 'beitv2_base_patch16_224.in1k_ft_in22k_in1k',
 'beitv2_large_patc