# HerdNet Training Pipeline

Complete HerdNet training pipeline following the methodology of Delplanque et al. (2023):

1. **Phase 0**: Generate training and validation patches
2. **Phase 1**: Initial training (Stage 1) on patches
3. **Phase 2**: Generate Hard Negative Patches (HNPs)
4. **Phase 3**: Training with HNPs (Stage 2)
5. **Phase 4**: Final evaluation on full images


## Imports


In [1]:
from pathlib import Path
from shutil import copy2
import json

import pandas as pd
from dataclasses import asdict

from utils.herdnet import (
    TrainConfig,
    train_stage1,
    train_stage2,
    HNPConfig,
    generate_hard_negative_patches,
    EvalConfig,
    evaluate_full_images,
    evaluate_points_from_csv,
    patch_images
)

from utils.rf_detr import generate_patch_dataset, PatchSummary


## Global Configuration


In [15]:
# Path configuration
DATA_ROOT = Path("../../data")
OUTPUT_ROOT = Path("outputs/herdnet")

# Patch configuration
PATCH_SIZE = 512
PATCH_OVERLAP = 0
MIN_VISIBILITY = 0.1

# Training configuration
BATCH_SIZE = 4
NUM_WORKERS = 4
EPOCHS_STAGE1 = 2
EPOCHS_STAGE2 = 2
LR_STAGE1 = 1e-4
LR_STAGE2 = 1e-6

# Evaluation configuration
MATCH_RADIUS = 5.0
STITCH_OVERLAP = 160

# WandB (optional)
WANDB_PROJECT = None  # "herdnet-training"
WANDB_ENTITY = None
WANDB_MODE = "disabled"  # set to "online" to enable logging


### Download data

In [None]:
# Download some of the data of Delplanque et al. (2021) as an example
!gdown 1CcTAZZJdwrBfCPJtVH6VBU3luGKIN9st -O ../../data.zip
!unzip -oq ../../data.zip -d ../../

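# Phase 0: Patch Generation

We split the high-resolution images (24 MP) into 512×512-pixel patches to make training tractable. The cell below patches the validation split; Stage 1 reads its training patches from `train_patches`, which is assumed to exist already.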


In [3]:
patch_images(
    root=DATA_ROOT / "val",
    height=PATCH_SIZE,
    width=PATCH_SIZE,
    overlap=PATCH_OVERLAP,
    dest=DATA_ROOT / "val_patches",
    csv_path=str(DATA_ROOT / "val.csv"),
    min_visibility=MIN_VISIBILITY,
    save_all=False,
)

Creating the buffer: 100%|██████████| 111/111 [00:34<00:00,  3.21it/s]
Exporting patches: 100%|██████████| 111/111 [00:33<00:00,  3.34it/s]
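
As a rough sanity check on the patching step, the sketch below counts how many sliding windows a given patch size and overlap produce per image. It assumes a 6000×4000 frame (the notebook only says "24 MP") and a plain sliding-window grid; `patch_grid` is a hypothetical helper, and how `patch_images` actually handles image borders or empty patches is not shown here, so treat the numbers as an upper bound on the grid rather than the exact number of files written.

```python
import math


def patch_grid(img_w: int, img_h: int, patch: int = 512, overlap: int = 0):
    """Return (columns, rows, total) windows for a sliding-window tiling."""
    stride = patch - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the patch size")
    # One window at the origin, plus one per stride step needed to reach the far edge.
    cols = 1 + max(0, math.ceil((img_w - patch) / stride))
    rows = 1 + max(0, math.ceil((img_h - patch) / stride))
    return cols, rows, cols * rows


# Assumed image size: 6000x4000 pixels (24 MP).
print(patch_grid(6000, 4000, patch=512, overlap=0))    # (12, 8, 96)
print(patch_grid(6000, 4000, patch=512, overlap=160))  # (17, 11, 187)
```

With `save_all=False` the number of exported patches is presumably smaller, since patches without annotations would not be written.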


# Phase 1: Stage 1 Training

Train HerdNet on the generated patches. This is the initial training run, without Hard Negative Patches.


In [4]:
OUTPUT_ROOT = Path("outputs/herdnet")

In [5]:
stage1_config = TrainConfig(
    train_root=DATA_ROOT / "train_patches",
    train_csv=DATA_ROOT / "train_patches.csv",
    val_root=DATA_ROOT / "val_patches",
    val_csv=DATA_ROOT / "val_patches" / "gt.csv",
    work_dir=OUTPUT_ROOT / "stage1",
    epochs=EPOCHS_STAGE1,
    batch_size=BATCH_SIZE,
    learning_rate=LR_STAGE1,
    num_workers=NUM_WORKERS,
    patch_size=PATCH_SIZE,
    stitch_overlap=STITCH_OVERLAP,
    wandb_project=WANDB_PROJECT,
    wandb_entity=WANDB_ENTITY,
    wandb_mode=WANDB_MODE,
    wandb_run_name="herdnet_stage1",
    pretrained_backbone="none",
)

In [6]:
print("\n" + "="*70)
print("STARTING STAGE 1 TRAINING")
print("="*70 + "\n")

stage1_result = train_stage1(stage1_config)

print("\n" + "="*70)
print("STAGE 1 COMPLETE")
print("="*70)
print(f" Best checkpoint: {stage1_result.best_checkpoint}")
print(f" Latest checkpoint: {stage1_result.latest_checkpoint}")
print("="*70 + "\n")



STARTING STAGE 1 TRAINING

[INFO] Skipping backbone pretraining
[TRAINING] - Epoch: [1] [   1/1942] eta: 0:30:14 lr: 0.000002 loss: 9738.0117 (9738.0117) focal_loss: 9736.0527 (9736.0527) ce_loss: 1.9593 (1.9593) time: 0.9346 data: 0.1931 max mem: 2099
[TRAINING] - Epoch: [1] [ 101/1942] eta: 0:04:05 lr: 0.000100 loss: 1270.9606 (5019.9500) focal_loss: 1269.3458 (5018.2950) ce_loss: 1.3077 (1.6550) time: 0.1253 data: 0.0002 max mem: 2245
[TRAINING] - Epoch: [1] [ 201/1942] eta: 0:03:45 lr: 0.000100 loss: 137.5054 (2684.2813) focal_loss: 136.2951 (2682.7222) ce_loss: 1.2701 (1.5591) time: 0.1257 data: 0.0002 max mem: 2245
[TRAINING] - Epoch: [1] [ 301/1942] eta: 0:03:30 lr: 0.000100 loss: 52.7969 (1817.9287) focal_loss: 51.1333 (1816.4071) ce_loss: 1.1967 (1.5216) time: 0.1261 data: 0.0002 max mem: 2245
[TRAINING] - Epoch: [1] [ 401/1942] eta: 0:03:16 lr: 0.000100 loss: 28.0518 (1373.8057) focal_loss: 26.7880 (1372.3274) ce_loss: 1.1535 (1.4783) time: 0.1266 data: 0.0002 max mem:

# Phase 2: Hard Negative Patch Generation

Use the Stage 1 model to run inference on the full-size training images and extract patches around its false positives.
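
To make the idea concrete, here is a minimal sketch of how false positives could be identified and turned into crop windows. It is not the implementation inside `generate_hard_negative_patches`: the helper names, the simple "no ground-truth point within a radius" test, and the 6000×4000 image size are all assumptions made for illustration.

```python
import math


def false_positives(detections, ground_truth, radius=5.0):
    """Detections with no ground-truth point within `radius` pixels (assumed rule)."""
    return [
        d for d in detections
        if all(math.dist(d, g) > radius for g in ground_truth)
    ]


def crop_box(center, patch=512, img_w=6000, img_h=4000):
    """Patch-sized window centered on a point, shifted to stay inside the image."""
    x, y = center
    left = int(min(max(x - patch // 2, 0), img_w - patch))
    top = int(min(max(y - patch // 2, 0), img_h - patch))
    return left, top, left + patch, top + patch


# Toy data: the first detection lies next to an annotated animal, the others do not.
dets = [(100, 100), (3000, 2000), (5990, 3990)]
gts = [(102, 101)]

hard_negatives = false_positives(dets, gts, radius=5.0)
print([crop_box(c) for c in hard_negatives])
```

A proximity test like this ignores one-to-one matching, so two detections near the same animal would both count as correct; a stricter matcher would flag the second one as a false positive too.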


## Prepare the full-image CSV

We need a CSV with the annotations for the full-size training images (not the patches). The next cell reads it from `DATA_ROOT / "train.csv"`.
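
Before running inference it can help to confirm that every image named in that CSV actually exists on disk. The sketch below is one way to do so; `missing_images` is a hypothetical helper, and the `images` column name is an assumption about the annotation format that should be adjusted to match the real file.

```python
import csv
from pathlib import Path


def missing_images(csv_path, image_root, image_col="images"):
    """Return image names listed in the CSV that are absent from image_root."""
    with open(csv_path, newline="") as fh:
        names = {row[image_col] for row in csv.DictReader(fh)}
    return sorted(n for n in names if not (Path(image_root) / n).is_file())


# Example call with the notebook's paths (not executed here):
# missing_images(DATA_ROOT / "train.csv", DATA_ROOT / "train")
```

An empty list means every annotated image was found.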


## Generate HNPs


In [7]:
hnp_config = HNPConfig(
    checkpoint=stage1_result.best_checkpoint,
    train_csv=DATA_ROOT / "train.csv",
    train_root=DATA_ROOT / "train",
    output_root=DATA_ROOT / "hnp_patches",
    patch_size=PATCH_SIZE,
    patch_overlap=PATCH_OVERLAP,
    min_score=0.5,
    batch_size=1,
    num_workers=NUM_WORKERS,
)


In [8]:
print("\n" + "="*70)
print("GENERATING HARD NEGATIVE PATCHES")
print("="*70 + "\n")

hnp_result = generate_hard_negative_patches(hnp_config)

print("\n" + "="*70)
print("HNP GENERATION COMPLETE")
print("="*70)
print(f" HNP patches created: {hnp_result.hnp_patches_created}")
print(f" Detections CSV: {hnp_result.detections_csv}")
print(f" Output dir: {hnp_result.output_root}")
print("="*70 + "\n")



GENERATING HARD NEGATIVE PATCHES

[INFO] Running inference with HerdNetEvaluator...
[HNP Generation] [  1/928] eta: 1:24:20  time: 5.4535 data: 1.6628 max mem: 3158
[HNP Generation] [ 11/928] eta: 0:27:50  time: 1.8202 data: 0.1513 max mem: 3158
[HNP Generation] [ 21/928] eta: 0:24:44  time: 1.4442 data: 0.0002 max mem: 3158
[HNP Generation] [ 31/928] eta: 0:23:37  time: 1.4448 data: 0.0002 max mem: 3158
[HNP Generation] [ 41/928] eta: 0:22:53  time: 1.4535 data: 0.0002 max mem: 3158
[HNP Generation] [ 51/928] eta: 0:22:19  time: 1.4444 data: 0.0002 max mem: 3158
[HNP Generation] [ 61/928] eta: 0:21:52  time: 1.4403 data: 0.0002 max mem: 3158
[HNP Generation] [ 71/928] eta: 0:21:27  time: 1.4381 data: 0.0002 max mem: 3158
[HNP Generation] [ 81/928] eta: 0:21:07  time: 1.4446 data: 0.0002 max mem: 3158
[HNP Generation] [ 91/928] eta: 0:20:49  time: 1.4540 data: 0.0002 max mem: 3158
[HNP Generation] [101/928] eta: 0:20:29  time: 1.4459 data: 0.0002 max mem: 3158
[HNP Generation] [111/928

Creating the buffer: 100%|██████████| 392/392 [01:53<00:00,  3.44it/s]


[INFO] Generated gt.csv with 18519 entries (for reference only, will be discarded)


Saving HNP patches: 100%|██████████| 392/392 [01:21<00:00,  4.83it/s]


[SUCCESS] Generated 18519 HNP patches in ../../data/hnp_patches
[INFO] Detections CSV: ../../data/hnp_patches/detections.csv
[INFO] HNP patches: ../../data/hnp_patches/*.JPG
[INFO] gt.csv: ../../data/hnp_patches/gt.csv (DISCARD THIS - use original train CSV)

HNP GENERATION COMPLETE
✓ HNP patches created: 18519
✓ Detections CSV: ../../data/hnp_patches/detections.csv
✓ Output dir: ../../data/hnp_patches






## Combine the original patches with HNPs for Stage 2

Copy all the original Stage 1 patches into a new folder and add the generated HNPs.


In [12]:
stage2_train_dir = DATA_ROOT / "train_hnp_patches"
stage2_train_dir.mkdir(parents=True, exist_ok=True)

# Copy the original Stage 1 patches
print("Copying original Stage 1 patches...")
stage1_train_dir = DATA_ROOT / "train_patches"
copied_original = 0

for pattern in ("*.jpg", "*.JPG", "*.png", "*.PNG"):
    for src in stage1_train_dir.glob(pattern):
        dst = stage2_train_dir / src.name
        if not dst.exists():  # skip files already copied by a previous run
            copy2(src, dst)
            copied_original += 1

print(f" Copied {copied_original} original patches")

# Copy the HNPs
print("\nCopying HNP patches...")
hnp_dir = DATA_ROOT / "hnp_patches"
copied_hnp = 0

for pattern in ("*.jpg", "*.JPG", "*.png", "*.PNG"):
    for src in hnp_dir.glob(pattern):
        dst = stage2_train_dir / src.name
        if not dst.exists():  # also skips an HNP whose name collides with an existing patch
            copy2(src, dst)
            copied_hnp += 1

print(f" Copied {copied_hnp} HNP patches")
# This total counts only files copied in this run, so a re-run reports 0.
print(f"\n Total Stage 2 patches: {copied_original + copied_hnp}")


Copying original Stage 1 patches...
 Copied 0 original patches

Copying HNP patches...
 Copied 0 HNP patches

 Total Stage 2 patches: 0
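
The zeros above are what the cell reports when the destination folder was already populated by an earlier run, because it counts only newly copied files. The sketch below shows one possible alternative that separates copied from skipped files and counts the folder contents directly. `copy_images` and `count_images` are hypothetical helpers, not part of `utils.herdnet`, and matching suffixes case-insensitively is a small deviation from the glob patterns used in the cell.

```python
from pathlib import Path
from shutil import copy2

IMAGE_SUFFIXES = {".jpg", ".png"}


def copy_images(src_dir, dst_dir):
    """Copy image files from src_dir into dst_dir; return (copied, skipped)."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = skipped = 0
    for src in sorted(src_dir.iterdir()):
        if not src.is_file() or src.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        dst = dst_dir / src.name
        if dst.exists():
            skipped += 1  # already present, or a name collision with another source
        else:
            copy2(src, dst)
            copied += 1
    return copied, skipped


def count_images(folder):
    """Number of image files currently in the folder, regardless of when they were copied."""
    return sum(
        1 for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Reporting `count_images(stage2_train_dir)` after both copies gives the size of the Stage 2 training set even when nothing new was copied.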


**IMPORTANT**: For Stage 2 we use the original Stage 1 annotations CSV (`train_patches.csv` in this notebook), NOT the `gt.csv` produced during HNP generation.

Patches that do not appear in the CSV are automatically treated as background by `FolderDataset`.


In [11]:
# Use the original Stage 1 ground truth (NOT the one generated with the HNPs)
stage2_csv = DATA_ROOT / "train_hnp_patches.csv"
copy2(DATA_ROOT / "train_patches.csv", stage2_csv)

# Print the path that was actually written
print(f" Stage 2 CSV ready: {stage2_csv}")
print("  (Contains only the original annotations; HNPs are background)")


 Stage 2 CSV ready: ../../data/train_hnp_patches.csv
  (Contains only the original annotations; HNPs are background)
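
Since patches missing from the CSV end up as background, it is worth checking how many files fall on each side before training. The following sketch splits a folder's images by whether the CSV mentions them. `split_annotated_background` is a hypothetical helper, and the `images` column name is an assumption about the CSV layout.

```python
import csv
from pathlib import Path


def split_annotated_background(image_dir, csv_path, image_col="images"):
    """Split image file names into (annotated, background) by CSV membership."""
    with open(csv_path, newline="") as fh:
        annotated_names = {row[image_col] for row in csv.DictReader(fh)}
    files = [
        p.name for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in {".jpg", ".png"}
    ]
    annotated = sorted(n for n in files if n in annotated_names)
    background = sorted(n for n in files if n not in annotated_names)
    return annotated, background


# Example call with the notebook's paths (not executed here):
# annotated, background = split_annotated_background(
#     DATA_ROOT / "train_hnp_patches", DATA_ROOT / "train_hnp_patches.csv")
```

If the background list is empty, the HNPs were probably not copied into the Stage 2 folder.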


# Phase 3: Stage 2 Training

Train on the original patches plus the HNPs, using a lower learning rate (`LR_STAGE2 = 1e-6` versus `LR_STAGE1 = 1e-4`).


In [21]:
stage2_config = TrainConfig(
    train_root=stage2_train_dir,
    train_csv=DATA_ROOT / "train_hnp_patches.csv",
    val_root=DATA_ROOT / "val_patches",
    val_csv=DATA_ROOT / "val_patches" / "gt.csv",
    work_dir=OUTPUT_ROOT / "stage2",
    epochs=EPOCHS_STAGE2,
    batch_size=BATCH_SIZE,
    learning_rate=LR_STAGE2,
    num_workers=NUM_WORKERS,
    patch_size=PATCH_SIZE,
    stitch_overlap=STITCH_OVERLAP,
    wandb_project=WANDB_PROJECT,
    wandb_entity=WANDB_ENTITY,
    wandb_mode=WANDB_MODE,
    wandb_run_name="herdnet_stage2",
)


In [22]:
print("\n" + "="*70)
print("STARTING STAGE 2 TRAINING")
print("="*70 + "\n")

stage2_result = train_stage2(
    config=stage2_config,
    stage1_checkpoint=stage1_result.best_checkpoint,
    learning_rate=LR_STAGE2,
)

print("\n" + "="*70)
print("STAGE 2 COMPLETE")
print("="*70)
print(f" Best checkpoint: {stage2_result.best_checkpoint}")
print(f" Latest checkpoint: {stage2_result.latest_checkpoint}")
print("="*70 + "\n")



STARTING STAGE 2 TRAINING

[INFO] Training samples: 10364 (includes HNPs as background)
[TRAINING] - Epoch: [1] [   1/2591] eta: 0:17:57 lr: 0.000001 loss: 3.2512 (3.2512) focal_loss: 1.2191 (1.2191) ce_loss: 2.0320 (2.0320) time: 0.4159 data: 0.2798 max mem: 3677
[TRAINING] - Epoch: [1] [ 101/2591] eta: 0:05:37 lr: 0.000001 loss: 1.8945 (2.1903) focal_loss: 1.0714 (1.2759) ce_loss: 0.8875 (0.9144) time: 0.1329 data: 0.0002 max mem: 3677
[TRAINING] - Epoch: [1] [ 201/2591] eta: 0:05:21 lr: 0.000001 loss: 1.8332 (1.9932) focal_loss: 0.8395 (1.0932) ce_loss: 0.8166 (0.9000) time: 0.1341 data: 0.0006 max mem: 3677
[TRAINING] - Epoch: [1] [ 301/2591] eta: 0:05:07 lr: 0.000001 loss: 1.5842 (1.9496) focal_loss: 0.7323 (1.0583) ce_loss: 0.7426 (0.8913) time: 0.1339 data: 0.0002 max mem: 3677
[TRAINING] - Epoch: [1] [ 401/2591] eta: 0:04:54 lr: 0.000001 loss: 1.7607 (1.9069) focal_loss: 0.8601 (1.0172) ce_loss: 0.7786 (0.8898) time: 0.1347 data: 0.0002 max mem: 3677
[TRAINING] - Epoch: 

# Phase 4: Final Evaluation

Evaluate the trained models on full-size images (the test split here), so that Stage 2 can be compared against Stage 1.


## Evaluate Stage 1


In [18]:
eval_stage1_config = EvalConfig(
    checkpoint=stage1_result.best_checkpoint,
    csv=DATA_ROOT / "test.csv",
    root=DATA_ROOT / "test",
    output_dir=OUTPUT_ROOT / "eval_stage1",
    patch_size=PATCH_SIZE,
    overlap=STITCH_OVERLAP,
    upsample=True,
    match_radius=MATCH_RADIUS,
    batch_size=1,
    num_workers=NUM_WORKERS,
)


In [19]:
print("\n" + "="*70)
print("EVALUATING STAGE 1")
print("="*70 + "\n")

eval_stage1_result = evaluate_full_images(eval_stage1_config)

print("\nStage 1 Metrics:")
pd.DataFrame([eval_stage1_result.metrics["overall"]])



EVALUATING STAGE 1

[INFO] Evaluating 258 images...


Collecting detections: 100%|██████████| 258/258 [10:21<00:00,  2.41s/it]



HerdNet Evaluation Summary
Precision: 0.0021
Recall:    0.1970
F1 Score:  0.0042
MAE:       817.8953
RMSE:      1848.2999

Detections: outputs/herdnet/eval_stage1/detections.csv
Metrics:    outputs/herdnet/eval_stage1/metrics.json


Stage 1 Metrics:


|   | precision | recall | f1_score | mae | rmse | mse | accuracy |
|---|-----------|--------|----------|-----|------|-----|----------|
| 0 | 0.002124 | 0.197042 | 0.004202 | 817.895349 | 1848.299905 | 3416213.0 | 0.430464 |


## Evaluate Stage 2


In [25]:
eval_stage2_config = EvalConfig(
    checkpoint=stage2_result.best_checkpoint,
    csv=DATA_ROOT / "test.csv",
    root=DATA_ROOT / "test",
    output_dir=OUTPUT_ROOT / "eval_stage2",
    patch_size=PATCH_SIZE,
    overlap=STITCH_OVERLAP,
    upsample=True,
    match_radius=MATCH_RADIUS,
    batch_size=1,
    num_workers=NUM_WORKERS,
)


In [26]:
print("\n" + "="*70)
print("EVALUATING STAGE 2")
print("="*70 + "\n")

eval_stage2_result = evaluate_full_images(eval_stage2_config)

print("\nStage 2 Metrics:")
pd.DataFrame([eval_stage2_result.metrics["overall"]])



EVALUATING STAGE 2

[INFO] Evaluating 258 images...


Collecting detections: 100%|██████████| 258/258 [09:31<00:00,  2.21s/it]



HerdNet Evaluation Summary
Precision: 0.0063
Recall:    0.1975
F1 Score:  0.0122
MAE:       273.6202
RMSE:      802.7024

Detections: outputs/herdnet/eval_stage2/detections.csv
Metrics:    outputs/herdnet/eval_stage2/metrics.json


Stage 2 Metrics:


|   | precision | recall | f1_score | mae | rmse | mse | accuracy |
|---|-----------|--------|----------|-----|------|-----|----------|
| 0 | 0.006314 | 0.197477 | 0.012238 | 273.620155 | 802.702393 | 644331.131783 | 0.453744 |


In [27]:
comparison = pd.DataFrame([
    {"Stage": "Stage 1", **eval_stage1_result.metrics["overall"]},
    {"Stage": "Stage 2", **eval_stage2_result.metrics["overall"]},
])

print("\n" + "="*70)
print("STAGE 1 vs STAGE 2 COMPARISON")
print("="*70 + "\n")

comparison



STAGE 1 vs STAGE 2 COMPARISON



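|   | Stage | precision | recall | f1_score | mae | rmse | mse | accuracy |
|---|-------|-----------|--------|----------|-----|------|-----|----------|
| 0 | Stage 1 | 0.002124 | 0.197042 | 0.004202 | 817.895349 | 1848.299905 | 3416213.0 | 0.430464 |
| 1 | Stage 2 | 0.006314 | 0.197477 | 0.012238 | 273.620155 | 802.702393 | 644331.1 | 0.453744 |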
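
The reported figures can be cross-checked against the standard definitions: F1 is the harmonic mean of precision and recall, and MSE should equal RMSE squared. The sketch below recomputes both from the values in the comparison table. The numbers are copied from the output above, and small differences are expected because the displayed values are rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the comparison table above.
reported = {
    "Stage 1": {"precision": 0.002124, "recall": 0.197042, "f1_score": 0.004202,
                "rmse": 1848.299905, "mse": 3416213.0},
    "Stage 2": {"precision": 0.006314, "recall": 0.197477, "f1_score": 0.012238,
                "rmse": 802.702393, "mse": 644331.1},
}

for stage, m in reported.items():
    f1 = f1_score(m["precision"], m["recall"])
    print(f"{stage}: F1 recomputed {f1:.6f} vs reported {m['f1_score']:.6f}; "
          f"RMSE^2 {m['rmse'] ** 2:.1f} vs MSE {m['mse']:.1f}")
```

In these numbers, recall is almost unchanged between stages while precision roughly triples, which is the direction one would expect from training on hard negatives.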


## Per-class metrics (Stage 2)


In [28]:
per_class_df = pd.DataFrame(eval_stage2_result.metrics["per_class"]).T

print("\nPer-class metrics (Stage 2):")
per_class_df



Per-class metrics (Stage 2):


| Class | precision | recall | f1_score | mae | rmse |
|-------|-----------|--------|----------|-----|------|
| Hartebeest | 0.009597 | 0.072593 | 0.016952 | 28.103659 | 40.16665 |
| Buffalo | 0.000931 | 0.031519 | 0.00181 | 71.78882 | 238.41464 |
| Kob | 0.029412 | 0.224319 | 0.052005 | 20.253086 | 60.203205 |
| Warthog | 0.001034 | 0.094595 | 0.002046 | 39.757396 | 84.11112 |
| Waterbuck | 0.0 | 0.0 | 0.0 | 40.792857 | 133.338051 |
| Elephant | 0.000824 | 0.046512 | 0.001619 | 286.647059 | 792.328697 |


# Final Summary


In [29]:
print("\n" + "="*70)
print("FULL PIPELINE FINISHED")
print("="*70)

print("\n Checkpoints:")
print(f"  Stage 1: {stage1_result.best_checkpoint}")
print(f"  Stage 2: {stage2_result.best_checkpoint}")

print("\n Detections:")
print(f"  Stage 1: {eval_stage1_result.detections_csv}")
print(f"  Stage 2: {eval_stage2_result.detections_csv}")

print("\n Metrics:")
print(f"  Stage 1 F1: {eval_stage1_result.metrics['overall']['f1_score']:.4f}")
print(f"  Stage 2 F1: {eval_stage2_result.metrics['overall']['f1_score']:.4f}")
print(f"  Improvement: {(eval_stage2_result.metrics['overall']['f1_score'] - eval_stage1_result.metrics['overall']['f1_score']):.4f}")

print("\n" + "="*70 + "\n")



FULL PIPELINE FINISHED

 Checkpoints:
  Stage 1: outputs/herdnet/stage1/best_model.pth
  Stage 2: outputs/herdnet/stage2/best_model.pth

 Detections:
  Stage 1: outputs/herdnet/eval_stage1/detections.csv
  Stage 2: outputs/herdnet/eval_stage2/detections.csv

 Metrics:
  Stage 1 F1: 0.0042
  Stage 2 F1: 0.0122
  Improvement: 0.0080


