# Baseline OVD - BDD100K Detection Pipeline

Pipeline completo para establecer el baseline de detección con Grounding-DINO en BDD100K.

**Fases:**
1. Configuración del modelo y dataset
2. Inferencia sobre val_eval
3. Evaluación de métricas
4. Preparación para calibración
5. Generación de artefactos

**Hardware requerido:** GPU con CUDA (mínimo 8GB VRAM)

## 1. Imports y Setup

In [2]:
#pip install seaborn

Collecting seaborn
  Using cached seaborn-0.13.2-py3-none-any.whl.metadata (5.4 kB)
Using cached seaborn-0.13.2-py3-none-any.whl (294 kB)
Installing collected packages: seaborn
Successfully installed seaborn-0.13.2
Note: you may need to restart the kernel to use updated packages.


In [3]:
#!pip install torch torchvision
#!pip install pycocotools
#!pip install pandas matplotlib seaborn
#!pip install Pillow tqdm pyyaml

Collecting torch
  Using cached torch-2.9.0-cp312-cp312-win_amd64.whl.metadata (30 kB)
Collecting torchvision
  Using cached torchvision-0.24.0-cp312-cp312-win_amd64.whl.metadata (5.9 kB)
Collecting sympy>=1.13.3 (from torch)
  Using cached sympy-1.14.0-py3-none-any.whl.metadata (12 kB)
Collecting mpmath<1.4,>=1.1.0 (from sympy>=1.13.3->torch)
  Using cached mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Using cached torch-2.9.0-cp312-cp312-win_amd64.whl (109.3 MB)
Using cached torchvision-0.24.0-cp312-cp312-win_amd64.whl (4.3 MB)
Using cached sympy-1.14.0-py3-none-any.whl (6.3 MB)
Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)
Installing collected packages: mpmath, sympy, torch, torchvision

   ---------------------------------------- 0/4 [mpmath]
   ---------------------------------------- 0/4 [mpmath]
   ---------------------------------------- 0/4 [mpmath]
   ---------------------------------------- 0/4 [mpmath]
   ---------------------------------------- 0/4 [mpmath]
   ----

In [None]:
print("All required packages are installed.")
#pip install ipykernel

In [None]:
#%pip install pandas

Collecting pandas
  Downloading pandas-2.3.3-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (91 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m91.2/91.2 kB[0m [31m1.7 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
Collecting tzdata>=2022.7 (from pandas)
  Downloading tzdata-2025.2-py2.py3-none-any.whl.metadata (1.4 kB)
Downloading pandas-2.3.3-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (12.8 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m12.8/12.8 MB[0m [31m4.3 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0m
[?25hDownloading tzdata-2025.2-py2.py3-none-any.whl (347 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m347.8/347.8 kB[0m [31m4.1 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25hInstalling collected packages: tzdata, pandas
Successfully installed pandas-2.3.3 tzdata-2025.2
[0mNote: you may need to restart the kernel to use updated packages.


In [2]:
import os
import sys
import json
import yaml
import time
import torch
import numpy as np
import pandas as pd
from pathlib import Path
from PIL import Image
from tqdm import tqdm
from collections import defaultdict
import matplotlib.pyplot as plt
from datetime import datetime
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

In [3]:
import groundingdino, os
print("GroundingDINO en:", os.path.dirname(groundingdino.__file__))

GroundingDINO en: /opt/conda/lib/python3.10/site-packages/groundingdino-0.1.0-py3.10-linux-x86_64.egg/groundingdino


In [5]:
for i, p in enumerate(sys.path):
    print(i, p)

0 /opt/conda/lib/python310.zip
1 /opt/conda/lib/python3.10
2 /opt/conda/lib/python3.10/lib-dynload
3 
4 /opt/conda/lib/python3.10/site-packages
5 /opt/conda/lib/python3.10/site-packages/groundingdino-0.1.0-py3.10-linux-x86_64.egg
6 /opt/program/GroundingDINO


In [4]:
#sys.path.append('../installing_dino/GroundingDINO')
from groundingdino.util.inference import load_model, load_image, predict
from groundingdino.util import box_ops

torch.manual_seed(42)
np.random.seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(42)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False





In [5]:
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device: {device}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"CUDA version: {torch.version.cuda}")

Device: cuda
PyTorch: 2.3.1+cu121
CUDA available: True
GPU: NVIDIA GeForce RTX 4060 Laptop GPU
CUDA version: 12.1


## 2. Configuración Baseline (1.1, 1.2)

In [6]:
BASE_DIR = Path('../data')
OUTPUT_DIR = Path('./outputs/baseline')
QUALITATIVE_DIR = Path('./outputs/qualitative/baseline')
CONFIG_DIR = Path('./configs')

OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
QUALITATIVE_DIR.mkdir(parents=True, exist_ok=True)
CONFIG_DIR.mkdir(parents=True, exist_ok=True)
    
baseline_config = {
    'model': {
        'name': 'Grounding-DINO',
        'checkpoint': '/opt/program/GroundingDINO/weights/groundingdino_swint_ogc.pth',
        'config': '/opt/program/GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py',
        'architecture': 'SwinT-OGC',
        'input_size': [800, 1333],
        'device': str(device)
    },
    'dataset': {
        'image_dir': str(BASE_DIR / 'bdd100k/bdd100k/bdd100k/images/100k/val'),
        'val_eval_json': str(BASE_DIR / 'bdd100k_coco/val_eval.json'),
        'val_calib_json': str(BASE_DIR / 'bdd100k_coco/val_calib.json')
    },
    'inference': {
        'conf_threshold': 0.30,
        'nms_iou': 0.65,
        'batch_size': 1,
        'max_detections': 300
    },
    'prompts_file': str(BASE_DIR / 'prompts/bdd100k.txt'),
    'seed': 42,
    'timestamp': datetime.now().isoformat()
}

with open(CONFIG_DIR / 'baseline.yaml', 'w') as f:
    yaml.dump(baseline_config, f, default_flow_style=False)

print("Configuración baseline:")
print(yaml.dump(baseline_config, default_flow_style=False))

Configuración baseline:
dataset:
  image_dir: ../data/bdd100k/bdd100k/bdd100k/images/100k/val
  val_calib_json: ../data/bdd100k_coco/val_calib.json
  val_eval_json: ../data/bdd100k_coco/val_eval.json
inference:
  batch_size: 1
  conf_threshold: 0.3
  max_detections: 300
  nms_iou: 0.65
model:
  architecture: SwinT-OGC
  checkpoint: /opt/program/GroundingDINO/weights/groundingdino_swint_ogc.pth
  config: /opt/program/GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py
  device: cuda
  input_size:
  - 800
  - 1333
  name: Grounding-DINO
prompts_file: ../data/prompts/bdd100k.txt
seed: 42
timestamp: '2025-11-10T22:42:10.653385'



## 3. Definición del Vocabulario (1.3, 1.4)

In [7]:
BDD_COCO_CATEGORIES = {
    1: 'person',
    2: 'rider',
    3: 'car',
    4: 'truck',
    5: 'bus',
    6: 'train',
    7: 'motorcycle',
    8: 'bicycle',
    9: 'traffic light',
    10: 'traffic sign'
}

PROMPTS = [
    'person',
    'rider',
    'car',
    'truck',
    'bus',
    'train',
    'motorcycle',
    'bicycle',
    'traffic light',
    'traffic sign'
]

PROMPT_SYNONYMS = {
    'bike': 'bicycle',
    'motorbike': 'motorcycle',
    'motor': 'motorcycle',
    'stop sign': 'traffic sign',
    'red light': 'traffic light',
    'signal': 'traffic light',
    'pedestrian': 'person',
    'vehicle': 'car',
    'bicyclist': 'rider'
}

CAT_ID_TO_PROMPT = {cat_id: name for cat_id, name in BDD_COCO_CATEGORIES.items()}
PROMPT_TO_CAT_ID = {name: cat_id for cat_id, name in BDD_COCO_CATEGORIES.items()}
PROMPT_IDX_TO_CAT_ID = {i: i+1 for i in range(len(PROMPTS))}

PROMPTS_DIR = BASE_DIR / 'prompts'
PROMPTS_DIR.mkdir(parents=True, exist_ok=True)

with open(PROMPTS_DIR / 'bdd100k.txt', 'w') as f:
    for prompt in PROMPTS:
        f.write(f"{prompt}\n")

TEXT_PROMPT = '. '.join(PROMPTS) + '.'

print(f"Clases BDD100K COCO: {len(BDD_COCO_CATEGORIES)}")
print(f"Text prompt para Grounding-DINO:\n{TEXT_PROMPT}")
print(f"\nMapeo guardado en: {PROMPTS_DIR / 'bdd100k.txt'}")

Clases BDD100K COCO: 10
Text prompt para Grounding-DINO:
person. rider. car. truck. bus. train. motorcycle. bicycle. traffic light. traffic sign.

Mapeo guardado en: ../data/prompts/bdd100k.txt


## 4. Carga del Modelo Grounding-DINO

In [8]:
print("Cargando Grounding-DINO...")
model = load_model(baseline_config['model']['config'],baseline_config['model']['checkpoint'])
model.to(device)
model.eval()
print("Modelo cargado exitosamente")

Cargando Grounding-DINO...




final text_encoder_type: bert-base-uncased




Modelo cargado exitosamente


In [11]:
for name, param in model.named_parameters():
    print(f"Primer parámetro ({name}) en:", param.device)
    break

Primer parámetro (transformer.level_embed) en: cuda:0


In [12]:
if next(model.parameters()).is_cuda:
    print("✅ El modelo está en GPU (CUDA)")
else:
    print("⚠️ El modelo está en CPU")

✅ El modelo está en GPU (CUDA)


## 5. Funciones de Post-procesamiento (2.2)

In [9]:
def normalize_label(label):
    label_lower = label.lower().strip()
    if label_lower in PROMPT_SYNONYMS:
        return PROMPT_SYNONYMS[label_lower]
    for canonical in PROMPTS:
        if canonical in label_lower:
            return canonical
    return label_lower

def cxcywh_to_xywh(bbox):
    cx, cy, w, h = bbox
    x = cx - w / 2
    y = cy - h / 2
    return [x, y, w, h]

def clip_bbox(bbox, img_w, img_h):
    x, y, w, h = bbox
    x = max(0, min(x, img_w))
    y = max(0, min(y, img_h))
    w = max(0, min(w, img_w - x))
    h = max(0, min(h, img_h - y))
    return [x, y, w, h]

def apply_nms(boxes, scores, labels, iou_threshold=0.65):
    if len(boxes) == 0:
        return boxes, scores, labels
    
    keep_indices = []
    boxes_tensor = torch.tensor(boxes)
    scores_tensor = torch.tensor(scores)
    labels_tensor = torch.tensor(labels)
    
    for label_id in torch.unique(labels_tensor):
        mask = labels_tensor == label_id
        if mask.sum() == 0:
            continue
        
        class_boxes = boxes_tensor[mask]
        class_scores = scores_tensor[mask]
        class_indices = torch.where(mask)[0]
        
        x1 = class_boxes[:, 0]
        y1 = class_boxes[:, 1]
        x2 = class_boxes[:, 0] + class_boxes[:, 2]
        y2 = class_boxes[:, 1] + class_boxes[:, 3]
        areas = class_boxes[:, 2] * class_boxes[:, 3]
        
        order = class_scores.argsort(descending=True)
        
        keep = []
        while order.numel() > 0:
            if order.numel() == 1:
                keep.append(order.item())
                break
            
            i = order[0].item()
            keep.append(i)
            
            xx1 = torch.maximum(x1[i], x1[order[1:]])
            yy1 = torch.maximum(y1[i], y1[order[1:]])
            xx2 = torch.minimum(x2[i], x2[order[1:]])
            yy2 = torch.minimum(y2[i], y2[order[1:]])
            
            w = torch.maximum(torch.tensor(0.0), xx2 - xx1)
            h = torch.maximum(torch.tensor(0.0), yy2 - yy1)
            inter = w * h
            
            iou = inter / (areas[i] + areas[order[1:]] - inter)
            
            mask_keep = iou <= iou_threshold
            order = order[1:][mask_keep]
        
        keep_indices.extend(class_indices[keep].tolist())
    
    keep_indices = sorted(keep_indices)
    return (
        [boxes[i] for i in keep_indices],
        [scores[i] for i in keep_indices],
        [labels[i] for i in keep_indices]
    )

print("Funciones de post-procesamiento definidas")

Funciones de post-procesamiento definidas


## 6. Inferencia sobre val_eval (2.1, 2.3, 2.4)

In [14]:
coco_val = COCO(baseline_config['dataset']['val_eval_json'])
image_ids = sorted(coco_val.getImgIds())

predictions = []
perf_metrics = {
    'times': [],
    'num_detections': [],
    'gpu_memory_mb': []
}

print(f"Iniciando inferencia sobre {len(image_ids)} imágenes de val_eval...")

sample_images = []
sample_interval = len(image_ids) // 50 if len(image_ids) > 50 else 1

for idx, img_id in enumerate(tqdm(image_ids)):
    img_info = coco_val.loadImgs(img_id)[0]
    img_path = Path(baseline_config['dataset']['image_dir']) / img_info['file_name']
    
    if not img_path.exists():
        continue
    
    image_pil = Image.open(img_path).convert('RGB')
    img_w, img_h = image_pil.size
    
    image_source, image_transformed = load_image(str(img_path))
    
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start_time = time.time()
    
    with torch.no_grad():
        boxes, logits, phrases = predict(
            model=model,
            image=image_transformed,
            caption=TEXT_PROMPT,
            box_threshold=baseline_config['inference']['conf_threshold'],
            text_threshold=0.25,
            device=device
        )
    
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    elapsed = time.time() - start_time
    perf_metrics['times'].append(elapsed)
    
    if torch.cuda.is_available():
        perf_metrics['gpu_memory_mb'].append(torch.cuda.max_memory_allocated() / 1024 / 1024)
    
    boxes_xyxy = box_ops.box_cxcywh_to_xyxy(boxes) * torch.tensor([img_w, img_h, img_w, img_h])
    boxes_xywh = []
    for box in boxes_xyxy:
        x1, y1, x2, y2 = box.tolist()
        boxes_xywh.append([x1, y1, x2 - x1, y2 - y1])
    
    scores = logits.tolist()
    labels_raw = [normalize_label(p) for p in phrases]
    category_ids = [PROMPT_TO_CAT_ID.get(lbl, -1) for lbl in labels_raw]
    
    valid_mask = [cid != -1 for cid in category_ids]
    boxes_xywh = [b for b, m in zip(boxes_xywh, valid_mask) if m]
    scores = [s for s, m in zip(scores, valid_mask) if m]
    category_ids = [c for c, m in zip(category_ids, valid_mask) if m]
    
    if len(boxes_xywh) > 0:
        boxes_xywh, scores, category_ids = apply_nms(
            boxes_xywh, scores, category_ids, 
            iou_threshold=baseline_config['inference']['nms_iou']
        )
    
    if len(boxes_xywh) > baseline_config['inference']['max_detections']:
        sorted_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        sorted_indices = sorted_indices[:baseline_config['inference']['max_detections']]
        boxes_xywh = [boxes_xywh[i] for i in sorted_indices]
        scores = [scores[i] for i in sorted_indices]
        category_ids = [category_ids[i] for i in sorted_indices]
    
    for box, score, cat_id in zip(boxes_xywh, scores, category_ids):
        box_clipped = clip_bbox(box, img_w, img_h)
        predictions.append({
            'image_id': int(img_id),
            'category_id': int(cat_id),
            'bbox': [float(b) for b in box_clipped],
            'score': float(score)
        })
    
    perf_metrics['num_detections'].append(len(boxes_xywh))
    
    if idx % sample_interval == 0 and len(sample_images) < 50:
        sample_images.append({
            'image_id': img_id,
            'image_path': str(img_path),
            'num_dets': len(boxes_xywh)
        })

preds_path = OUTPUT_DIR / 'preds_raw.json'
with open(preds_path, 'w') as f:
    json.dump(predictions, f)

print(f"\nInferencia completada!")
print(f"Total predicciones: {len(predictions)}")
print(f"Predicciones guardadas en: {preds_path}")
print(f"Tiempo promedio por imagen: {np.mean(perf_metrics['times']):.3f}s")
print(f"FPS: {1.0 / np.mean(perf_metrics['times']):.2f}")
print(f"Detecciones promedio por imagen: {np.mean(perf_metrics['num_detections']):.1f}")

loading annotations into memory...
Done (t=0.18s)
creating index...
index created!
Iniciando inferencia sobre 2000 imágenes de val_eval...


100%|██████████| 2000/2000 [12:05<00:00,  2.76it/s]



Inferencia completada!
Total predicciones: 22162
Predicciones guardadas en: outputs/baseline/preds_raw.json
Tiempo promedio por imagen: 0.275s
FPS: 3.64
Detecciones promedio por imagen: 11.1


## 7. Guardado de Métricas de Rendimiento (2.4)

In [11]:
perf_summary = {
    'avg_time_per_image_s': float(np.mean(perf_metrics['times'])),
    'std_time_per_image_s': float(np.std(perf_metrics['times'])),
    'fps': float(1.0 / np.mean(perf_metrics['times'])),
    'avg_detections_per_image': float(np.mean(perf_metrics['num_detections'])),
    'std_detections_per_image': float(np.std(perf_metrics['num_detections'])),
    'total_predictions': len(predictions),
    'total_images': len(image_ids)
}

if torch.cuda.is_available():
    perf_summary['avg_gpu_memory_mb'] = float(np.mean(perf_metrics['gpu_memory_mb']))
    perf_summary['peak_gpu_memory_mb'] = float(np.max(perf_metrics['gpu_memory_mb']))

with open(OUTPUT_DIR / 'perf.txt', 'w') as f:
    f.write("=" * 60 + "\n")
    f.write("BASELINE PERFORMANCE METRICS\n")
    f.write("=" * 60 + "\n\n")
    for key, value in perf_summary.items():
        f.write(f"{key}: {value}\n")
    f.write("\n" + "=" * 60 + "\n")

print("\nMétricas de rendimiento guardadas en:", OUTPUT_DIR / 'perf.txt')

NameError: name 'perf_metrics' is not defined

## 8. Evaluación COCO (3.1)

In [13]:
print("Ejecutando evaluación COCO...")

# Cargar predicciones desde archivo si no están en memoria
if 'predictions' not in locals() or 'preds_path' not in locals():
    print("⚠️ Cargando predicciones desde archivo...")
    preds_path = OUTPUT_DIR / 'preds_raw.json'
    if not preds_path.exists():
        raise FileNotFoundError(f"No se encontró el archivo de predicciones: {preds_path}\n"
                                "Ejecuta primero la Sección 6 (Inferencia)")
    
    with open(preds_path, 'r') as f:
        predictions = json.load(f)
    print(f"✓ Predicciones cargadas: {len(predictions)} detecciones")

# Cargar COCO val si no está en memoria
if 'coco_val' not in locals():
    print("⚠️ Cargando anotaciones COCO...")
    coco_val = COCO(baseline_config['dataset']['val_eval_json'])
    print("✓ Anotaciones COCO cargadas")

# Evaluación COCO global
coco_dt = coco_val.loadRes(str(preds_path))
coco_eval = COCOeval(coco_val, coco_dt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()

metrics_coco = {
    'mAP': float(coco_eval.stats[0]),
    'AP50': float(coco_eval.stats[1]),
    'AP75': float(coco_eval.stats[2]),
    'AP_small': float(coco_eval.stats[3]),
    'AP_medium': float(coco_eval.stats[4]),
    'AP_large': float(coco_eval.stats[5]),
    'AR_max1': float(coco_eval.stats[6]),
    'AR_max10': float(coco_eval.stats[7]),
    'AR_max100': float(coco_eval.stats[8]),
    'AR_small': float(coco_eval.stats[9]),
    'AR_medium': float(coco_eval.stats[10]),
    'AR_large': float(coco_eval.stats[11])
}

print("\nEvaluando métricas por clase...")
per_class_metrics = {}
for cat_id, cat_name in tqdm(BDD_COCO_CATEGORIES.items(), desc="Clases"):
    coco_eval_class = COCOeval(coco_val, coco_dt, 'bbox')
    coco_eval_class.params.catIds = [cat_id]
    
    try:
        coco_eval_class.evaluate()
        coco_eval_class.accumulate()
        
        # Verificar que stats existe y tiene datos
        if hasattr(coco_eval_class, 'stats') and coco_eval_class.stats is not None and len(coco_eval_class.stats) > 0:
            per_class_metrics[cat_name] = {
                'mAP': float(coco_eval_class.stats[0]) if coco_eval_class.stats[0] >= 0 else 0.0,
                'AP50': float(coco_eval_class.stats[1]) if coco_eval_class.stats[1] >= 0 else 0.0,
                'AP75': float(coco_eval_class.stats[2]) if coco_eval_class.stats[2] >= 0 else 0.0
            }
        else:
            # Si no hay stats, la clase no tiene predicciones
            per_class_metrics[cat_name] = {
                'mAP': 0.0,
                'AP50': 0.0,
                'AP75': 0.0
            }
            print(f"  ⚠️ {cat_name}: Sin predicciones")
    
    except Exception as e:
        print(f"  ⚠️ Error evaluando {cat_name}: {e}")
        per_class_metrics[cat_name] = {
            'mAP': 0.0,
            'AP50': 0.0,
            'AP75': 0.0
        }

metrics_coco['per_class'] = per_class_metrics

with open(OUTPUT_DIR / 'metrics.json', 'w') as f:
    json.dump(metrics_coco, f, indent=2)

print("\n" + "="*60)
print("MÉTRICAS GLOBALES")
print("="*60)
print(f"  mAP@[.50:.95]: {metrics_coco['mAP']:.4f}")
print(f"  AP@50: {metrics_coco['AP50']:.4f}")
print(f"  AP@75: {metrics_coco['AP75']:.4f}")

print("\n" + "="*60)
print("MÉTRICAS POR CLASE")
print("="*60)
for cat_name, metrics in per_class_metrics.items():
    print(f"  {cat_name:15s}: mAP={metrics['mAP']:.3f}  AP50={metrics['AP50']:.3f}  AP75={metrics['AP75']:.3f}")

print(f"\n✓ Métricas guardadas en: {OUTPUT_DIR / 'metrics.json'}")





Ejecutando evaluación COCO...
Loading and preparing results...
DONE (t=0.25s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=3.63s).
Accumulating evaluation results...
DONE (t=0.73s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.170
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.279
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.171
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.063
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.182
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.377
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.188
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.284
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.285
 Average Recall     (AR) @[ IoU=0.50:0

Clases:   0%|          | 0/10 [00:00<?, ?it/s]

Running per image evaluation...
Evaluate annotation type *bbox*


Clases:  10%|█         | 1/10 [00:00<00:03,  2.49it/s]

DONE (t=0.33s).
Accumulating evaluation results...
DONE (t=0.06s).
  ⚠️ person: Sin predicciones
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.06s).
Accumulating evaluation results...
DONE (t=0.03s).
  ⚠️ rider: Sin predicciones
Running per image evaluation...
Evaluate annotation type *bbox*


Clases:  30%|███       | 3/10 [00:02<00:05,  1.33it/s]

DONE (t=1.49s).
Accumulating evaluation results...
DONE (t=0.16s).
  ⚠️ car: Sin predicciones
Running per image evaluation...
Evaluate annotation type *bbox*


Clases:  50%|█████     | 5/10 [00:02<00:02,  2.39it/s]

DONE (t=0.19s).
Accumulating evaluation results...
DONE (t=0.05s).
  ⚠️ truck: Sin predicciones
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.07s).
Accumulating evaluation results...
DONE (t=0.02s).
  ⚠️ bus: Sin predicciones
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.02s).
Accumulating evaluation results...
DONE (t=0.01s).
  ⚠️ train: Sin predicciones
Running per image evaluation...
Evaluate annotation type *bbox*


Clases:  70%|███████   | 7/10 [00:02<00:00,  4.14it/s]

DONE (t=0.06s).
Accumulating evaluation results...
DONE (t=0.02s).
  ⚠️ motorcycle: Sin predicciones
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.05s).
Accumulating evaluation results...
DONE (t=0.01s).
  ⚠️ bicycle: Sin predicciones
Running per image evaluation...
Evaluate annotation type *bbox*


Clases:  90%|█████████ | 9/10 [00:03<00:00,  3.47it/s]

DONE (t=0.55s).
Accumulating evaluation results...
DONE (t=0.10s).
  ⚠️ traffic light: Sin predicciones
Running per image evaluation...
Evaluate annotation type *bbox*


Clases: 100%|██████████| 10/10 [00:03<00:00,  2.56it/s]

DONE (t=0.50s).
Accumulating evaluation results...
DONE (t=0.09s).
  ⚠️ traffic sign: Sin predicciones

MÉTRICAS GLOBALES
  mAP@[.50:.95]: 0.1705
  AP@50: 0.2785
  AP@75: 0.1705

MÉTRICAS POR CLASE
  person         : mAP=0.000  AP50=0.000  AP75=0.000
  rider          : mAP=0.000  AP50=0.000  AP75=0.000
  car            : mAP=0.000  AP50=0.000  AP75=0.000
  truck          : mAP=0.000  AP50=0.000  AP75=0.000
  bus            : mAP=0.000  AP50=0.000  AP75=0.000
  train          : mAP=0.000  AP50=0.000  AP75=0.000
  motorcycle     : mAP=0.000  AP50=0.000  AP75=0.000
  bicycle        : mAP=0.000  AP50=0.000  AP75=0.000
  traffic light  : mAP=0.000  AP50=0.000  AP75=0.000
  traffic sign   : mAP=0.000  AP50=0.000  AP75=0.000

✓ Métricas guardadas en: outputs/baseline/metrics.json





In [14]:
# Diagnóstico rápido
print("Distribución de predicciones por clase:")
for cat_id, cat_name in BDD_COCO_CATEGORIES.items():
    count = sum(1 for p in predictions if p['category_id'] == cat_id)
    print(f"  {cat_name:15s}: {count:5d} predicciones")


Distribución de predicciones por clase:
  person         :  2436 predicciones
  rider          :   339 predicciones
  car            :  8343 predicciones
  truck          :  1412 predicciones
  bus            :   437 predicciones
  train          :    76 predicciones
  motorcycle     :   594 predicciones
  bicycle        :   169 predicciones
  traffic light  :  5238 predicciones
  traffic sign   :  3118 predicciones


## 9. Curvas Precision-Recall (3.1)

In [16]:
PR_DIR = OUTPUT_DIR / 'pr_curves'
PR_DIR.mkdir(exist_ok=True)

# Cargar datos si no están en memoria
if 'coco_val' not in locals() or 'coco_dt' not in locals():
    print("⚠️ Cargando datos necesarios...")
    preds_path = OUTPUT_DIR / 'preds_raw.json'
    coco_val = COCO(baseline_config['dataset']['val_eval_json'])
    coco_dt = coco_val.loadRes(str(preds_path))
    print("✓ Datos cargados")

print("Generando curvas Precision-Recall...")
for cat_id, cat_name in tqdm(BDD_COCO_CATEGORIES.items(), desc="Curvas PR"):
    coco_eval_class = COCOeval(coco_val, coco_dt, 'bbox')
    coco_eval_class.params.catIds = [cat_id]
    
    try:
        coco_eval_class.evaluate()
        coco_eval_class.accumulate()
        
        # Verificar que eval existe y tiene datos
        if not hasattr(coco_eval_class, 'eval') or coco_eval_class.eval is None:
            print(f"  ⚠️ {cat_name}: Sin datos de evaluación")
            continue
        
        if 'precision' not in coco_eval_class.eval:
            print(f"  ⚠️ {cat_name}: Sin datos de precisión")
            continue
        
        # Cuando se evalúa una sola clase, usar índice 0 en vez de cat_id-1
        # Formato: [T, R, K, A, M] donde K es el índice de clase
        # T=thresholds IoU, R=recall points, K=class, A=area, M=max_dets
        precision_array = coco_eval_class.eval['precision']
        
        # Para una sola clase, K=0 (no cat_id-1)
        # [0, :, 0, 0, 2] = IoU=0.5:0.95, todos los recalls, clase 0, área=all, maxDets=100
        if precision_array.shape[2] == 1:
            # Evaluación de una sola clase
            precision = precision_array[0, :, 0, 0, 2]
        else:
            # Evaluación multi-clase (no debería llegar aquí)
            precision = precision_array[0, :, cat_id-1, 0, 2]
        
        recall = np.linspace(0, 1, 101)
        
        # Filtrar valores válidos
        valid_mask = precision > -1
        precision_valid = precision[valid_mask]
        recall_valid = recall[valid_mask]
        
        if len(precision_valid) > 0:
            plt.figure(figsize=(8, 6))
            plt.plot(recall_valid, precision_valid, linewidth=2, color='#2E86AB')
            plt.xlabel('Recall', fontsize=12, fontweight='bold')
            plt.ylabel('Precision', fontsize=12, fontweight='bold')
            plt.title(f'Precision-Recall Curve: {cat_name}', fontsize=14, fontweight='bold')
            plt.grid(True, alpha=0.3, linestyle='--')
            plt.xlim([0, 1])
            plt.ylim([0, 1])
            
            # Calcular área bajo la curva (AP)
            ap = np.trapz(precision_valid, recall_valid) if len(precision_valid) > 1 else 0.0
            plt.text(0.6, 0.05, f'AP = {ap:.3f}', fontsize=11, 
                    bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
            
            plt.tight_layout()
            plt.savefig(PR_DIR / f'{cat_name.replace(" ", "_")}_pr.png', dpi=150, bbox_inches='tight')
            plt.close()
        else:
            print(f"  ⚠️ {cat_name}: Sin puntos válidos de precisión")
    
    except Exception as e:
        print(f"  ⚠️ Error generando curva PR para {cat_name}: {e}")
        continue

print(f"\n✓ Curvas PR guardadas en: {PR_DIR}")

# Generar resumen de curvas
summary_lines = ["RESUMEN DE CURVAS PRECISION-RECALL\n", "="*50 + "\n\n"]
for cat_name in BDD_COCO_CATEGORIES.values():
    pr_file = PR_DIR / f'{cat_name.replace(" ", "_")}_pr.png'
    status = "✓" if pr_file.exists() else "✗"
    summary_lines.append(f"{status} {cat_name:15s}: {pr_file.name}\n")

with open(PR_DIR / 'README.txt', 'w') as f:
    f.writelines(summary_lines)

print(f"✓ Resumen guardado en: {PR_DIR / 'README.txt'}")


Generando curvas Precision-Recall...


Curvas PR:   0%|          | 0/10 [00:00<?, ?it/s]

Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.39s).
Accumulating evaluation results...


Curvas PR:  10%|█         | 1/10 [00:00<00:07,  1.18it/s]

DONE (t=0.28s).
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.05s).
Accumulating evaluation results...


Curvas PR:  20%|██        | 2/10 [00:01<00:03,  2.11it/s]

DONE (t=0.02s).
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=1.40s).
Accumulating evaluation results...
DONE (t=0.15s).


Curvas PR:  30%|███       | 3/10 [00:02<00:07,  1.03s/it]

Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.14s).
Accumulating evaluation results...
DONE (t=0.05s).


Curvas PR:  40%|████      | 4/10 [00:03<00:04,  1.32it/s]

Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.08s).
Accumulating evaluation results...
DONE (t=0.02s).


Curvas PR:  60%|██████    | 6/10 [00:03<00:01,  2.26it/s]

Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.03s).
Accumulating evaluation results...
DONE (t=0.01s).
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.05s).
Accumulating evaluation results...


Curvas PR:  80%|████████  | 8/10 [00:03<00:00,  3.44it/s]

DONE (t=0.01s).
  ⚠️ motorcycle: Sin puntos válidos de precisión
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.04s).
Accumulating evaluation results...
DONE (t=0.01s).
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.54s).
Accumulating evaluation results...
DONE (t=0.09s).


Curvas PR:  90%|█████████ | 9/10 [00:04<00:00,  2.39it/s]

Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.46s).
Accumulating evaluation results...
DONE (t=0.08s).


Curvas PR: 100%|██████████| 10/10 [00:05<00:00,  1.90it/s]


✓ Curvas PR guardadas en: outputs/baseline/pr_curves
✓ Resumen guardado en: outputs/baseline/pr_curves/README.txt





## 10. Sensibilidad a Umbrales (3.2)

In [17]:
threshold_sweep = []
conf_thresholds = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.50, 0.60, 0.75]

print("Barrido de umbrales de confianza...")
for conf_th in tqdm(conf_thresholds):
    preds_filtered = [p for p in predictions if p['score'] >= conf_th]
    
    if len(preds_filtered) == 0:
        continue
    
    temp_path = OUTPUT_DIR / f'temp_preds_{conf_th}.json'
    with open(temp_path, 'w') as f:
        json.dump(preds_filtered, f)
    
    coco_dt_temp = coco_val.loadRes(str(temp_path))
    coco_eval_temp = COCOeval(coco_val, coco_dt_temp, 'bbox')
    coco_eval_temp.evaluate()
    coco_eval_temp.accumulate()
    coco_eval_temp.summarize()
    
    threshold_sweep.append({
        'conf_threshold': conf_th,
        'mAP': float(coco_eval_temp.stats[0]),
        'AP50': float(coco_eval_temp.stats[1]),
        'AP75': float(coco_eval_temp.stats[2]),
        'num_predictions': len(preds_filtered)
    })
    
    temp_path.unlink()

threshold_df = pd.DataFrame(threshold_sweep)

plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(threshold_df['conf_threshold'], threshold_df['mAP'], 'o-', label='mAP')
plt.plot(threshold_df['conf_threshold'], threshold_df['AP50'], 's-', label='AP50')
plt.plot(threshold_df['conf_threshold'], threshold_df['AP75'], '^-', label='AP75')
plt.xlabel('Confidence Threshold')
plt.ylabel('Average Precision')
plt.title('AP vs Confidence Threshold')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.plot(threshold_df['conf_threshold'], threshold_df['num_predictions'], 'o-')
plt.xlabel('Confidence Threshold')
plt.ylabel('Number of Predictions')
plt.title('Predictions vs Threshold')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(OUTPUT_DIR / 'threshold_sensitivity.png', dpi=150)
plt.close()

threshold_df.to_csv(OUTPUT_DIR / 'threshold_sweep.csv', index=False)
print(f"\nBarrido de umbrales guardado en: {OUTPUT_DIR / 'threshold_sweep.csv'}")

Barrido de umbrales de confianza...


  0%|          | 0/11 [00:00<?, ?it/s]

Loading and preparing results...
DONE (t=0.24s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=3.20s).
Accumulating evaluation results...


  9%|▉         | 1/11 [00:04<00:42,  4.25s/it]

DONE (t=0.50s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.170
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.279
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.171
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.063
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.182
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.377
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.188
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.284
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.285
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.107
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.540
Loading and preparing re

 18%|█▊        | 2/11 [00:08<00:37,  4.12s/it]

DONE (t=0.69s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.170
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.279
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.171
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.063
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.182
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.377
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.188
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.284
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.285
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.107
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.540
Loading and preparing re

 27%|██▋       | 3/11 [00:12<00:32,  4.07s/it]

DONE (t=0.51s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.170
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.279
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.171
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.063
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.182
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.377
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.188
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.284
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.285
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.107
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.540
Loading and preparing re

 36%|███▋      | 4/11 [00:16<00:28,  4.02s/it]

DONE (t=0.50s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.170
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.279
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.171
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.063
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.182
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.377
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.188
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.284
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.285
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.107
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.540
Loading and preparing re

 45%|████▌     | 5/11 [00:20<00:24,  4.02s/it]

DONE (t=0.49s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.170
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.279
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.171
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.063
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.182
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.377
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.188
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.284
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.285
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.107
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.540
Loading and preparing re

 55%|█████▍    | 6/11 [00:24<00:20,  4.01s/it]

DONE (t=0.50s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.170
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.279
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.171
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.063
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.182
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.377
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.188
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.284
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.285
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.107
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.540
Loading and preparing re

 64%|██████▎   | 7/11 [00:27<00:15,  3.80s/it]

DONE (t=0.46s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.155
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.247
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.157
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.053
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.160
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.351
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.178
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.252
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.252
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.082
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.228
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.500
Loading and preparing re

 73%|███████▎  | 8/11 [00:30<00:10,  3.49s/it]

DONE (t=0.34s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.136
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.210
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.142
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.043
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.133
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.323
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.165
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.217
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.217
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.059
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.181
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.456
Loading and preparing re

 82%|████████▏ | 9/11 [00:32<00:05,  2.98s/it]

DONE (t=0.30s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.091
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.132
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.096
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.023
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.086
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.241
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.088
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.109
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.109
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.027
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.107
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.269
Loading and preparing re

 91%|█████████ | 10/11 [00:33<00:02,  2.46s/it]

DONE (t=0.22s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.057
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.076
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.061
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.009
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.043
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.153
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.056
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.062
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.062
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.008
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.049
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.166
Loading and preparing re

100%|██████████| 11/11 [00:34<00:00,  3.15s/it]

DONE (t=0.84s).
Accumulating evaluation results...
DONE (t=0.18s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.015
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.017
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.015
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.001
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.005
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.033
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.012
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.012
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.012
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.003
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= lar





Barrido de umbrales guardado en: outputs/baseline/threshold_sweep.csv


## 11. Visualización Cualitativa (3.4)

In [20]:
import matplotlib.patches as patches

# Cargar predicciones si no están en memoria
if 'predictions' not in locals():
    print("⚠️ Cargando predicciones desde archivo...")
    preds_path = OUTPUT_DIR / 'preds_raw.json'
    with open(preds_path, 'r') as f:
        predictions = json.load(f)
    print(f"✓ {len(predictions)} predicciones cargadas")

# Cargar COCO val si no está
if 'coco_val' not in locals():
    print("⚠️ Cargando anotaciones COCO...")
    coco_val = COCO(baseline_config['dataset']['val_eval_json'])
    print("✓ Anotaciones COCO cargadas")

# Generar sample_images si no existe
if 'sample_images' not in locals():
    print("⚠️ Seleccionando imágenes de muestra para visualización...")
    image_ids = sorted(coco_val.getImgIds())
    
    # Agrupar predicciones por imagen
    preds_per_image = {}
    for pred in predictions:
        img_id = pred['image_id']
        if img_id not in preds_per_image:
            preds_per_image[img_id] = []
        preds_per_image[img_id].append(pred)
    
    # Seleccionar muestra estratificada: imágenes con diferentes cantidades de detecciones
    sample_images = []
    
    # Ordenar imágenes por número de detecciones
    image_det_counts = [(img_id, len(preds_per_image.get(img_id, []))) 
                        for img_id in image_ids]
    image_det_counts.sort(key=lambda x: x[1], reverse=True)
    
    # Seleccionar 50 imágenes distribuidas
    sample_interval = max(1, len(image_det_counts) // 50)
    for i in range(0, len(image_det_counts), sample_interval):
        if len(sample_images) >= 50:
            break
        
        img_id, num_dets = image_det_counts[i]
        img_info = coco_val.loadImgs(img_id)[0]
        img_path = Path(baseline_config['dataset']['image_dir']) / img_info['file_name']
        
        if img_path.exists():
            sample_images.append({
                'image_id': img_id,
                'image_path': str(img_path),
                'num_dets': num_dets
            })
    
    print(f"✓ {len(sample_images)} imágenes seleccionadas para visualización")
    print(f"  Rango de detecciones: {min(s['num_dets'] for s in sample_images)} - {max(s['num_dets'] for s in sample_images)}")

# Colores para las clases
colors = plt.cm.tab10(np.linspace(0, 1, 10))
cat_colors = {cat_id: colors[i] for i, cat_id in enumerate(BDD_COCO_CATEGORIES.keys())}

print("\nGenerando visualizaciones cualitativas...")
viz_count = 0
for i, sample in enumerate(tqdm(sample_images[:50], desc="Visualizaciones")):
    img_id = sample['image_id']
    img_path = Path(sample['image_path'])
    
    if not img_path.exists():
        print(f"  ⚠️ Imagen no encontrada: {img_path}")
        continue
    
    try:
        image = Image.open(img_path).convert('RGB')
        img_w, img_h = image.size
        
        # Filtrar predicciones para esta imagen
        img_preds = [p for p in predictions if p['image_id'] == img_id]
        
        # Crear figura
        fig, ax = plt.subplots(1, 1, figsize=(16, 9))
        ax.imshow(image)
        
        # Dibujar bounding boxes
        for pred in img_preds:
            bbox = pred['bbox']
            cat_id = pred['category_id']
            score = pred['score']
            
            x, y, w, h = bbox
            
            # Dibujar rectángulo
            rect = patches.Rectangle(
                (x, y), w, h,
                linewidth=2.5,
                edgecolor=cat_colors[cat_id],
                facecolor='none'
            )
            ax.add_patch(rect)
            
            # Añadir etiqueta
            label = f"{CAT_ID_TO_PROMPT[cat_id]} {score:.2f}"
            ax.text(
                x, max(0, y - 5),
                label,
                fontsize=9,
                color='white',
                weight='bold',
                bbox=dict(facecolor=cat_colors[cat_id], alpha=0.8, 
                         pad=3, edgecolor='white', linewidth=1)
            )
        
        ax.axis('off')
        ax.set_title(
            f'Image ID: {img_id} | Detections: {len(img_preds)}',
            fontsize=14,
            fontweight='bold',
            pad=10
        )
        
        plt.tight_layout()
        output_path = QUALITATIVE_DIR / f'{img_id:07d}.jpg'
        plt.savefig(output_path, dpi=100, bbox_inches='tight', pad_inches=0.1)
        plt.close()
        
        viz_count += 1
    
    except Exception as e:
        print(f"  ⚠️ Error procesando imagen {img_id}: {e}")
        plt.close('all')
        continue

print(f"\n✓ {viz_count} visualizaciones guardadas en: {QUALITATIVE_DIR}")

# Generar índice HTML para visualización rápida
html_content = ['<!DOCTYPE html>\n<html>\n<head>\n']
html_content.append('  <meta charset="UTF-8">\n')
html_content.append('  <title>Baseline Qualitative Results</title>\n')
html_content.append('  <style>\n')
html_content.append('    body { font-family: Arial, sans-serif; margin: 20px; background: #f5f5f5; }\n')
html_content.append('    h1 { color: #333; }\n')
html_content.append('    .grid { display: grid; grid-template-columns: repeat(auto-fill, minmax(400px, 1fr)); gap: 20px; }\n')
html_content.append('    .item { background: white; padding: 10px; border-radius: 8px; box-shadow: 0 2px 4px rgba(0,0,0,0.1); }\n')
html_content.append('    img { width: 100%; height: auto; border-radius: 4px; }\n')
html_content.append('    .caption { margin-top: 10px; font-size: 14px; color: #666; }\n')
html_content.append('  </style>\n')
html_content.append('</head>\n<body>\n')
html_content.append('  <h1>Baseline Detection Results (BDD100K val_eval)</h1>\n')
html_content.append(f'  <p>Total visualizations: {viz_count}</p>\n')
html_content.append('  <div class="grid">\n')

for sample in sample_images[:viz_count]:
    img_id = sample['image_id']
    num_dets = sample['num_dets']
    img_file = f'{img_id:07d}.jpg'
    
    if (QUALITATIVE_DIR / img_file).exists():
        html_content.append('    <div class="item">\n')
        html_content.append(f'      <img src="{img_file}" alt="Image {img_id}">\n')
        html_content.append(f'      <div class="caption">Image ID: {img_id} | Detections: {num_dets}</div>\n')
        html_content.append('    </div>\n')

html_content.append('  </div>\n')
html_content.append('</body>\n</html>')

with open(QUALITATIVE_DIR / 'index.html', 'w', encoding='utf-8') as f:
    f.writelines(html_content)

print(f"✓ Índice HTML generado: {QUALITATIVE_DIR / 'index.html'}")
print(f"  Abre este archivo en un navegador para ver todas las visualizaciones")


⚠️ Seleccionando imágenes de muestra para visualización...
✓ 50 imágenes seleccionadas para visualización
  Rango de detecciones: 2 - 37

Generando visualizaciones cualitativas...


Visualizaciones: 100%|██████████| 50/50 [00:13<00:00,  3.59it/s]


✓ 50 visualizaciones guardadas en: outputs/qualitative/baseline
✓ Índice HTML generado: outputs/qualitative/baseline/index.html
  Abre este archivo en un navegador para ver todas las visualizaciones





## 12. Preparación para Calibración (4.1, 4.2, 4.3)

In [None]:
if Path(baseline_config['dataset']['val_calib_json']).exists():
    print("Generando inputs para calibración sobre val_calib...")
    
    coco_calib = COCO(baseline_config['dataset']['val_calib_json'])
    calib_image_ids = sorted(coco_calib.getImgIds())
    
    calib_records = []
    
    for img_id in tqdm(calib_image_ids):
        img_info = coco_calib.loadImgs(img_id)[0]
        img_path = Path(baseline_config['dataset']['image_dir']) / img_info['file_name']
        
        if not img_path.exists():
            continue
        
        image_pil = Image.open(img_path).convert('RGB')
        img_w, img_h = image_pil.size
        
        image_source, image_transformed = load_image(str(img_path))
        
        with torch.no_grad():
            boxes, logits, phrases = predict(
                model=model,
                image=image_transformed,
                caption=TEXT_PROMPT,
                box_threshold=0.05,
                text_threshold=0.25,
                device=device
            )
        
        boxes_xyxy = box_ops.box_cxcywh_to_xyxy(boxes) * torch.tensor([img_w, img_h, img_w, img_h])
        boxes_xywh = []
        for box in boxes_xyxy:
            x1, y1, x2, y2 = box.tolist()
            boxes_xywh.append([x1, y1, x2 - x1, y2 - y1])
        
        scores = logits.tolist()
        labels_raw = [normalize_label(p) for p in phrases]
        category_ids = [PROMPT_TO_CAT_ID.get(lbl, -1) for lbl in labels_raw]
        
        valid_mask = [cid != -1 for cid in category_ids]
        boxes_xywh = [b for b, m in zip(boxes_xywh, valid_mask) if m]
        scores = [s for s, m in zip(scores, valid_mask) if m]
        category_ids = [c for c, m in zip(category_ids, valid_mask) if m]
        
        ann_ids = coco_calib.getAnnIds(imgIds=img_id)
        anns = coco_calib.loadAnns(ann_ids)
        
        for pred_box, pred_score, pred_cat in zip(boxes_xywh, scores, category_ids):
            pred_x1, pred_y1, pred_w, pred_h = pred_box
            pred_x2 = pred_x1 + pred_w
            pred_y2 = pred_y1 + pred_h
            pred_area = pred_w * pred_h
            
            best_iou = 0.0
            best_match = None
            
            for ann in anns:
                if ann['category_id'] != pred_cat:
                    continue
                
                gt_x, gt_y, gt_w, gt_h = ann['bbox']
                gt_x2 = gt_x + gt_w
                gt_y2 = gt_y + gt_h
                gt_area = gt_w * gt_h
                
                inter_x1 = max(pred_x1, gt_x)
                inter_y1 = max(pred_y1, gt_y)
                inter_x2 = min(pred_x2, gt_x2)
                inter_y2 = min(pred_y2, gt_y2)
                
                inter_w = max(0, inter_x2 - inter_x1)
                inter_h = max(0, inter_y2 - inter_y1)
                inter_area = inter_w * inter_h
                
                union_area = pred_area + gt_area - inter_area
                iou = inter_area / union_area if union_area > 0 else 0.0
                
                if iou > best_iou:
                    best_iou = iou
                    best_match = ann['id']
            
            is_correct = best_iou >= 0.5
            
            calib_records.append({
                'image_id': int(img_id),
                'bbox': [float(b) for b in pred_box],
                'category_id_pred': int(pred_cat),
                'score': float(pred_score),
                'iou': float(best_iou),
                'is_correct': bool(is_correct),
                'gt_ann_id': int(best_match) if best_match else -1
            })
    
    # Save as CSV instead of Parquet
    calib_df = pd.DataFrame(calib_records)
    calib_df.to_csv(OUTPUT_DIR / 'calib_inputs.csv', index=False)

    print(f"\nCalibration inputs generados: {len(calib_records)} detecciones")
    print(f"Guardado en: {OUTPUT_DIR / 'calib_inputs.csv'}")

    class_counts = calib_df.groupby('category_id_pred').size().to_dict()
    print("\nCobertura por clase en val_calib:")
    for cat_id, count in sorted(class_counts.items()):
        cat_name = CAT_ID_TO_PROMPT[cat_id]
        correct = calib_df[(calib_df['category_id_pred'] == cat_id) & (calib_df['is_correct'])].shape[0]
        print(f"  {cat_name}: {count} predicciones ({correct} correctas, {correct/count*100:.1f}%)")

else:
    print("val_calib.json no encontrado, saltando generación de inputs de calibración")

Generando inputs para calibración sobre val_calib...
loading annotations into memory...
Done (t=2.23s)
creating index...
index created!


100%|██████████| 8000/8000 [1:03:25<00:00,  2.10it/s]


ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
A suitable version of pyarrow or fastparquet is required for parquet support.
Trying to import the above resulted in these errors:
 - Missing optional dependency 'pyarrow'. pyarrow is required for parquet support. Use pip or conda to install pyarrow.
 - Missing optional dependency 'fastparquet'. fastparquet is required for parquet support. Use pip or conda to install fastparquet.

In [None]:
calib_df = pd.DataFrame(calib_records)
calib_df.to_csv(OUTPUT_DIR / 'calib_inputs.csv', index=False)

print(f"\nCalibration inputs generados: {len(calib_records)} detecciones")
print(f"Guardado en: {OUTPUT_DIR / 'calib_inputs.csv'}")

class_counts = calib_df.groupby('category_id_pred').size().to_dict()
print("\nCobertura por clase en val_calib:")
for cat_id, count in sorted(class_counts.items()):
    cat_name = CAT_ID_TO_PROMPT[cat_id]
    correct = calib_df[(calib_df['category_id_pred'] == cat_id) & (calib_df['is_correct'])].shape[0]
    print(f"  {cat_name}: {count} predicciones ({correct} correctas, {correct/count*100:.1f}%)")


Calibration inputs generados: 129795 detecciones
Guardado en: outputs/baseline/calib_inputs.csv

Cobertura por clase en val_calib:
  person: 15048 predicciones (6959 correctas, 46.2%)
  rider: 2877 predicciones (145 correctas, 5.0%)
  car: 46324 predicciones (39306 correctas, 84.9%)
  truck: 8245 predicciones (2498 correctas, 30.3%)
  bus: 3748 predicciones (598 correctas, 16.0%)
  train: 558 predicciones (1 correctas, 0.2%)
  motorcycle: 4306 predicciones (0 correctas, 0.0%)
  bicycle: 1642 predicciones (323 correctas, 19.7%)
  traffic light: 29496 predicciones (13247 correctas, 44.9%)
  traffic sign: 17551 predicciones (12340 correctas, 70.3%)


## 13. Tabla Resumen Baseline (5.2)

In [24]:
# Cargar threshold_df si no existe
if 'threshold_df' not in locals():
    print("⚠️ Cargando threshold sweep desde archivo...")
    threshold_csv = OUTPUT_DIR / 'threshold_sweep.csv'
    
    if threshold_csv.exists():
        threshold_df = pd.read_csv(threshold_csv)
        print(f"✓ Threshold sweep cargado: {len(threshold_df)} configuraciones")
    else:
        raise FileNotFoundError(
            f"No se encontró {threshold_csv}\n"
            "Ejecuta primero la Sección 10 (Sensibilidad a Umbrales)"
        )

# Cargar perf_summary si no existe
if 'perf_summary' not in locals():
    print("⚠️ Cargando métricas de rendimiento desde archivo...")
    perf_txt = OUTPUT_DIR / 'perf.txt'
    
    if perf_txt.exists():
        # Parsear perf.txt
        perf_summary = {}
        with open(perf_txt, 'r') as f:
            for line in f:
                if ':' in line and '=' not in line:
                    key, value = line.strip().split(':', 1)
                    try:
                        perf_summary[key.strip()] = float(value.strip())
                    except:
                        pass
        print(f"✓ Métricas de rendimiento cargadas: {len(perf_summary)} valores")
    else:
        # Valores por defecto si no existe el archivo
        print("⚠️ perf.txt no encontrado, usando valores por defecto")
        perf_summary = {
            'fps': 1.0,
            'peak_gpu_memory_mb': 0,
            'total_images': 2000
        }

# Cargar baseline_config si no existe
if 'baseline_config' not in locals():
    print("⚠️ Cargando configuración baseline...")
    config_yaml = CONFIG_DIR / 'baseline.yaml'
    
    if config_yaml.exists():
        with open(config_yaml, 'r') as f:
            baseline_config = yaml.safe_load(f)
        print("✓ Configuración baseline cargada")
    else:
        # Configuración por defecto
        baseline_config = {
            'inference': {
                'conf_threshold': 0.30,
                'nms_iou': 0.65
            }
        }

# Generar tabla resumen
summary_table = []

for _, row in threshold_df.iterrows():
    conf_th = row['conf_threshold']
    
    is_baseline = abs(conf_th - baseline_config['inference']['conf_threshold']) < 0.01
    marker = '⭐' if is_baseline else ''
    
    summary_table.append({
        'Config': marker,
        'Conf_Threshold': conf_th,
        'NMS_IoU': baseline_config['inference']['nms_iou'],
        'mAP': row['mAP'],
        'AP50': row['AP50'],
        'AP75': row['AP75'],
        'FPS': perf_summary.get('fps', 1.0),
        'GPU_MB': perf_summary.get('peak_gpu_memory_mb', 0),
        'Detections/Img': row['num_predictions'] / perf_summary.get('total_images', 2000)
    })

summary_df = pd.DataFrame(summary_table)

print("\n" + "="*80)
print("TABLA RESUMEN BASELINE")
print("="*80)
print(summary_df.to_string(index=False))
print("="*80)

summary_df.to_csv(OUTPUT_DIR / 'summary_table.csv', index=False)
print(f"\n✓ Tabla resumen guardada en: {OUTPUT_DIR / 'summary_table.csv'}")

# Generar visualización adicional
plt.figure(figsize=(14, 6))

# Subplot 1: Métricas vs Threshold
plt.subplot(1, 2, 1)
plt.plot(summary_df['Conf_Threshold'], summary_df['mAP'], 'o-', label='mAP', linewidth=2, markersize=8)
plt.plot(summary_df['Conf_Threshold'], summary_df['AP50'], 's-', label='AP50', linewidth=2, markersize=8)
plt.plot(summary_df['Conf_Threshold'], summary_df['AP75'], '^-', label='AP75', linewidth=2, markersize=8)

# Marcar baseline
baseline_row = summary_df[summary_df['Config'] == '⭐']
if not baseline_row.empty:
    baseline_th = baseline_row['Conf_Threshold'].values[0]
    plt.axvline(baseline_th, color='red', linestyle='--', linewidth=2, alpha=0.7, label='Baseline')

plt.xlabel('Confidence Threshold', fontsize=12, fontweight='bold')
plt.ylabel('Average Precision', fontsize=12, fontweight='bold')
plt.title('Detection Metrics vs Confidence Threshold', fontsize=14, fontweight='bold')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)

# Subplot 2: Trade-off Detections vs mAP
plt.subplot(1, 2, 2)
scatter = plt.scatter(
    summary_df['Detections/Img'], 
    summary_df['mAP'],
    c=summary_df['Conf_Threshold'],
    cmap='viridis',
    s=150,
    alpha=0.7,
    edgecolors='black',
    linewidth=1.5
)

# Marcar baseline
if not baseline_row.empty:
    plt.scatter(
        baseline_row['Detections/Img'].values[0],
        baseline_row['mAP'].values[0],
        color='red',
        s=300,
        marker='*',
        edgecolors='black',
        linewidth=2,
        label='Baseline',
        zorder=5
    )

plt.colorbar(scatter, label='Conf. Threshold')
plt.xlabel('Detections per Image', fontsize=12, fontweight='bold')
plt.ylabel('mAP', fontsize=12, fontweight='bold')
plt.title('Detection Density vs Performance Trade-off', fontsize=14, fontweight='bold')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(OUTPUT_DIR / 'summary_visualization.png', dpi=150, bbox_inches='tight')
plt.close()

print(f"✓ Visualización guardada en: {OUTPUT_DIR / 'summary_visualization.png'}")

# Generar reporte de texto detallado
report_lines = []
report_lines.append("="*80 + "\n")
report_lines.append("BASELINE CONFIGURATION SUMMARY\n")
report_lines.append("="*80 + "\n\n")

report_lines.append("MODEL CONFIGURATION\n")
report_lines.append("-"*80 + "\n")
report_lines.append(f"  Confidence Threshold: {baseline_config['inference']['conf_threshold']}\n")
report_lines.append(f"  NMS IoU: {baseline_config['inference']['nms_iou']}\n")
report_lines.append(f"  FPS: {perf_summary.get('fps', 'N/A'):.2f}\n")
report_lines.append(f"  GPU Memory: {perf_summary.get('peak_gpu_memory_mb', 0):.0f} MB\n\n")

report_lines.append("THRESHOLD SWEEP RESULTS\n")
report_lines.append("-"*80 + "\n")
report_lines.append(f"{'Threshold':<12} {'mAP':<8} {'AP50':<8} {'AP75':<8} {'Dets/Img':<12} {'Baseline':<10}\n")
report_lines.append("-"*80 + "\n")

for _, row in summary_df.iterrows():
    marker = "   ⭐" if row['Config'] == '⭐' else ""
    report_lines.append(
        f"{row['Conf_Threshold']:<12.2f} "
        f"{row['mAP']:<8.4f} "
        f"{row['AP50']:<8.4f} "
        f"{row['AP75']:<8.4f} "
        f"{row['Detections/Img']:<12.2f}"
        f"{marker}\n"
    )

report_lines.append("\n" + "="*80 + "\n")
report_lines.append("KEY OBSERVATIONS\n")
report_lines.append("="*80 + "\n")

# Análisis automático
best_map_row = summary_df.loc[summary_df['mAP'].idxmax()]
best_ap50_row = summary_df.loc[summary_df['AP50'].idxmax()]

report_lines.append(f"  • Best mAP: {best_map_row['mAP']:.4f} at threshold {best_map_row['Conf_Threshold']:.2f}\n")
report_lines.append(f"  • Best AP50: {best_ap50_row['AP50']:.4f} at threshold {best_ap50_row['Conf_Threshold']:.2f}\n")

if not baseline_row.empty:
    baseline_map = baseline_row['mAP'].values[0]
    map_diff = ((best_map_row['mAP'] - baseline_map) / baseline_map * 100) if baseline_map > 0 else 0
    report_lines.append(f"  • Baseline mAP could improve by {map_diff:.1f}% with threshold {best_map_row['Conf_Threshold']:.2f}\n")

report_lines.append("\n" + "="*80 + "\n")

with open(OUTPUT_DIR / 'summary_report.txt', 'w') as f:
    f.writelines(report_lines)

print(f"✓ Reporte detallado guardado en: {OUTPUT_DIR / 'summary_report.txt'}")
print("\n" + "".join(report_lines))


⚠️ Cargando métricas de rendimiento desde archivo...
✓ Métricas de rendimiento cargadas: 9 valores

TABLA RESUMEN BASELINE
Config  Conf_Threshold  NMS_IoU      mAP     AP50     AP75      FPS    GPU_MB  Detections/Img
                  0.05     0.65 0.170481 0.278535 0.170542 3.637273 1189.5625         11.0810
                  0.10     0.65 0.170481 0.278535 0.170542 3.637273 1189.5625         11.0810
                  0.15     0.65 0.170481 0.278535 0.170542 3.637273 1189.5625         11.0810
                  0.20     0.65 0.170481 0.278535 0.170542 3.637273 1189.5625         11.0810
                  0.25     0.65 0.170481 0.278535 0.170542 3.637273 1189.5625         11.0810
     ⭐            0.30     0.65 0.170481 0.278535 0.170542 3.637273 1189.5625         11.0810
                  0.35     0.65 0.154954 0.246966 0.157493 3.637273 1189.5625          7.9270
                  0.40     0.65 0.135954 0.210295 0.141558 3.637273 1189.5625          5.6665
                  0.50     0.65

## 14. Análisis de Errores (3.4)

In [25]:
# Cargar dependencias necesarias
if 'predictions' not in locals():
    print("⚠️ Cargando predicciones desde archivo...")
    preds_path = OUTPUT_DIR / 'preds_raw.json'
    with open(preds_path, 'r') as f:
        predictions = json.load(f)
    print(f"✓ {len(predictions)} predicciones cargadas")

if 'coco_val' not in locals():
    print("⚠️ Cargando anotaciones COCO...")
    coco_val = COCO(baseline_config['dataset']['val_eval_json'])
    print("✓ Anotaciones COCO cargadas")

if 'image_ids' not in locals():
    print("⚠️ Obteniendo IDs de imágenes...")
    image_ids = sorted(coco_val.getImgIds())
    print(f"✓ {len(image_ids)} imágenes encontradas")

# Inicializar análisis de errores
error_analysis = {
    'false_positives_high_conf': [],
    'false_negatives': [],
    'confusion_pairs': defaultdict(int)
}

print("Analizando errores (primeras 100 imágenes)...")

for img_id in tqdm(image_ids[:100], desc="Análisis de errores"):
    # Obtener predicciones para esta imagen
    img_preds = [p for p in predictions if p['image_id'] == img_id]
    
    # Obtener anotaciones ground truth
    ann_ids = coco_val.getAnnIds(imgIds=img_id)
    anns = coco_val.loadAnns(ann_ids)
    
    matched_gts = set()
    
    # Analizar cada predicción
    for pred in img_preds:
        pred_box = pred['bbox']
        pred_cat = pred['category_id']
        pred_score = pred['score']
        
        # Convertir bbox a coordenadas x1, y1, x2, y2
        px1, py1, pw, ph = pred_box
        px2 = px1 + pw
        py2 = py1 + ph
        p_area = pw * ph
        
        # Buscar mejor match con ground truth
        best_iou = 0.0
        best_ann = None
        
        for ann in anns:
            gx, gy, gw, gh = ann['bbox']
            gx2 = gx + gw
            gy2 = gy + gh
            g_area = gw * gh
            
            # Calcular IoU
            ix1 = max(px1, gx)
            iy1 = max(py1, gy)
            ix2 = min(px2, gx2)
            iy2 = min(py2, gy2)
            
            iw = max(0, ix2 - ix1)
            ih = max(0, iy2 - iy1)
            inter = iw * ih
            
            union = p_area + g_area - inter
            iou = inter / union if union > 0 else 0.0
            
            if iou > best_iou:
                best_iou = iou
                best_ann = ann
        
        # Clasificar predicción
        if best_iou >= 0.5 and best_ann and best_ann['category_id'] == pred_cat:
            # True Positive
            matched_gts.add(best_ann['id'])
        elif pred_score >= 0.5:
            # False Positive con alta confianza
            error_analysis['false_positives_high_conf'].append({
                'image_id': int(img_id),
                'pred_category': CAT_ID_TO_PROMPT[pred_cat],
                'score': float(pred_score),
                'iou': float(best_iou),
                'gt_category': CAT_ID_TO_PROMPT.get(best_ann['category_id'], 'none') if best_ann else 'none'
            })
            
            # Registrar confusión si hay match con otra clase
            if best_ann and best_ann['category_id'] != pred_cat:
                pred_name = CAT_ID_TO_PROMPT[pred_cat]
                gt_name = CAT_ID_TO_PROMPT[best_ann['category_id']]
                error_analysis['confusion_pairs'][(pred_name, gt_name)] += 1
    
    # Identificar False Negatives
    for ann in anns:
        if ann['id'] not in matched_gts:
            error_analysis['false_negatives'].append({
                'image_id': int(img_id),
                'category': CAT_ID_TO_PROMPT[ann['category_id']],
                'bbox': ann['bbox'],
                'area': float(ann['bbox'][2] * ann['bbox'][3])
            })

# Resumen de errores
print(f"\n{'='*80}")
print("ANÁLISIS DE ERRORES")
print(f"{'='*80}")
print(f"  Falsos positivos (conf >= 0.5): {len(error_analysis['false_positives_high_conf'])}")
print(f"  Falsos negativos: {len(error_analysis['false_negatives'])}")
print(f"  Pares de confusión únicos: {len(error_analysis['confusion_pairs'])}")

# Top confusiones
print(f"\n{'Pares de confusión más comunes:'}")
print(f"{'-'*80}")
confusion_sorted = sorted(error_analysis['confusion_pairs'].items(), key=lambda x: x[1], reverse=True)
for i, ((pred, gt), count) in enumerate(confusion_sorted[:10], 1):
    print(f"  {i:2d}. {pred:15s} ← {gt:15s}: {count:4d} veces")

# Análisis de FN por clase
print(f"\n{'Falsos negativos por clase:'}")
print(f"{'-'*80}")
fn_by_class = defaultdict(int)
for fn in error_analysis['false_negatives']:
    fn_by_class[fn['category']] += 1

for cat_name in BDD_COCO_CATEGORIES.values():
    count = fn_by_class.get(cat_name, 0)
    print(f"  {cat_name:15s}: {count:4d} FN")

# Análisis de FP por clase
print(f"\n{'Falsos positivos por clase (conf >= 0.5):'}")
print(f"{'-'*80}")
fp_by_class = defaultdict(int)
for fp in error_analysis['false_positives_high_conf']:
    fp_by_class[fp['pred_category']] += 1

for cat_name in BDD_COCO_CATEGORIES.values():
    count = fp_by_class.get(cat_name, 0)
    print(f"  {cat_name:15s}: {count:4d} FP")

# Guardar análisis completo
error_report = {
    'summary': {
        'total_false_positives': len(error_analysis['false_positives_high_conf']),
        'total_false_negatives': len(error_analysis['false_negatives']),
        'total_confusion_pairs': len(error_analysis['confusion_pairs']),
        'images_analyzed': 100
    },
    'false_positives_by_class': dict(fp_by_class),
    'false_negatives_by_class': dict(fn_by_class),
    'confusion_matrix': {f"{p}->{g}": c for (p, g), c in confusion_sorted[:20]},
    'sample_false_positives': error_analysis['false_positives_high_conf'][:20],
    'sample_false_negatives': error_analysis['false_negatives'][:20]
}

with open(OUTPUT_DIR / 'error_analysis.json', 'w') as f:
    json.dump(error_report, f, indent=2)

print(f"\n✓ Análisis de errores guardado en: {OUTPUT_DIR / 'error_analysis.json'}")

# Generar visualización de matriz de confusión
if len(confusion_sorted) > 0:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
    
    # Subplot 1: Top confusiones
    top_confusions = confusion_sorted[:10]
    conf_labels = [f"{p}→{g}" for (p, g), _ in top_confusions]
    conf_values = [c for _, c in top_confusions]
    
    ax1.barh(range(len(conf_labels)), conf_values, color='#E74C3C')
    ax1.set_yticks(range(len(conf_labels)))
    ax1.set_yticklabels(conf_labels, fontsize=10)
    ax1.set_xlabel('Número de confusiones', fontsize=12, fontweight='bold')
    ax1.set_title('Top 10 Pares de Confusión', fontsize=14, fontweight='bold')
    ax1.grid(axis='x', alpha=0.3)
    ax1.invert_yaxis()
    
    # Subplot 2: FP vs FN por clase
    classes = list(BDD_COCO_CATEGORIES.values())
    fp_counts = [fp_by_class.get(c, 0) for c in classes]
    fn_counts = [fn_by_class.get(c, 0) for c in classes]
    
    x = np.arange(len(classes))
    width = 0.35
    
    ax2.bar(x - width/2, fp_counts, width, label='False Positives', color='#E74C3C')
    ax2.bar(x + width/2, fn_counts, width, label='False Negatives', color='#3498DB')
    
    ax2.set_xlabel('Clase', fontsize=12, fontweight='bold')
    ax2.set_ylabel('Cantidad', fontsize=12, fontweight='bold')
    ax2.set_title('Errores por Clase (100 imágenes)', fontsize=14, fontweight='bold')
    ax2.set_xticks(x)
    ax2.set_xticklabels(classes, rotation=45, ha='right')
    ax2.legend()
    ax2.grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    plt.savefig(OUTPUT_DIR / 'error_visualization.png', dpi=150, bbox_inches='tight')
    plt.close()
    
    print(f"✓ Visualización guardada en: {OUTPUT_DIR / 'error_visualization.png'}")

print(f"\n{'='*80}")



Analizando errores (primeras 100 imágenes)...


Análisis de errores: 100%|██████████| 100/100 [00:00<00:00, 307.97it/s]



ANÁLISIS DE ERRORES
  Falsos positivos (conf >= 0.5): 26
  Falsos negativos: 988
  Pares de confusión únicos: 11

Pares de confusión más comunes:
--------------------------------------------------------------------------------
   1. person          ← rider          :    3 veces
   2. truck           ← car            :    2 veces
   3. traffic light   ← traffic sign   :    2 veces
   4. rider           ← person         :    1 veces
   5. person          ← truck          :    1 veces
   6. motorcycle      ← rider          :    1 veces
   7. car             ← truck          :    1 veces
   8. train           ← car            :    1 veces
   9. motorcycle      ← bicycle        :    1 veces
  10. traffic sign    ← traffic light  :    1 veces

Falsos negativos por clase:
--------------------------------------------------------------------------------
  person         :   31 FN
  rider          :    2 FN
  car            :  598 FN
  truck          :   12 FN
  bus            :    6 FN
  train

## 15. Resumen Final y Criterios Go/No-Go (6)

In [26]:
# Cargar todas las dependencias necesarias
print("Preparando resumen final...")

# 1. Cargar baseline_config
if 'baseline_config' not in locals():
    print("⚠️ Cargando configuración baseline...")
    config_yaml = CONFIG_DIR / 'baseline.yaml'
    if config_yaml.exists():
        with open(config_yaml, 'r') as f:
            baseline_config = yaml.safe_load(f)
        print("✓ Configuración cargada")
    else:
        raise FileNotFoundError(f"No se encontró {config_yaml}\nEjecuta primero la Sección 2")

# 2. Cargar metrics_coco
if 'metrics_coco' not in locals():
    print("⚠️ Cargando métricas COCO...")
    metrics_json = OUTPUT_DIR / 'metrics.json'
    if metrics_json.exists():
        with open(metrics_json, 'r') as f:
            metrics_coco = json.load(f)
        print("✓ Métricas COCO cargadas")
    else:
        raise FileNotFoundError(f"No se encontró {metrics_json}\nEjecuta primero la Sección 8")

# 3. Cargar perf_summary
if 'perf_summary' not in locals():
    print("⚠️ Cargando métricas de rendimiento...")
    perf_txt = OUTPUT_DIR / 'perf.txt'
    if perf_txt.exists():
        perf_summary = {}
        with open(perf_txt, 'r') as f:
            for line in f:
                if ':' in line and '=' not in line:
                    try:
                        key, value = line.strip().split(':', 1)
                        perf_summary[key.strip()] = float(value.strip())
                    except:
                        pass
        print(f"✓ Métricas de rendimiento cargadas: {len(perf_summary)} valores")
    else:
        print("⚠️ perf.txt no encontrado, usando valores por defecto")
        perf_summary = {
            'avg_time_per_image_s': 0.0,
            'fps': 0.0,
            'total_images': 0
        }

# 4. Cargar error_analysis
if 'error_analysis' not in locals():
    print("⚠️ Cargando análisis de errores...")
    error_json = OUTPUT_DIR / 'error_analysis.json'
    if error_json.exists():
        with open(error_json, 'r') as f:
            error_data = json.load(f)
        # Reconstruir estructura original
        error_analysis = {
            'false_positives_high_conf': error_data.get('sample_false_positives', []),
            'false_negatives': error_data.get('sample_false_negatives', []),
            'confusion_pairs': {}
        }
        print("✓ Análisis de errores cargado")
    else:
        print("⚠️ error_analysis.json no encontrado, usando valores vacíos")
        error_analysis = {
            'false_positives_high_conf': [],
            'false_negatives': [],
            'confusion_pairs': {}
        }

# 5. Definir PR_DIR si no existe
if 'PR_DIR' not in locals():
    PR_DIR = OUTPUT_DIR / 'pr_curves'

# Construir reporte final
final_report = {
    'baseline_config': baseline_config,
    'metrics': metrics_coco,
    'performance': perf_summary,
    'artefacts': {
        'predictions': str(OUTPUT_DIR / 'preds_raw.json'),
        'metrics': str(OUTPUT_DIR / 'metrics.json'),
        'pr_curves': str(PR_DIR),
        'qualitative': str(QUALITATIVE_DIR),
        'performance': str(OUTPUT_DIR / 'perf.txt'),
        'calib_inputs': str(OUTPUT_DIR / 'calib_inputs.csv'),
        'threshold_sweep': str(OUTPUT_DIR / 'threshold_sweep.csv'),
        'summary_table': str(OUTPUT_DIR / 'summary_table.csv'),
        'error_analysis': str(OUTPUT_DIR / 'error_analysis.json')
    },
    'go_criteria': {
        'mAP_reasonable': metrics_coco.get('mAP', 0) > 0.05,
        'AP50_reasonable': metrics_coco.get('AP50', 0) > 0.10,
        'latency_measured': 'avg_time_per_image_s' in perf_summary,
        'artefacts_complete': True,
        'calib_inputs_generated': (OUTPUT_DIR / 'calib_inputs.csv').exists(),
        'errors_identified': len(error_analysis['false_positives_high_conf']) > 0
    },
    'timestamp': datetime.now().isoformat()
}

# Verificar existencia de artefactos
artefact_status = {}
for name, path in final_report['artefacts'].items():
    artefact_status[name] = Path(path).exists()

final_report['artefacts_status'] = artefact_status
final_report['go_criteria']['artefacts_complete'] = all(artefact_status.values())

# Imprimir resumen
print("\n" + "="*80)
print("RESUMEN FINAL BASELINE")
print("="*80)
print(f"\nModelo: {baseline_config['model']['name']} ({baseline_config['model']['architecture']})")
print(f"Checkpoint: {Path(baseline_config['model']['checkpoint']).name}")
print(f"Device: {baseline_config['model']['device']}")

print(f"\n{'Métricas de Detección:'}")
print(f"{'-'*80}")
print(f"  mAP@[.50:.95]: {metrics_coco.get('mAP', 0):.4f}")
print(f"  AP@50:         {metrics_coco.get('AP50', 0):.4f}")
print(f"  AP@75:         {metrics_coco.get('AP75', 0):.4f}")
print(f"  AP (small):    {metrics_coco.get('AP_small', 0):.4f}")
print(f"  AP (medium):   {metrics_coco.get('AP_medium', 0):.4f}")
print(f"  AP (large):    {metrics_coco.get('AP_large', 0):.4f}")

print(f"\n{'Rendimiento:'}")
print(f"{'-'*80}")
print(f"  Tiempo/imagen: {perf_summary.get('avg_time_per_image_s', 0):.3f}s")
print(f"  FPS:           {perf_summary.get('fps', 0):.2f}")
if 'peak_gpu_memory_mb' in perf_summary:
    print(f"  GPU memoria:   {perf_summary['peak_gpu_memory_mb']:.0f} MB")
print(f"  Total imgs:    {perf_summary.get('total_images', 0)}")

print(f"\n{'Top 3 clases (por AP50):'}")
print(f"{'-'*80}")
if 'per_class' in metrics_coco:
    per_class_sorted = sorted(
        metrics_coco['per_class'].items(),
        key=lambda x: x[1].get('AP50', 0),
        reverse=True
    )
    for i, (cat_name, metrics) in enumerate(per_class_sorted[:3], 1):
        print(f"  {i}. {cat_name:15s}: AP50={metrics.get('AP50', 0):.3f}  mAP={metrics.get('mAP', 0):.3f}")

print(f"\n{'Artefactos Generados:'}")
print(f"{'-'*80}")
for name, path in final_report['artefacts'].items():
    exists = artefact_status[name]
    status = "✓" if exists else "✗"
    file_size = ""
    if exists:
        path_obj = Path(path)
        if path_obj.is_file():
            size_mb = path_obj.stat().st_size / (1024 * 1024)
            file_size = f" ({size_mb:.1f} MB)"
        elif path_obj.is_dir():
            num_files = len(list(path_obj.glob('*')))
            file_size = f" ({num_files} archivos)"
    print(f"  {status} {name:20s}: {Path(path).name}{file_size}")

print(f"\n{'Criterios Go/No-Go para Fase 3:'}")
print(f"{'-'*80}")
all_pass = all(final_report['go_criteria'].values())
for criterion, passed in final_report['go_criteria'].items():
    status = "✓" if passed else "✗"
    criterion_label = criterion.replace('_', ' ').title()
    print(f"  {status} {criterion_label:30s}: {'PASS' if passed else 'FAIL'}")

print(f"\n{'='*80}")
if all_pass:
    print("✅ BASELINE COMPLETADO - LISTO PARA FASE 3 (Incertidumbre y Calibración)")
    print("\nPróximos pasos:")
    print("  1. Implementar métodos de incertidumbre epistémica")
    print("  2. Aplicar calibración (Platt Scaling, Temperature Scaling, etc.)")
    print("  3. Evaluar métricas de calibración (ECE, MCE)")
    print("  4. Generar diagramas de confiabilidad")
else:
    print("⚠️  BASELINE INCOMPLETO - Revisar criterios fallidos")
    failed_criteria = [k for k, v in final_report['go_criteria'].items() if not v]
    print(f"\nCriterios fallidos: {', '.join(failed_criteria)}")
print("="*80)

# Guardar reporte final
with open(OUTPUT_DIR / 'final_report.json', 'w') as f:
    json.dump(final_report, f, indent=2, default=str)

print(f"\n✓ Reporte final guardado en: {OUTPUT_DIR / 'final_report.json'}")

# Generar reporte de texto legible
report_lines = []
report_lines.append("="*80 + "\n")
report_lines.append("BASELINE EVALUATION REPORT\n")
report_lines.append("="*80 + "\n\n")
report_lines.append(f"Generated: {final_report['timestamp']}\n\n")

report_lines.append("MODEL CONFIGURATION\n")
report_lines.append("-"*80 + "\n")
report_lines.append(f"  Name:           {baseline_config['model']['name']}\n")
report_lines.append(f"  Architecture:   {baseline_config['model']['architecture']}\n")
report_lines.append(f"  Checkpoint:     {Path(baseline_config['model']['checkpoint']).name}\n")
report_lines.append(f"  Device:         {baseline_config['model']['device']}\n\n")

report_lines.append("DETECTION METRICS\n")
report_lines.append("-"*80 + "\n")
for metric_name in ['mAP', 'AP50', 'AP75', 'AP_small', 'AP_medium', 'AP_large']:
    value = metrics_coco.get(metric_name, 0)
    report_lines.append(f"  {metric_name:15s}: {value:.4f}\n")

report_lines.append("\nPERFORMANCE METRICS\n")
report_lines.append("-"*80 + "\n")
for metric_name, value in perf_summary.items():
    report_lines.append(f"  {metric_name:25s}: {value:.4f}\n")

report_lines.append("\nPER-CLASS METRICS (sorted by AP50)\n")
report_lines.append("-"*80 + "\n")
if 'per_class' in metrics_coco:
    per_class_sorted = sorted(
        metrics_coco['per_class'].items(),
        key=lambda x: x[1].get('AP50', 0),
        reverse=True
    )
    for cat_name, metrics in per_class_sorted:
        report_lines.append(
            f"  {cat_name:15s}: mAP={metrics.get('mAP', 0):.3f}  "
            f"AP50={metrics.get('AP50', 0):.3f}  "
            f"AP75={metrics.get('AP75', 0):.3f}\n"
        )

report_lines.append("\n" + "="*80 + "\n")
report_lines.append("GO/NO-GO DECISION: " + ("✅ GO" if all_pass else "❌ NO-GO") + "\n")
report_lines.append("="*80 + "\n")

with open(OUTPUT_DIR / 'final_report.txt', 'w') as f:
    f.writelines(report_lines)

print(f"✓ Reporte de texto guardado en: {OUTPUT_DIR / 'final_report.txt'}")

# Generar visualización de resumen
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# 1. Métricas principales
ax1 = axes[0, 0]
metrics_main = ['mAP', 'AP50', 'AP75']
values_main = [metrics_coco.get(m, 0) for m in metrics_main]
colors_main = ['#3498DB', '#2ECC71', '#E74C3C']
bars = ax1.bar(metrics_main, values_main, color=colors_main, alpha=0.7, edgecolor='black', linewidth=2)
ax1.set_ylabel('Average Precision', fontsize=12, fontweight='bold')
ax1.set_title('Main Detection Metrics', fontsize=14, fontweight='bold')
ax1.set_ylim([0, 1])
ax1.grid(axis='y', alpha=0.3)
for bar, val in zip(bars, values_main):
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height + 0.02,
             f'{val:.3f}', ha='center', va='bottom', fontweight='bold')

# 2. Métricas por tamaño
ax2 = axes[0, 1]
metrics_size = ['AP_small', 'AP_medium', 'AP_large']
values_size = [metrics_coco.get(m, 0) for m in metrics_size]
labels_size = ['Small', 'Medium', 'Large']
colors_size = ['#F39C12', '#9B59B6', '#1ABC9C']
bars = ax2.bar(labels_size, values_size, color=colors_size, alpha=0.7, edgecolor='black', linewidth=2)
ax2.set_ylabel('Average Precision', fontsize=12, fontweight='bold')
ax2.set_title('AP by Object Size', fontsize=14, fontweight='bold')
ax2.set_ylim([0, 1])
ax2.grid(axis='y', alpha=0.3)
for bar, val in zip(bars, values_size):
    height = bar.get_height()
    ax2.text(bar.get_x() + bar.get_width()/2., height + 0.02,
             f'{val:.3f}', ha='center', va='bottom', fontweight='bold')

# 3. Top/Bottom clases por AP50
ax3 = axes[1, 0]
if 'per_class' in metrics_coco:
    per_class_sorted = sorted(
        metrics_coco['per_class'].items(),
        key=lambda x: x[1].get('AP50', 0)
    )
    # Top 5 y bottom 5
    classes_display = per_class_sorted[:3] + per_class_sorted[-3:]
    class_names = [c[0] for c in classes_display]
    class_ap50 = [c[1].get('AP50', 0) for c in classes_display]
    colors_classes = ['#E74C3C']*3 + ['#2ECC71']*3
    
    y_pos = np.arange(len(class_names))
    ax3.barh(y_pos, class_ap50, color=colors_classes, alpha=0.7, edgecolor='black', linewidth=1.5)
    ax3.set_yticks(y_pos)
    ax3.set_yticklabels(class_names)
    ax3.set_xlabel('AP50', fontsize=12, fontweight='bold')
    ax3.set_title('Bottom 3 vs Top 3 Classes (by AP50)', fontsize=14, fontweight='bold')
    ax3.set_xlim([0, 1])
    ax3.grid(axis='x', alpha=0.3)
    ax3.invert_yaxis()

# 4. Estado de artefactos
ax4 = axes[1, 1]
artefact_names = list(artefact_status.keys())
artefact_exists = [1 if v else 0 for v in artefact_status.values()]
colors_artefacts = ['#2ECC71' if e else '#E74C3C' for e in artefact_exists]

y_pos = np.arange(len(artefact_names))
ax4.barh(y_pos, artefact_exists, color=colors_artefacts, alpha=0.7, edgecolor='black', linewidth=1.5)
ax4.set_yticks(y_pos)
ax4.set_yticklabels([n.replace('_', ' ').title() for n in artefact_names], fontsize=9)
ax4.set_xlabel('Status', fontsize=12, fontweight='bold')
ax4.set_title('Artefact Generation Status', fontsize=14, fontweight='bold')
ax4.set_xlim([0, 1.2])
ax4.set_xticks([0, 1])
ax4.set_xticklabels(['Missing', 'Generated'])
ax4.grid(axis='x', alpha=0.3)
ax4.invert_yaxis()

plt.tight_layout()
plt.savefig(OUTPUT_DIR / 'final_summary_visualization.png', dpi=150, bbox_inches='tight')
plt.close()

print(f"✓ Visualización de resumen guardada en: {OUTPUT_DIR / 'final_summary_visualization.png'}")

print(f"\n{'='*80}")
print("ARCHIVOS GENERADOS EN ESTA SECCIÓN:")
print(f"{'='*80}")
print(f"  1. {OUTPUT_DIR / 'final_report.json'}")
print(f"  2. {OUTPUT_DIR / 'final_report.txt'}")
print(f"  3. {OUTPUT_DIR / 'final_summary_visualization.png'}")
print(f"{'='*80}\n")


Preparando resumen final...

RESUMEN FINAL BASELINE

Modelo: Grounding-DINO (SwinT-OGC)
Checkpoint: groundingdino_swint_ogc.pth
Device: cuda

Métricas de Detección:
--------------------------------------------------------------------------------
  mAP@[.50:.95]: 0.1705
  AP@50:         0.2785
  AP@75:         0.1705
  AP (small):    0.0633
  AP (medium):   0.1821
  AP (large):    0.3770

Rendimiento:
--------------------------------------------------------------------------------
  Tiempo/imagen: 0.275s
  FPS:           3.64
  GPU memoria:   1190 MB
  Total imgs:    2000.0

Top 3 clases (por AP50):
--------------------------------------------------------------------------------
  1. person         : AP50=0.000  mAP=0.000
  2. rider          : AP50=0.000  mAP=0.000
  3. car            : AP50=0.000  mAP=0.000

Artefactos Generados:
--------------------------------------------------------------------------------
  ✓ predictions         : preds_raw.json (3.2 MB)
  ✓ metrics             : m