<a id="top"></a>

# 02 · Data Smoke & Bench

Quick tests to validate **data/splits**, check the **temporal encoding pipeline** (offline H5 or runtime) and run a **micro-benchmark** of *throughput*. The configuration is read from `configs/presets.yaml` so that it stays consistent with notebook 03.

---

## 🎯 Goals
- Quickly detect problems in the data/H5 files.
- Get a reference for it/s and tensor shapes before launching long training runs.

<a id="toc"></a>

## 🧭 Contents

- [1) Environment setup and imports](#sec-01)
- [2) Config from `presets.yaml` and configuration echo](#sec-02)
- [3) Data check and `task_list` loading](#sec-03)
- [4) DataLoader factory (offline H5 or CSV + runtime encoding)](#sec-04)
- [5) Universal smoke test (loader → model)](#sec-05)
- [6) Manual forward pass with AMP fallback](#sec-06)
- [7) Bench configuration echo](#sec-07)
- [8) Toggle: per-epoch *it/s*](#sec-08)
- [9) (Optional) Pipeline micro-benchmark *it/s*](#sec-09)
- [10) (Optional) Method smoke runs with `run_continual`](#sec-10)


<a id="sec-01"></a>

## 1) Environment setup and imports

**Goal:** prepare the environment for micro-benchmarks and *smoke tests* of the loaders and the model.

- Limits OpenMP/MKL/OpenBLAS to one thread through environment variables set before `torch` is imported (avoids CPU over-subscription).
- Detects `ROOT` and adds the repo to `sys.path`.
- Imports the *bench* utilities: `make_loader_fn_factory`, `universal_smoke_forward`, etc.
- Selects the device (`cuda` when available) and enables optimizations (TF32, cuDNN benchmark).

> Apart from the optional short runs in section 10, this notebook **does not train**; its goal is to validate *data → loader → forward* and measure baseline throughput.
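The `ROOT` detection in the cell below assumes the notebook runs either from `notebooks/` or from the repo root. A slightly more robust alternative, sketched here as a hypothetical helper (`find_repo_root` is not part of `src`), walks up the parent directories until it finds `configs/presets.yaml`:

```python
from pathlib import Path


def find_repo_root(start: Path, marker: str = "configs/presets.yaml") -> Path:
    """Return the first directory, from `start` upwards, that contains `marker`.

    Hypothetical helper: the notebook itself uses a simpler cwd-name heuristic.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No '{marker}' found in {start} or any parent directory")
```

Usage would be `ROOT = find_repo_root(Path.cwd())`, which works from any subdirectory of the repo, not only from `notebooks/`.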

[↑ Back to contents](#toc)


```python
# =============================================================================
# Imports and setup
# =============================================================================
import os

# Limit BLAS/OpenMP threads BEFORE importing torch so the setting takes effect
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import sys
import json
from pathlib import Path

import torch

# Repo root: parent directory when running from notebooks/, else the current one
ROOT = Path.cwd().parent if Path.cwd().name == "notebooks" else Path.cwd()
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))

from src.utils import load_preset, set_seeds
from src.datasets import ImageTransform, AugmentConfig
from src.models import build_model, default_tfm_for_model
from src.training import TrainConfig
from src.eval import eval_loader
from src.bench import (
    make_loader_fn_factory,
    universal_smoke_forward,
    enable_epoch_ips, disable_epoch_ips,
    print_bench_config,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.set_num_threads(4)                       # intra-op CPU threads used by torch here
torch.backends.cudnn.benchmark = True          # autotune conv kernels for fixed shapes
torch.backends.cuda.matmul.allow_tf32 = True   # allow TF32 matmuls on Ampere+ GPUs
torch.backends.cudnn.allow_tf32 = True
torch.set_float32_matmul_precision("high")
print("Device:", device)
```

```text
Device: cuda
```


<a id="sec-02"></a>

## 2) Config from `presets.yaml` and configuration echo

**Goal:** use the project's own preset so the bench stays aligned with training.

It includes:
- Reading `PRESET` and extracting:
  - Model + `ImageTransform`.
  - Temporal encoding parameters (`ENCODER`, `T`, `GAIN`, `SEED`).
  - DataLoader parameters (workers, prefetch, pin/persistent).
  - *Augment* (`AUG_CFG`) and online balancing (if enabled).
- Defines `make_model_fn(tfm)` to instantiate the model with the same neuron hyperparameters (`beta=0.9`, `threshold=0.5`); note that these are hard-coded in the cell rather than read from the preset.

> This way, every *smoke/bench* run here uses exactly the same configuration as 03_TRAIN_CONTINUAL.
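The cell below reads nineteen keys from `CFG`, so a key missing from `presets.yaml` surfaces as a bare `KeyError` halfway through the cell. A minimal sketch of an upfront check, using a hypothetical `missing_preset_keys` helper whose key list is copied from that cell:

```python
import collections.abc
from typing import Any, Iterable, List, Mapping

# Dotted paths read by the config cell of section 2 (copied from its code).
REQUIRED_KEYS = [
    "model.name", "model.img_w", "model.img_h", "model.to_gray",
    "data.encoder", "data.T", "data.gain", "data.seed",
    "data.use_offline_spikes", "data.use_offline_balanced", "data.encode_runtime",
    "data.num_workers", "data.prefetch_factor", "data.pin_memory",
    "data.persistent_workers", "data.aug_train",
    "data.balance_online", "data.balance_bins", "data.balance_smooth_eps",
]


def missing_preset_keys(cfg: Mapping[str, Any],
                        paths: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the dotted paths that cannot be resolved in a nested dict (hypothetical helper)."""
    missing = []
    for path in paths:
        node: Any = cfg
        for part in path.split("."):
            if not isinstance(node, collections.abc.Mapping) or part not in node:
                missing.append(path)
                break
            node = node[part]
    return missing
```

Calling `missing_preset_keys(CFG)` right after `load_preset(...)` and failing on a non-empty result reports every absent key at once. Keys whose value is `null` (for example an empty `aug_train`) count as present.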

[↑ Back to contents](#toc)


```python
# =============================================================================
# Config: read presets.yaml
# =============================================================================
PRESET = "fast"  # fast | std | accurate
CFG = load_preset(ROOT / "configs" / "presets.yaml", PRESET)

# Model / transform
MODEL_NAME = CFG["model"]["name"]
tfm = ImageTransform(
    CFG["model"]["img_w"], CFG["model"]["img_h"],
    to_gray=bool(CFG["model"]["to_gray"]), crop_top=None
)

# Data / temporal encoding
ENCODER = CFG["data"]["encoder"]
T       = int(CFG["data"]["T"])
GAIN    = float(CFG["data"]["gain"])
SEED    = int(CFG["data"]["seed"])

USE_OFFLINE_SPIKES   = bool(CFG["data"]["use_offline_spikes"])
USE_OFFLINE_BALANCED = bool(CFG["data"]["use_offline_balanced"])
RUNTIME_ENCODE       = bool(CFG["data"]["encode_runtime"])

# DataLoader / augmentation / balancing
NUM_WORKERS  = int(CFG["data"]["num_workers"])
PREFETCH     = CFG["data"]["prefetch_factor"]
PIN_MEMORY   = bool(CFG["data"]["pin_memory"])
PERSISTENT   = bool(CFG["data"]["persistent_workers"])

AUG_CFG = AugmentConfig(**CFG["data"]["aug_train"]) if CFG["data"]["aug_train"] else None

USE_ONLINE_BALANCING = bool(CFG["data"]["balance_online"])
BAL_BINS = int(CFG["data"]["balance_bins"])
BAL_EPS  = float(CFG["data"]["balance_smooth_eps"])

print(f"[PRESET={PRESET}] model={MODEL_NAME} {tfm.w}x{tfm.h} gray={tfm.to_gray}")
print(f"[DATA] encoder={ENCODER} T={T} gain={GAIN} seed={SEED}")
print(f"[LOADER] workers={NUM_WORKERS} prefetch={PREFETCH} pin={PIN_MEMORY} persistent={PERSISTENT}")
print(f"[BALANCE] offline={USE_OFFLINE_BALANCED} online={USE_ONLINE_BALANCING} bins={BAL_BINS}")
print(f"[RUNTIME_ENCODE] {RUNTIME_ENCODE} | [OFFLINE_SPIKES] {USE_OFFLINE_SPIKES}")

def make_model_fn(tfm):
    # beta/threshold are only used by pilotnet_snn (hard-coded, not read from the preset)
    return build_model(MODEL_NAME, tfm, beta=0.9, threshold=0.5)
```

```text
[PRESET=fast] model=pilotnet_snn 200x66 gray=True
[DATA] encoder=rate T=10 gain=0.5 seed=42
[LOADER] workers=8 prefetch=2 pin=True persistent=True
[BALANCE] offline=False online=False bins=21
[RUNTIME_ENCODE] False | [OFFLINE_SPIKES] True
```


<a id="sec-03"></a>

## 3) Data check and `task_list` loading

Checks `train/val/test.csv` for each run directory under `data/processed/`, plus `train_balanced.csv` in case you enable the balanced offline mode. Builds `task_list` from `tasks.json`, or from `tasks_balanced.json` when `USE_OFFLINE_BALANCED=True`.
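The cell below indexes the JSON as `tasks_json["tasks_order"]` and `tasks_json["splits"][name]["train"]`. A minimal sketch of the same transformation with explicit schema checks, as a hypothetical `build_task_list` helper; the `val`/`test` keys in each split entry are an assumption inferred from the CSV files checked in that cell:

```python
from pathlib import Path
from typing import Any, Dict, List


def build_task_list(tasks_json: Dict[str, Any],
                    require_balanced: bool = False) -> List[Dict[str, Any]]:
    """Turn {"tasks_order": [...], "splits": {name: {...}}} into an ordered task list.

    Hypothetical helper; the schema is inferred from how section 3 reads tasks.json.
    """
    task_list = []
    for name in tasks_json["tasks_order"]:
        paths = tasks_json["splits"].get(name)
        if paths is None:
            raise KeyError(f"Task '{name}' is in tasks_order but has no entry in splits")
        absent = {"train", "val", "test"} - set(paths)
        if absent:
            raise KeyError(f"[{name}] missing split(s): {sorted(absent)}")
        train_name = Path(paths["train"]).name
        if require_balanced and train_name != "train_balanced.csv":
            raise RuntimeError(f"[{name}] expected train_balanced.csv, got {train_name}")
        task_list.append({"name": name, "paths": paths})
    return task_list
```

Keeping `tasks_order` separate from `splits` is what fixes the task sequence for continual learning; a dict alone would not document the intended order as explicitly.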

[↑ Back to contents](#toc)


```python
# =============================================================================
# Data check (regular splits and, if present, offline-balanced train) + task list
# =============================================================================
PROC = ROOT / "data" / "processed"
RAW  = ROOT / "data" / "raw" / "udacity"

# 1) Check that the mandatory splits exist for every run directory
RUNS = sorted(d.name for d in PROC.iterdir() if d.is_dir())
missing = []
for run in RUNS:
    base = PROC / run
    for part in ["train", "val", "test"]:
        p = base / f"{part}.csv"
        if not p.exists():
            missing.append(str(p))
    p_bal = base / "train_balanced.csv"
    if p_bal.exists():
        print(f"✓ {p_bal} OK")
    else:
        print(f"  Falta {p_bal}. Si vas a usar USE_OFFLINE_BALANCED=True,"
              f" ejecuta 01A_PREP_BALANCED.ipynb o tools/make_splits_balanced.py")

if missing:
    raise FileNotFoundError(
        "Faltan CSV obligatorios:\n" + "\n".join(" - " + m for m in missing)
    )
print("OK: splits 'train/val/test' encontrados.")

# 2) Load tasks.json | tasks_balanced.json
TASKS_FILE = PROC / ("tasks_balanced.json" if USE_OFFLINE_BALANCED else "tasks.json")
with open(TASKS_FILE, "r", encoding="utf-8") as f:
    tasks_json = json.load(f)

task_list = [{"name": n, "paths": tasks_json["splits"][n]} for n in tasks_json["tasks_order"]]
print("Tareas y su TRAIN CSV:")
for t in task_list:
    print(f" - {t['name']}: {Path(t['paths']['train']).name}")

# Guard rail when the OFFLINE balanced mode is enabled:
if USE_OFFLINE_BALANCED:
    for t in task_list:
        train_path = Path(t["paths"]["train"])
        if train_path.name != "train_balanced.csv":
            raise RuntimeError(f"[{t['name']}] Esperaba 'train_balanced.csv' pero encontré '{train_path.name}'.")
        if not train_path.exists():
            raise FileNotFoundError(f"[{t['name']}] No existe {train_path}.")
    print("Verificación OFFLINE balanceado superada.")
```

```text
✓ /home/cesar/proyectos/TFM_SNN/data/processed/circuito1/train_balanced.csv OK
✓ /home/cesar/proyectos/TFM_SNN/data/processed/circuito2/train_balanced.csv OK
OK: splits 'train/val/test' encontrados.
Tareas y su TRAIN CSV:
 - circuito1: train.csv
 - circuito2: train.csv
```


<a id="sec-04"></a>

## 4) DataLoader factory (offline H5 or CSV + runtime encoding)

Creates `make_loader_fn` with the preset flags (workers, prefetch, pin, persistent, augment, online balancing). Do not pass `SEED` here: the seed is given on each `make_loader_fn(...)` call.
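Not every combination of these flags is valid. In PyTorch's `torch.utils.data.DataLoader`, `persistent_workers=True` is rejected with a `ValueError` when `num_workers == 0`, and since PyTorch 2.0 so is a non-`None` `prefetch_factor`. If a preset sets `num_workers: 0` for debugging, the other flags must be relaxed. Whether `make_loader_fn_factory` already does this is not visible here; a minimal sketch of such a guard, as a hypothetical `sanitize_loader_kwargs` helper:

```python
from typing import Any, Dict, Optional


def sanitize_loader_kwargs(num_workers: int, prefetch_factor: Optional[int],
                           pin_memory: bool, persistent_workers: bool,
                           cuda_available: bool) -> Dict[str, Any]:
    """Return mutually consistent DataLoader kwargs (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "num_workers": num_workers,
        # pinned host memory only speeds up host-to-GPU copies
        "pin_memory": pin_memory and cuda_available,
    }
    if num_workers > 0:
        kwargs["persistent_workers"] = persistent_workers
        if prefetch_factor is not None:
            kwargs["prefetch_factor"] = int(prefetch_factor)
    else:
        # single-process loading: DataLoader rejects persistent workers and prefetch_factor
        kwargs["persistent_workers"] = False
    return kwargs
```

With this preset (`workers=8 prefetch=2 pin=True persistent=True` on CUDA) the flags pass through unchanged; only a zero-worker or CPU-only setup gets adjusted.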

[↑ Back to contents](#toc)


```python
# =============================================================================
# Loader factory (offline H5 or CSV + runtime encoding)
# =============================================================================
make_loader_fn = make_loader_fn_factory(
    ROOT,
    RUNTIME_ENCODE=RUNTIME_ENCODE,
    num_workers=NUM_WORKERS,
    prefetch_factor=PREFETCH,
    pin_memory=PIN_MEMORY,
    persistent_workers=PERSISTENT,
    aug_train=AUG_CFG,
    # online balancing
    balance_train=USE_ONLINE_BALANCING,
    balance_bins=BAL_BINS,
    balance_smooth_eps=BAL_EPS,
)
```


<a id="sec-05"></a>

## 5) Universal smoke test (loader → model)

Checks that the first task yields valid batches and that the model runs a *forward* pass without errors (with AMP when CUDA is available). Uses `ENCODER`, `T` and `GAIN` from the preset.
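With this preset the encoder is `rate` with `T=10` and `GAIN=0.5`. How the project's rate encoder uses these values is not shown in this notebook; a common scheme (an assumption, not taken from `src`) treats each normalized intensity x in [0, 1] as a per-step spike probability p = min(1, gain·x) and draws `T` independent Bernoulli samples. A pure-Python sketch on a flat list of pixels:

```python
import random
from typing import List, Sequence


def rate_encode(pixels: Sequence[float], T: int, gain: float, seed: int = 0) -> List[List[int]]:
    """Bernoulli rate coding: returns T time steps, each a list of 0/1 spikes.

    Assumed scheme (illustrative only): spike probability per step = clamp(gain * x, 0, 1).
    """
    rng = random.Random(seed)
    probs = [min(1.0, max(0.0, gain * x)) for x in pixels]
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(T)]
```

Under this scheme, with `gain=0.5` even a saturated pixel (x = 1) spikes on about half of the `T` steps, so `GAIN` directly scales the average spike density the network sees, while `T` trades resolution of that rate against compute per sample.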

[↑ Back to contents](#toc)


```python
# =============================================================================
# Universal loader + model smoke test (uses the preset variables)
# =============================================================================
_ = universal_smoke_forward(
    make_loader_fn,
    task=task_list[0],
    encoder=ENCODER, T=T, gain=GAIN,
    tfm=tfm, seed=SEED, device=device,
    use_runtime_encode=RUNTIME_ENCODE,
)
```

```text
dataset ya codificado; solo permuto a (T,B,C,H,W)
x5d.device: cpu | shape: (10, 8, 1, 66, 200)
[forward] ejecutado con AMP
```


<a id="sec-06"></a>

## 6) Manual forward pass with AMP fallback

Runs the full path by hand: loader → (T,B,C,H,W) → model, using AMP when available (falling back to FP32 if it fails) and disabling the runtime encoder afterwards if it was used.
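The offline H5 path yields batch-major tensors `(B,T,C,H,W)`, while the model consumes time-major `(T,B,C,H,W)`, hence `xb.permute(1, 0, 2, 3, 4)` in the cell below. The same reordering of the first two axes, written in pure Python on nested lists with a hypothetical helper, makes the index mapping explicit: `out[t][b] == x[b][t]`.

```python
from typing import List, Sequence, TypeVar

Item = TypeVar("Item")


def batch_major_to_time_major(x: Sequence[Sequence[Item]]) -> List[List[Item]]:
    """Swap the first two axes, x[b][t] -> out[t][b] (hypothetical helper, like permute(1, 0, ...))."""
    if not x:
        return []
    n_steps = len(x[0])
    if any(len(sample) != n_steps for sample in x):
        raise ValueError("all samples must have the same number of time steps")
    return [[x[b][t] for b in range(len(x))] for t in range(n_steps)]
```

Time-major layout lets a spiking model loop over `x5d[t]`, each a full `(B,C,H,W)` batch for one time step, which is why the permutation happens before the forward pass rather than inside the loader.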

[↑ Back to contents](#toc)


```python
# =============================================================================
# Manual forward: loader -> (T,B,C,H,W) with AMP fallback
# =============================================================================
import torch
import src.training as training

# 1) Small loader built with the factory
tr, va, te = make_loader_fn(
    task=task_list[0],
    batch_size=8,
    encoder=ENCODER,   # <-- preset
    T=T,               # <-- preset
    gain=GAIN,         # <-- preset
    tfm=tfm,
    seed=SEED,
)

xb, yb = next(iter(tr))
print("batch del loader:", xb.shape, yb.shape)

# 2) Convert to (T,B,C,H,W) depending on the input format
if xb.ndim == 5:  # (B,T,C,H,W): already spike-encoded (offline H5)
    x5d = xb.permute(1, 0, 2, 3, 4).contiguous()
    used_runtime_encode = False
    print("dataset ya codificado; solo permuto a (T,B,C,H,W)")
elif xb.ndim == 4:  # (B,C,H,W): raw frames, encode at runtime on `device`
    training.set_runtime_encode(mode=ENCODER, T=T, gain=GAIN, device=device)
    x5d = training._permute_if_needed(xb)  # encode + permute -> (T,B,C,H,W)
    used_runtime_encode = True
    print("dataset 4D; uso encode en GPU y permuto a (T,B,C,H,W)")
else:
    raise RuntimeError(f"Forma inesperada del batch: {xb.shape}")

print("x5d.device:", x5d.device, "| shape:", tuple(x5d.shape))

# 3) Model and forward pass with AMP fallback
model = make_model_fn(tfm).to(device).eval()

def forward_with_auto_amp(model, x5d, device):
    """Try fp16 autocast on CUDA; if anything fails, retry the forward pass in FP32."""
    if device.type == "cuda":
        try:
            x_amp = x5d.to(device, dtype=torch.float16, non_blocking=True)
            with torch.inference_mode(), torch.amp.autocast("cuda", enabled=True):
                y = model(x_amp)
            print("[forward] ejecutado con AMP (fp16)")
            return y
        except Exception as e:
            print("[forward] AMP falló, reintento en FP32. Motivo:", str(e))

    x_fp32 = x5d.to(device, dtype=torch.float32, non_blocking=True)
    with torch.inference_mode():
        y = model(x_fp32)
    print("[forward] ejecutado en FP32")
    return y

# 4) Disable the runtime encoder afterwards (if used), even if the forward fails
try:
    yhat = forward_with_auto_amp(model, x5d, device)
    print("yhat:", tuple(yhat.shape))
finally:
    if used_runtime_encode:
        training.set_runtime_encode(None)
```

```text
batch del loader: torch.Size([8, 10, 1, 66, 200]) torch.Size([8, 1])
dataset ya codificado; solo permuto a (T,B,C,H,W)
x5d.device: cpu | shape: (10, 8, 1, 66, 200)
[forward] ejecutado con AMP (fp16)
yhat: (8, 1)
```


<a id="sec-07"></a>

## 7) Bench configuration echo

Prints the effective DataLoader and balancing parameters. Useful for tracking changes between presets.
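Printing the configuration makes changes visible by eye. To compare runs mechanically, one option (a hypothetical `config_fingerprint` helper, not part of `src.bench`) is to hash a canonical JSON dump of the effective settings:

```python
import hashlib
import json
from typing import Any, Mapping


def config_fingerprint(cfg: Mapping[str, Any], length: int = 12) -> str:
    """Short, key-order-independent hash of a config (hypothetical helper)."""
    # sort_keys also sorts nested dicts; default=str stringifies values such as Path
    canonical = json.dumps(cfg, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]
```

For example, `config_fingerprint({"NUM_WORKERS": NUM_WORKERS, "PREFETCH": PREFETCH, "PIN_MEMORY": PIN_MEMORY, "PERSISTENT": PERSISTENT})` can be logged next to each it/s figure; two measurements with the same fingerprint used the same loader settings.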

[↑ Back to contents](#toc)


```python
# =============================================================================
# Bench: echo of the effective configuration
# =============================================================================
print_bench_config(
    NUM_WORKERS=NUM_WORKERS, PREFETCH=PREFETCH,
    PIN_MEMORY=PIN_MEMORY, PERSISTENT=PERSISTENT,
    USE_OFFLINE_BALANCED=USE_OFFLINE_BALANCED, USE_ONLINE_BALANCING=USE_ONLINE_BALANCING
)
```

```text
[Bench workers=8 prefetch=2 pin=True persistent=True | offline_bal=False online_bal=False
```


<a id="sec-08"></a>

## 8) Toggle: per-epoch *it/s*

Enables a temporary patch that prints *iterations per second* for each training epoch. The patch stays active for the rest of the notebook (the runs in section 10 use it); call `disable_epoch_ips()` to restore the original behavior.
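How `enable_epoch_ips()` works internally is not shown here. A typical way to build such a reversible toggle is to wrap a method with a timer and keep the original around so it can be restored; the sketch below does that on a hypothetical `Trainer.train_epoch` stand-in, not on the project's training loop:

```python
import functools
import time
from typing import Callable, Optional


class Trainer:
    """Hypothetical stand-in for a training loop; train_epoch returns the iterations it ran."""

    def train_epoch(self, n_iters: int) -> int:
        for _ in range(n_iters):
            pass  # a real optimizer step would go here
        return n_iters


_original_train_epoch: Optional[Callable[..., int]] = None


def enable_epoch_ips_sketch() -> None:
    """Wrap Trainer.train_epoch with a timer; calling it twice has no extra effect."""
    global _original_train_epoch
    if _original_train_epoch is not None:
        return
    original = Trainer.train_epoch
    _original_train_epoch = original

    @functools.wraps(original)
    def timed(self, *args, **kwargs):
        t0 = time.perf_counter()
        n_iters = original(self, *args, **kwargs)
        dt = time.perf_counter() - t0
        print(f"{n_iters / max(dt, 1e-9):.1f} it/s ({n_iters} iters in {dt:.2f}s)")
        return n_iters

    Trainer.train_epoch = timed


def disable_epoch_ips_sketch() -> None:
    """Put the original method back (no-op if the patch is not active)."""
    global _original_train_epoch
    if _original_train_epoch is not None:
        Trainer.train_epoch = _original_train_epoch
        _original_train_epoch = None
```

Storing the original in a module-level variable is what makes the toggle idempotent and fully reversible, which matters in a notebook where cells are often re-run.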

[↑ Back to contents](#toc)


```python
# =============================================================================
# Toggle: print it/s per epoch during training
# =============================================================================
enable_epoch_ips()
print("it/s por época ACTIVADO. Llama a disable_epoch_ips() para restaurar.")
```

```text
it/s por época ACTIVADO. Llama a disable_epoch_ips() para restaurar.
```


<a id="sec-09"></a>

## 9) (Optional) Pipeline micro-benchmark *it/s*

Measures the approximate *throughput* of the full `loader → model` path over 100 iterations with batch size 8, using AMP where applicable.
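The internals of `pipeline_its` are not shown here. A minimal timing loop of the same kind, as a hypothetical `measure_its` helper, runs a few untimed warm-up steps (cuDNN autotuning from section 1 and worker start-up land there) and then times a fixed number of steps with `time.perf_counter()`. CUDA kernels are launched asynchronously, so a real GPU benchmark should call `torch.cuda.synchronize()` before reading the clock; the optional `sync` callback stands in for that.

```python
import time
from itertools import islice
from typing import Any, Callable, Iterable, Optional


def measure_its(step: Callable[[Any], None], batches: Iterable[Any],
                iters: int = 100, warmup: int = 5,
                sync: Optional[Callable[[], None]] = None) -> float:
    """Iterations/second over up to `iters` timed steps after `warmup` untimed ones.

    Hypothetical helper, not the implementation of src.bench.pipeline_its.
    """
    it = iter(batches)
    for batch in islice(it, warmup):  # untimed warm-up
        step(batch)
    if sync is not None:
        sync()
    t0 = time.perf_counter()
    done = 0
    for batch in islice(it, iters):   # continues from where warm-up stopped
        step(batch)
        done += 1
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - t0
    return done / elapsed if elapsed > 0 else float("inf")
```

In this notebook, `step` would be a function that moves a batch to `device` and runs the forward pass under `torch.inference_mode()`, with `sync=torch.cuda.synchronize` on GPU. Because loading is inside the timed loop, the figure reflects the slower of data loading and compute.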

[↑ Back to contents](#toc)


```python
# =============================================================================
# (Optional) Pipeline micro-bench: it/s (loader + model)
# =============================================================================
import src.training as training
from src.bench import pipeline_its

# Small loader
tr, va, te = make_loader_fn(
    task=task_list[0],
    batch_size=8,
    encoder=ENCODER, T=T, gain=GAIN,
    tfm=tfm, seed=SEED,
)

# Peek at one batch: 4D (B,C,H,W) means frames must be encoded at runtime
xb0, _ = next(iter(tr))
use_runtime_encode = xb0.ndim == 4
if use_runtime_encode:
    training.set_runtime_encode(mode=ENCODER, T=T, gain=GAIN, device=device)

try:
    its = pipeline_its(
        model=make_model_fn(tfm).to(device).eval(),
        loader=tr, device=device,
        iters=100, use_amp=True,
        encoder=ENCODER, T=T, gain=GAIN,
    )
finally:
    # Restore the default so later cells are not affected
    if use_runtime_encode:
        training.set_runtime_encode(None)
print(f"pipeline it/s ≈ {its:.2f}")
```

```text
pipeline it/s ≈ 29.19
```


<a id="sec-10"></a>

## 10) (Optional) Method smoke runs with `run_continual`

Launches short runs for each method (`naive`, `ewc`, `rehearsal`, `rehearsal+ewc`) using the **same** preset `CFG` to stay consistent with notebook 03.
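The smoke cell derives one config per method by deep-copying the preset and overwriting `continual.method` and `continual.params`. The deep copy matters: a shallow `dict(base_cfg)` would share the nested `continual` dict between variants, so every one of them would end up with the last method written. The same pattern as a hypothetical `make_method_cfgs` helper:

```python
from copy import deepcopy
from typing import Any, Dict


def make_method_cfgs(base_cfg: Dict[str, Any],
                     methods: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """One independent config per continual-learning method (hypothetical helper)."""
    out = {}
    for name, params in methods.items():
        cfg = deepcopy(base_cfg)          # nested dicts are copied too
        cfg.setdefault("continual", {})
        cfg["continual"]["method"] = name
        cfg["continual"]["params"] = dict(params)
        out[name] = cfg
    return out
```

The base config is never mutated, so the loop can be re-run in the notebook without carrying over the previous method's settings.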

[↑ Back to contents](#toc)


```python
# =============================================================================
# (Optional) Method smoke runs with run_continual (preset cfg)
# =============================================================================
from copy import deepcopy
from src.runner import run_continual

DO_RUNS = True  # set to False to skip

if DO_RUNS:
    base_cfg = deepcopy(CFG)          # same preset loaded in section 2, left untouched
    base_cfg["data"]["seed"] = SEED   # pin the seed explicitly (optional)

    METHODS = {
        "naive": {},
        "ewc": {"lam": 1e9, "fisher_batches": 200},
        "rehearsal": {"buffer_size": 5000, "replay_ratio": 0.2},
        "rehearsal+ewc": {"buffer_size": 5000, "replay_ratio": 0.2, "lam": 7e8, "fisher_batches": 200},
    }

    out_runs = []
    for mname, mparams in METHODS.items():
        cfg_i = deepcopy(base_cfg)
        cfg_i["continual"]["method"] = mname
        cfg_i["continual"]["params"] = mparams

        print(
            f"\n>>> SMOKE: preset={PRESET} | method={mname} | "
            f"seed={cfg_i['data']['seed']} | enc={cfg_i['data']['encoder']} | kwargs={mparams}"
        )
        out_dir, _ = run_continual(
            task_list=task_list,
            make_loader_fn=make_loader_fn,  # factory from section 4
            make_model_fn=make_model_fn,
            tfm=tfm,
            cfg=cfg_i,               # FULL preset config with the method overridden
            preset_name=PRESET,      # naming only
            out_root=ROOT / "outputs",
            verbose=True,
        )
        out_runs.append(out_dir)
    print("\nHecho:", [str(p) for p in out_runs])
```

```text
>>> SMOKE: preset=fast | method=naive | seed=42 | enc=rate | kwargs={}

--- Tarea 1/2: circuito1 | preset=fast | method=naive | B=64 T=10 AMP=True | enc=rate ---
[TRAIN it/s] epoch 1/2: 8.4 it/s (164 iters en 19.64s)
[TRAIN it/s] epoch 2/2: 10.5 it/s (164 iters en 15.56s)

--- Tarea 2/2: circuito2 | preset=fast | method=naive | B=64 T=10 AMP=True | enc=rate ---
[TRAIN it/s] epoch 1/2: 7.0 it/s (49 iters en 7.04s)
[TRAIN it/s] epoch 2/2: 8.5 it/s (49 iters en 5.77s)

>>> SMOKE: preset=fast | method=ewc | seed=42 | enc=rate | kwargs={'lam': 1000000000.0, 'fisher_batches': 200}

--- Tarea 1/2: circuito1 | preset=fast | method=ewc_lam_1e+09 | B=64 T=10 AMP=True | enc=rate ---
[TRAIN it/s] epoch 1/2: 10.1 it/s (164 iters en 16.18s)
[TRAIN it/s] epoch 2/2: 9.3 it/s (164 iters en 17.58s)

--- Tarea 2/2: circuito2 | preset=fast | method=ewc_lam_1e+09 | B=64 T=10 AMP=True | enc=rate ---
[TRAIN it/s] epoch 1/2: 8.4 it/s (49 iters en 5.84s)
[TRAIN it/s] epoch 2/2: 7.8 it/s (49 iters en 6.32s)

>
```