# EEG Motor Imagery (Left/Right) — CNN + Transformer
Este notebook entrena y evalúa un modelo CNN+Transformer para clasificación binaria (left/right)
en EEG usando K-Fold por sujeto, con opciones de:
- Preprocesamiento
- Balanceo (WeightedRandomSampler)
- EMA de pesos
- Inferencia con TTA y/o subventanas
- Interpretabilidad
- Barrido de arquitectura y evaluación final 5-fold del mejor set de hiperparámetros


### Configuración Inicial y Reproducibilidad

In [1]:
# -*- coding: utf-8 -*-
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "7"
os.environ['PYTHONHASHSEED'] = '42'
os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'

import re, json, random, copy, time, csv
from pathlib import Path
from glob import glob
import numpy as np
import mne
from tqdm import tqdm
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler
from sklearn.metrics import (
    accuracy_score, f1_score, confusion_matrix, classification_report,
    precision_score, recall_score
)
from sklearn.model_selection import StratifiedShuffleSplit, StratifiedKFold
from scipy import stats
from sklearn.manifold import TSNE  # Añadido para t-SNE

### Configuración de Reproducibilidad

Esta sección establece las semillas y configuraciones necesarias para garantizar resultados reproducibles en el entrenamiento del modelo.

In [2]:
# =========================
# REPRODUCIBILIDAD
# =========================
RANDOM_STATE = 42

def seed_everything(seed: int = 42):
    """
    Establece semillas para todas las bibliotecas de aleatoriedad
    para garantizar reproducibilidad completa.
    """
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    try:
        torch.backends.cuda.matmul.allow_tf32 = False
        torch.backends.cudnn.allow_tf32 = False
    except Exception:
        pass
    try:
        torch.use_deterministic_algorithms(True)
    except Exception:
        pass

def seed_worker(worker_id: int):
    """
    Función para inicializar workers de DataLoader con semilla.
    """
    worker_seed = RANDOM_STATE + worker_id
    np.random.seed(worker_seed)
    random.seed(worker_seed)

seed_everything(RANDOM_STATE)

### Configuración de Hiperparámetros

Definición de todos los hiperparámetros del modelo y preprocesamiento de datos EEG.

In [3]:
# =========================
# CONFIG
# =========================
PROJ = Path('..').resolve().parent
DATA_RAW = PROJ / 'data' / 'raw'
FOLDS_JSON = PROJ / 'models' / '00_folds' / 'Kfold5.json'

# Hiperparámetros de entrenamiento
EPOCHS = 60
BATCH_SIZE = 64
BASE_LR = 5e-4
WARMUP_EPOCHS = 4
PATIENCE = 8

# Split de validación por fold (por sujetos)
VAL_SUBJECT_FRAC = 0.18
VAL_STRAT_SUBJECT = True

# Prepro global
RESAMPLE_HZ = None
DO_NOTCH = True
DO_BANDPASS = False
BP_LO, BP_HI = 4.0, 38.0
DO_CAR = False
ZSCORE_PER_EPOCH = False

# Parámetros del modelo
D_MODEL = 144
N_HEADS = 4
N_LAYERS = 1
P_DROP = 0.1
P_DROP_ENCODER = 0.1

# Ventana temporal
TMIN, TMAX = -1.0, 5.0

# TTA / SUBWINDOW en TEST
SW_MODE = 'tta'   # 'none'|'subwin'|'tta'
SW_ENABLE = True
TTA_SHIFTS_S = [-0.075, -0.05, -0.025, 0.0, 0.025, 0.05, 0.075]
SW_LEN, SW_STRIDE = 4.5, 1.5
COMBINE_TTA_AND_SUBWIN = False

# Sampler balanceado
USE_WEIGHTED_SAMPLER = True

# EMA (solo GLOBAL). En FT se desactiva.
USE_EMA = True
EMA_DECAY = 0.9995

# Fine-tuning (por sujeto) — sigue activo, pero ahora con n_cls=2
FT_N_FOLDS = 4
FT_FREEZE_EPOCHS = 8
FT_UNFREEZE_EPOCHS = 8
FT_PATIENCE = 6
FT_BATCH = 64
FT_LR_HEAD = 1e-3
FT_LR_BACKBONE = 2e-4
FT_WD = 1e-3
FT_AUG = dict(p_jitter=0.25, p_noise=0.25, p_chdrop=0.10, max_jitter_frac=0.02, noise_std=0.02, max_chdrop=1)

# Configuración de sujetos y canales
EXCLUDE_SUBJECTS = {38, 88, 89, 92, 100, 104}
EXPECTED_8 = ['C3','C4','Cz','CP3','CP4','FC3','FC4','FCz']
CLASS_NAMES = ['left', 'right']  # <-- 2 clases

# Solo runs de imaginación L/R
IMAGERY_RUNS_LR = {4, 8, 12}

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f" Usando dispositivo: {DEVICE}")
print(" INICIANDO EXPERIMENTO CON CNN+Transformer (K-Fold por sujeto)")
print(f" Config: 2c (L/R), 8 canales, 6s | EPOCHS={EPOCHS}, BATCH={BATCH_SIZE}, LR={BASE_LR} | ZSCORE_PER_EPOCH={ZSCORE_PER_EPOCH}")

 Usando dispositivo: cuda
 INICIANDO EXPERIMENTO CON CNN+Transformer (K-Fold por sujeto)
 Config: 2c (L/R), 8 canales, 6s | EPOCHS=60, BATCH=64, LR=0.0005 | ZSCORE_PER_EPOCH=False


### Funciones de Utilidad para I/O

Funciones para manejar la carga y normalización de datos EEG.

In [4]:
# =========================
# UTILIDADES I/O
# =========================
def normalize_ch_name(name: str) -> str:
    """
    Normaliza nombres de canales eliminando caracteres especiales.
    """
    s = re.sub(r'[^A-Za-z0-9]', '', name)
    return s.upper()

NORMALIZED_TARGETS = [normalize_ch_name(c) for c in EXPECTED_8]

def pick_8_channels(raw: mne.io.BaseRaw) -> mne.io.BaseRaw:
    """
    Selecciona los 8 canales motores específicos del dataset.
    """
    chs = raw.info['ch_names']
    norm_map = {normalize_ch_name(ch): ch for ch in chs}
    picked = []
    for target_norm, target_orig in zip(NORMALIZED_TARGETS, EXPECTED_8):
        if target_norm in norm_map:
            picked.append(norm_map[target_norm])
        else:
            raise RuntimeError(f"Canal requerido '{target_orig}' no encontrado. Disponibles: {chs}")
    return raw.pick(picks=picked)

def list_subject_imagery_edfs(subject_id: str) -> list:
    """
    Lista archivos EDF de imaginación motora para un sujeto.
    """
    subj_dir = DATA_RAW / subject_id
    edfs = []
    for r in [4, 6, 8, 10, 12, 14]:
        edfs.extend(glob(str(subj_dir / f"{subject_id}R{r:02d}.edf")))
    return sorted(edfs)

def subject_id_to_int(s: str) -> int:
    """
    Convierte ID de sujeto de string a entero.
    """
    m = re.match(r'[Ss](\d+)', s)
    return int(m.group(1)) if m else -1

def load_subject_epochs(subject_id: str, resample_hz: int, do_notch: bool, do_bandpass: bool,
                        do_car: bool, bp_lo: float, bp_hi: float):
    """
    Carga y preprocesa épocas EEG para un sujeto específico.
    """
    edfs = list_subject_imagery_edfs(subject_id)
    if len(edfs) == 0:
        return np.empty((0,8,1), dtype=np.float32), np.empty((0,), dtype=int), None

    X_list, y_list, sfreq_list = [], [], []
    for edf_path in edfs:
        m = re.search(r"R(\d{2})", Path(edf_path).name)
        run = int(m.group(1)) if m else -1

        # Usar SOLO runs L/R
        if run not in IMAGERY_RUNS_LR:
            continue

        raw = mne.io.read_raw_edf(edf_path, preload=True, verbose='ERROR')
        raw = pick_8_channels(raw)

        if do_notch:
            raw.notch_filter(freqs=[60.0], picks='all', verbose='ERROR')
        if do_bandpass:
            raw.filter(l_freq=bp_lo, h_freq=bp_hi, picks='all', verbose='ERROR')
        if do_car:
            raw.set_eeg_reference('average', projection=False, verbose='ERROR')
        if resample_hz is not None and resample_hz > 0:
            raw.resample(resample_hz)

        sfreq = raw.info['sfreq']
        events, event_id = mne.events_from_annotations(raw, verbose='ERROR')

        keep = {k: v for k, v in event_id.items() if k in {'T1', 'T2'}}
        if len(keep) == 0:
            continue

        epochs = mne.Epochs(raw, events=events, event_id=keep, tmin=TMIN, tmax=TMAX,
                            baseline=None, preload=True, verbose='ERROR')
        X = epochs.get_data()

        if ZSCORE_PER_EPOCH:
            X = X.astype(np.float32)
            eps = 1e-6
            mu = X.mean(axis=2, keepdims=True)
            sd = X.std(axis=2, keepdims=True) + eps
            X = (X - mu) / sd

        ev_codes = epochs.events[:, 2]
        inv = {v: k for k, v in keep.items()}
        y_run = np.array([0 if inv[c] == 'T1' else 1 for c in ev_codes], dtype=int)

        X_list.append(X)
        y_list.append(y_run)
        sfreq_list.append(sfreq)

    if len(X_list) == 0:
        return np.empty((0,8,1), dtype=np.float32), np.empty((0,), dtype=int), None

    X_all = np.concatenate(X_list, axis=0).astype(np.float32)
    y_all = np.concatenate(y_list, axis=0).astype(int)

    if len(set([int(round(s)) for s in sfreq_list])) != 1:
        raise RuntimeError(f"Sampling rates inconsistentes: {sfreq_list}")

    return X_all, y_all, sfreq_list[0]

def load_fold_subjects(folds_json: Path, fold: int):
    """
    Carga sujetos de train y test para un fold específico.
    """
    with open(folds_json, 'r') as f:
        data = json.load(f)
    for item in data.get('folds', []):
        if int(item.get('fold', -1)) == int(fold):
            return list(item.get('train', [])), list(item.get('test', []))
    raise ValueError(f"Fold {fold} not found in {folds_json}")

def standardize_per_channel(train_X, other_X):
    """
    Estandariza datos por canal usando estadísticas del conjunto de entrenamiento.
    """
    C = train_X.shape[1]
    train_X = train_X.astype(np.float32)
    other_X = other_X.astype(np.float32)
    for c in range(C):
        mu = train_X[:, c, :].mean()
        sd = train_X[:, c, :].std()
        sd = sd if sd > 1e-6 else 1.0
        train_X[:, c, :] = (train_X[:, c, :] - mu) / sd
        other_X[:, c, :] = (other_X[:, c, :] - mu) / sd
    return train_X, other_X

### Arquitectura del Modelo: CNN + Transformer

Implementación del modelo híbrido CNN-Transformer para clasificación de EEG.

In [5]:
# =========================
# MODELO (GroupNorm en conv) con capturas y variaciones
# =========================
def make_gn(num_channels, num_groups=8):
    """
    Crea capa GroupNorm adaptativa que garantiza divisibilidad.
    """
    g = min(num_groups, num_channels)
    while num_channels % g != 0 and g > 1:
        g -= 1
    return nn.GroupNorm(g, num_channels)

class DepthwiseSeparableConv(nn.Module):
    """
    Convolución depthwise separable para reducción de parámetros.
    """
    def __init__(self, in_ch, out_ch, k, s=1, p=0, p_drop=0.2):
        super().__init__()
        self.dw = nn.Conv1d(in_ch, in_ch, kernel_size=k, stride=s, padding=p, groups=in_ch, bias=False)
        self.pw = nn.Conv1d(in_ch, out_ch, kernel_size=1, bias=False)
        self.norm = make_gn(out_ch)
        self.act = nn.ELU()
        self.dropout = nn.Dropout(p=p_drop)

    def forward(self, x):
        x = self.dw(x); x = self.pw(x); x = self.norm(x)
        x = self.act(x); x = self.dropout(x)
        return x

class _CustomEncoderLayer(nn.Module):
    """
    Capa personalizada del encoder Transformer con atención multi-cabeza.
    """
    def __init__(self, d_model, nhead, dim_feedforward, dropout, batch_first, norm_first, return_attn=True):
        super().__init__()
        self.return_attn = return_attn
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=batch_first)
        self.linear1 = nn.Linear(d_model, dim_feedforward)
        self.dropout = nn.Dropout(dropout)
        self.linear2 = nn.Linear(dim_feedforward, d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout1 = nn.Dropout(dropout)
        self.dropout2 = nn.Dropout(dropout)
        self.activation = nn.GELU()
        self.norm_first = norm_first

    def forward(self, src):
        sa_out, attn_weights = self.self_attn(src, src, src, need_weights=True, average_attn_weights=False)
        src = self.norm1(src + self.dropout1(sa_out))
        ff = self.linear2(self.dropout(self.activation(self.linear1(src))))
        src = self.norm2(src + self.dropout2(ff))
        return src, attn_weights  # [B, heads, T, T]

class _CustomEncoder(nn.Module):
    """
    Encoder Transformer compuesto por múltiples capas.
    """
    def __init__(self, layer, num_layers):
        super().__init__()
        self.layers = nn.ModuleList([copy.deepcopy(layer) for _ in range(num_layers)])

    def forward(self, src):
        attn_list = []
        out = src
        for lyr in self.layers:
            out, attn = lyr(out)
            attn_list.append(attn)
        return out, attn_list

class EEGCNNTransformer(nn.Module):
    """
    Modelo híbrido CNN-Transformer para clasificación de señales EEG.
    """
    def __init__(self, n_ch=8, n_cls=2, d_model=128, n_heads=4, n_layers=2,
                 p_drop=0.2, p_drop_encoder=0.3, n_dw_blocks=2,
                 k_list=(31,15,7), s_list=(2,2,2), p_list=(15,7,3), capture_attn=False):
        super().__init__()
        self.capture_attn = capture_attn
        self._last_attn = None
        self.pos_encoding = None

        # Stem convolucional inicial
        stem = [
            nn.Conv1d(n_ch, 32, kernel_size=129, stride=2, padding=64, bias=False),
            make_gn(32),
            nn.ELU(),
            nn.Dropout(p=p_drop),
        ]
        
        # Bloques depthwise separable
        blocks = []
        in_c, out_cs = 32, [64, 128, 256, 256, 256]
        n_dw_blocks = int(np.clip(n_dw_blocks, 0, len(out_cs)))
        for i in range(n_dw_blocks):
            k = k_list[i] if i < len(k_list) else 7
            s = s_list[i] if i < len(s_list) else 2
            p = p_list[i] if i < len(p_list) else k//2
            blocks.append(DepthwiseSeparableConv(in_c, out_cs[i], k=k, s=s, p=p, p_drop=p_drop))
            in_c = out_cs[i]

        self.conv_t = nn.Sequential(*stem, *blocks)
        self.proj = nn.Conv1d(in_c if n_dw_blocks>0 else 32, d_model, kernel_size=1, bias=False)
        self.dropout = nn.Dropout(p=p_drop_encoder)

        # Encoder Transformer
        enc = _CustomEncoderLayer(d_model=d_model, nhead=n_heads, dim_feedforward=2*d_model,
                                  dropout=0.1, batch_first=True, norm_first=False, return_attn=True)
        self.encoder = _CustomEncoder(enc, num_layers=n_layers)

        # Token CLS y head de clasificación
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        nn.init.normal_(self.cls, std=0.02)
        self.head = nn.Sequential(nn.LayerNorm(d_model), nn.Linear(d_model, n_cls))

    def _positional_encoding(self, L, d):
        """
        Codificación posicional sinusoidal para secuencias.
        """
        pos = torch.arange(0, L, dtype=torch.float32).unsqueeze(1)
        i   = torch.arange(0, d, dtype=torch.float32).unsqueeze(0)
        angle = pos / torch.pow(10000, (2 * (i//2)) / d)
        pe = torch.zeros(L, d, dtype=torch.float32)
        pe[:, 0::2] = torch.sin(angle[:, 0::2])
        pe[:, 1::2] = torch.cos(angle[:, 1::2])
        return pe

    def forward(self, x):
        # Capas convolucionales
        z = self.conv_t(x)           # (B, C', T')
        z = self.proj(z)             # (B, d_model, T')
        z = self.dropout(z)
        z = z.transpose(1, 2)        # (B, T', d_model)
        
        # Codificación posicional
        B, L, D = z.shape
        if (self.pos_encoding is None) or (self.pos_encoding.shape[0] != L) or (self.pos_encoding.shape[1] != D):
            self.pos_encoding = self._positional_encoding(L, D).to(z.device)
        z = z + self.pos_encoding[None, :, :]
        
        # Añadir token CLS
        cls_tok = self.cls.expand(B, -1, -1)
        z = torch.cat([cls_tok, z], dim=1)  # (B, 1+L, D)

        # Encoder Transformer
        z, attn_list = self.encoder(z)
        if self.capture_attn:
            self._last_attn = attn_list

        # Clasificación usando token CLS
        cls = z[:, 0, :]
        return self.head(cls)

    def get_last_attention_maps(self):
        """
        Retorna mapas de atención de la última pasada forward.
        """
        return self._last_attn

### Funciones de Pérdida y Aumentación de Datos

Implementación de Focal Loss y técnicas de aumentación de datos para EEG.

In [6]:
# =========================
# FOCAL LOSS (2 clases)
# =========================
class FocalLoss(nn.Module):
    """
    Focal Loss para manejar desbalance de clases.
    Reduce la contribución de ejemplos fáciles y enfoca en los difíciles.
    """
    def __init__(self, alpha: torch.Tensor, gamma: float = 1.5, reduction: str = 'mean'):
        super().__init__()
        self.alpha = alpha / alpha.sum()
        self.gamma = gamma
        self.reduction = reduction

    def forward(self, logits, target):
        logp = nn.functional.log_softmax(logits, dim=-1)      # (B,C)
        p = logp.exp()
        idx = torch.arange(target.shape[0], device=logits.device)
        pt = p[idx, target]
        logpt = logp[idx, target]
        at = self.alpha[target]
        loss = - at * ((1 - pt) ** self.gamma) * logpt
        if self.reduction == 'mean': return loss.mean()
        if self.reduction == 'sum':  return loss.sum()
        return loss

# =========================
# AUGMENTS
# =========================
def augment_batch(xb, p_jitter=0.35, p_noise=0.35, p_chdrop=0.15,
                  max_jitter_frac=0.03, noise_std=0.03, max_chdrop=1):
    """
    Aplica aumentación de datos a un batch de señales EEG.
    Técnicas: jitter temporal, ruido gaussiano, drop de canales.
    """
    B, C, T = xb.shape
    if np.random.rand() < p_jitter:
        max_shift = int(max(1, T*max_jitter_frac))
        shifts = torch.randint(low=-max_shift, high=max_shift+1, size=(B,), device=xb.device)
        for i in range(B):
            xb[i] = torch.roll(xb[i], shifts=int(shifts[i].item()), dims=-1)
    if np.random.rand() < p_noise:
        xb = xb + noise_std*torch.randn_like(xb)
    if np.random.rand() < p_chdrop and max_chdrop > 0:
        k = min(max_chdrop, C)
        for i in range(B):
            idx = torch.randperm(C, device=xb.device)[:k]
            xb[i, idx, :] = 0.0
    return xb

def augment_batch_ft(xb):
    """
    Aplica aumentación específica para fine-tuning.
    """
    return augment_batch(xb, **FT_AUG)

### EMA y Técnicas de Inferencia

Implementación de Exponential Moving Average y técnicas de inferencia robusta.

In [7]:
# =========================
# EMA de pesos (solo global)
# =========================
class ModelEMA:
    """
    Exponential Moving Average para estabilizar el entrenamiento.
    Mantiene una versión suavizada de los pesos del modelo.
    """
    def __init__(self, model: nn.Module, decay: float = 0.9995, device=None):
        self.ema = self._clone(model).to(device if device is not None else next(model.parameters()).device)
        self.decay = decay
        self._updates = 0
        self.update(model, force=True)

    def _clone(self, model):
        import copy
        ema = copy.deepcopy(model)
        for p in ema.parameters():
            p.requires_grad_(False)
        return ema

    @torch.no_grad()
    def update(self, model: nn.Module, force: bool = False):
        d = self.decay
        if self._updates < 1000:
            d = (self._updates / 1000.0) * self.decay
        msd = model.state_dict()
        esd = self.ema.state_dict()
        for k in esd.keys():
            if esd[k].dtype.is_floating_point:
                esd[k].mul_(d).add_(msd[k].detach(), alpha=1.0 - d)
            else:
                esd[k] = msd[k]
        self._updates += 1

# =========================
# INFERENCIA TTA / SUBWINDOW
# =========================
def subwindow_logits(model, X, sfreq, sw_len, sw_stride, device):
    """
    Inferencia usando subventanas para capturar información temporal.
    """
    model.eval()
    wl = int(round(sw_len * sfreq))
    st = int(round(sw_stride * sfreq))
    wl = max(1, min(wl, X.shape[-1])); st = max(1, st)
    out = []
    with torch.no_grad():
        for i in range(X.shape[0]):
            x = X[i]; acc = []
            for s in range(0, max(1, X.shape[-1]-wl+1), st):
                seg = x[:, s:s+wl]
                if seg.shape[-1] < wl:
                    pad = wl - seg.shape[-1]
                    seg = np.pad(seg, ((0,0),(0,pad)), mode='edge')
                xb = torch.tensor(seg[None, ...], dtype=torch.float32, device=device)
                logit = model(xb).detach().cpu().numpy()[0]
                acc.append(logit)
            acc = np.mean(np.stack(acc, axis=0), axis=0) if len(acc) else np.zeros(2, dtype=np.float32)
            out.append(acc)
    return np.stack(out, axis=0)

def time_shift_tta_logits(model, X, sfreq, shifts_s, device):
    """
    Test Time Augmentation usando desplazamientos temporales.
    """
    model.eval()
    T = X.shape[-1]; out = []
    with torch.no_grad():
        for i in range(X.shape[0]):
            x0 = X[i]; acc = []
            for sh in shifts_s:
                shift = int(round(sh * sfreq))
                if shift == 0:
                    x = x0
                elif shift > 0:
                    x = np.pad(x0[:, shift:], ((0,0),(0,shift)), mode='edge')[:, :T]
                else:
                    shift = -shift
                    x = np.pad(x0[:, :-shift], ((0,0),(shift,0)), mode='edge')[:, :T]
                xb = torch.tensor(x[None, ...], dtype=torch.float32, device=device)
                logit = model(xb).detach().cpu().numpy()[0]
                acc.append(logit)
            out.append(np.mean(np.stack(acc, axis=0), axis=0))
    return np.stack(out, axis=0)

### Métricas y Herramientas de Visualización

Funciones para calcular métricas y visualizar resultados.

In [8]:
# =========================
# Métricas extendidas
# =========================
def compute_metrics(y_true, y_pred):
    """
    Calcula métricas extendidas de clasificación.
    """
    prec_macro = precision_score(y_true, y_pred, average='macro', zero_division=0)
    rec_macro  = recall_score(y_true, y_pred, average='macro', zero_division=0)
    f1_macro   = f1_score(y_true, y_pred, average='macro', zero_division=0)

    cm = confusion_matrix(y_true, y_pred, labels=[0,1])
    TN0 = cm[1,1]; FP0 = cm[1,0]
    spec0 = TN0 / (TN0 + FP0 + 1e-12)
    TN1 = cm[0,0]; FP1 = cm[0,1]
    spec1 = TN1 / (TN1 + FP1 + 1e-12)
    spec_macro = (spec0 + spec1) / 2.0

    recs = recall_score(y_true, y_pred, labels=[0,1], average=None, zero_division=0)
    sens_macro = float(recs.mean())

    return {
        "precision_macro": float(prec_macro),
        "recall_macro": float(rec_macro),
        "f1_macro": float(f1_macro),
        "specificity_macro": float(spec_macro),
        "sensitivity_macro": float(sens_macro),
        "cm": cm.tolist()
    }

# =========================
# Helper: matriz de confusión con anotaciones
# =========================
def plot_confusion_with_text(cm, class_names, title, out_path, cmap='Blues'):
    """
    Plota matriz de confusión con valores anotados.
    """
    fig, ax = plt.subplots(figsize=(4.8, 4.2))
    im = ax.imshow(cm, cmap=cmap)
    ax.set_title(title)
    ax.set_xlabel("Predicted")
    ax.set_ylabel("True label")
    ax.set_xticks(range(len(class_names))); ax.set_xticklabels(class_names)
    ax.set_yticks(range(len(class_names))); ax.set_yticklabels(class_names)

    vmax = cm.max() if cm.size else 1
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, str(cm[i, j]),
                    ha='center', va='center',
                    color='white' if cm[i, j] > vmax/2 else 'black',
                    fontsize=11, fontweight='bold')

    plt.tight_layout()
    plt.savefig(out_path, dpi=140)
    plt.close(fig)

### Herramientas de Interpretabilidad

Funciones para analizar y visualizar el comportamiento del modelo.

In [9]:
# =========================
# FUNCIONES DE INTERPRETABILIDAD (SOLO LAS QUE PIDES)
# =========================

# ===== Saliency por canal =====
def channel_saliency(model, xb, target_class=None):
    """
    Importancia por canal basada en |∂score/∂x| promedio en el tiempo.
    xb: tensor (1, C, T) en GPU.
    Devuelve: vector (C,) normalizado [0,1].
    """
    model.eval()
    xb = xb.clone().detach().requires_grad_(True)

    logits = model(xb)
    if target_class is None:
        cls = logits.argmax(1)
    else:
        cls = torch.tensor([int(target_class)], device=xb.device)
    score = logits[0, cls]

    model.zero_grad(set_to_none=True)
    if xb.grad is not None:
        xb.grad.zero_()
    score.backward()

    grad = xb.grad.detach()[0]     # (C, T)
    sal = grad.abs().mean(dim=1)   # (C,)
    sal = (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)
    return sal.cpu().numpy()

# ===== Topomaps =====
def make_mne_info_8ch(sfreq: float):
    """
    Crea un Info de MNE con tus 8 canales motores.
    """
    info = mne.create_info(
        ch_names=EXPECTED_8,
        sfreq=float(sfreq),
        ch_types="eeg"
    )
    try:
        montage = mne.channels.make_standard_montage("standard_1020")
        info.set_montage(montage)
    except Exception:
        pass
    return info

def plot_topomap_from_values(values, info, title: str, out_path: Path,
                             cmap="Reds", cbar_label="Value",
                             vmin=None, vmax=None):
    """
    Dibuja y guarda un topomap dado un vector de valores por canal.
    values: array (n_channels,)
    """
    values = np.asarray(values, dtype=float)
    fig, ax = plt.subplots(figsize=(4.2, 4.0))
    im, _ = mne.viz.plot_topomap(
        values, info, axes=ax, show=False, cmap=cmap
    )
    if (vmin is not None) or (vmax is not None):
        im.set_clim(vmin=vmin, vmax=vmax)
    ax.set_title(title, fontsize=10)
    cbar = plt.colorbar(im, ax=ax, shrink=0.7)
    cbar.set_label(cbar_label, fontsize=9)
    plt.tight_layout()
    fig.savefig(out_path, dpi=160, bbox_inches="tight")
    plt.close(fig)
    print(f"↳ Topomap guardado: {out_path}")

# ===== -log10(p) análisis estadístico =====
def compute_channel_pvalues_lr(X, y):
    """
    P-values por canal para la diferencia left vs right.
    Usa potencia media (X^2) promediada en el tiempo como feature.
    X: (N, C, T)  | y: (N,) con labels 0=left, 1=right
    Devuelve: vector (C,) con p-values (ttest_ind Welch).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    left = X[y == 0]
    right = X[y == 1]

    if (left.size == 0) or (right.size == 0):
        return np.ones(X.shape[1], dtype=float)

    feat_left = (left ** 2).mean(axis=2)   # (N_left, C)
    feat_right = (right ** 2).mean(axis=2)

    pvals = []
    for ch in range(feat_left.shape[1]):
        _, p = stats.ttest_ind(
            feat_left[:, ch],
            feat_right[:, ch],
            equal_var=False
        )
        pvals.append(p)
    return np.array(pvals, dtype=float)

# ===== t-SNE de CLS embeddings =====
def compute_cls_embeddings(model, X_data, device):
    """
    Calcula embeddings CLS del modelo.
    X_data: (N, C, T)
    Devuelve: (N, d_model) embeddings CLS
    """
    model.eval()
    embeddings = []
    with torch.no_grad():
        for i in range(0, len(X_data), 64):
            xb = torch.tensor(X_data[i:i+64], dtype=torch.float32, device=device)
            # Pasar por el modelo completo para obtener CLS token
            z = model.conv_t(xb)           # (B, C', T')
            z = model.proj(z)              # (B, d_model, T')
            z = z.transpose(1, 2)          # (B, T', d_model)
            
            # Añadir positional encoding (simulando forward)
            B, L, D = z.shape
            if model.pos_encoding is None or model.pos_encoding.shape[0] != L or model.pos_encoding.shape[1] != D:
                pos_enc = model._positional_encoding(L, D).to(z.device)
            else:
                pos_enc = model.pos_encoding
            z = z + pos_enc[None, :, :]
            
            # Añadir CLS token y pasar por encoder
            cls_tok = model.cls.expand(B, -1, -1)
            z_with_cls = torch.cat([cls_tok, z], dim=1)  # (B, 1+L, D)
            
            # Pasar por encoder (solo necesitamos CLS)
            z_encoder, _ = model.encoder(z_with_cls)
            cls_embedding = z_encoder[:, 0, :]  # CLS token (B, D)
            embeddings.append(cls_embedding.cpu().numpy())
    
    embeddings = np.concatenate(embeddings, axis=0)
    return embeddings

def compute_tsne_embeddings(model, X_data, device, n_components=2, perplexity=30):
    """
    Calcula embeddings CLS y luego aplica t-SNE.
    X_data: (N, C, T)
    """
    # Obtener embeddings CLS
    embeddings = compute_cls_embeddings(model, X_data, device)
    
    # Aplicar t-SNE
    tsne = TSNE(n_components=n_components, perplexity=perplexity, 
                random_state=RANDOM_STATE, init='pca', n_iter=1000)
    tsne_result = tsne.fit_transform(embeddings)
    return tsne_result, embeddings

def plot_tsne(tsne_result, y_true, title, out_path: Path):
    """
    Grafica t-SNE con colores según clase.
    """
    fig, ax = plt.subplots(figsize=(7, 5))
    
    # Colores para cada clase
    colors = ['green', 'blue']  # 0:left (verde), 1:right (azul)
    
    for cls_idx in range(2):
        mask = y_true == cls_idx
        if mask.any():
            ax.scatter(tsne_result[mask, 0], tsne_result[mask, 1], 
                      c=colors[cls_idx], label=CLASS_NAMES[cls_idx], 
                      alpha=0.7, s=25, edgecolors='w', linewidth=0.5)
    
    ax.set_title(title, fontsize=12, fontweight='bold')
    ax.set_xlabel("t-SNE 1")
    ax.set_ylabel("t-SNE 2")
    ax.legend(title="Clase", fontsize=9)
    ax.grid(True, alpha=0.3, linestyle='--', linewidth=0.5)
    
    # Añadir información adicional
    n_samples = len(y_true)
    n_left = np.sum(y_true == 0)
    n_right = np.sum(y_true == 1)
    info_text = f"Total: {n_samples} | Left: {n_left} | Right: {n_right}"
    ax.text(0.02, 0.98, info_text, transform=ax.transAxes, fontsize=9,
            verticalalignment='top', bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
    
    plt.tight_layout()
    fig.savefig(out_path, dpi=160, bbox_inches="tight")
    plt.close(fig)
    print(f"↳ t-SNE (CLS embeddings) guardado: {out_path}")

### Métricas de Consumo y Hiperparámetros

Funciones para calcular consumo computacional y gestionar hiperparámetros.

In [10]:
# =========================
# Consumo / hiperparámetros
# =========================
def count_params(model, trainable_only=False):
    """
    Cuenta parámetros del modelo.
    """
    if trainable_only:
        return int(sum(p.numel() for p in model.parameters() if p.requires_grad))
    return int(sum(p.numel() for p in model.parameters()))

def estimate_transformer_flops(L, d, n_heads, n_layers):
    """
    Estima FLOPs del encoder Transformer.
    """
    d_ff = 2*d
    proj = 4 * L * (d*d)         # Q,K,V,Out
    attn = 2 * (L*L*d)           # scores + aplicar a V
    ff   = 2 * L * (d * d_ff)    # FFN (2 lineales)
    return n_layers * (proj + attn + ff)

@torch.no_grad()
def benchmark_latency(model, X_sample, device, n_warm=10, n_runs=50):
    """
    Mide latencia de inferencia del modelo.
    """
    if torch.is_tensor(X_sample):
        xb = X_sample.clone().detach().to(device)
    else:
        xb = torch.tensor(X_sample, dtype=torch.float32, device=device).clone().detach()

    for _ in range(n_warm):
        _ = model(xb)
    if torch.cuda.is_available():
        torch.cuda.synchronize()

    t0 = time.time()
    for _ in range(n_runs):
        _ = model(xb)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.time() - t0) / n_runs

def current_hparams():
    """
    Retorna diccionario con todos los hiperparámetros actuales.
    """
    return {
        "EPOCHS": EPOCHS, "BATCH_SIZE": BATCH_SIZE, "BASE_LR": BASE_LR,
        "WARMUP_EPOCHS": WARMUP_EPOCHS, "PATIENCE": PATIENCE,
        "RESAMPLE_HZ": RESAMPLE_HZ, "DO_NOTCH": DO_NOTCH, "DO_BANDPASS": DO_BANDPASS,
        "BP_LO": BP_LO, "BP_HI": BP_HI, "DO_CAR": DO_CAR, "ZSCORE_PER_EPOCH": ZSCORE_PER_EPOCH,
        "D_MODEL": D_MODEL, "N_HEADS": N_HEADS, "N_LAYERS": N_LAYERS,
        "P_DROP": P_DROP, "P_DROP_ENCODER": P_DROP_ENCODER,
        "TMIN": TMIN, "TMAX": TMAX, "SW_MODE": SW_MODE, "SW_ENABLE": SW_ENABLE,
        "TTA_SHIFTS_S": TTA_SHIFTS_S, "SW_LEN": SW_LEN, "SW_STRIDE": SW_STRIDE,
        "COMBINE_TTA_AND_SUBWIN": COMBINE_TTA_AND_SUBWIN,
        "USE_WEIGHTED_SAMPLER": USE_WEIGHTED_SAMPLER,
        "USE_EMA": USE_EMA, "EMA_DECAY": EMA_DECAY,
        "FT_N_FOLDS": FT_N_FOLDS, "FT_FREEZE_EPOCHS": FT_FREEZE_EPOCHS, "FT_UNFREEZE_EPOCHS": FT_UNFREEZE_EPOCHS,
        "FT_PATIENCE": FT_PATIENCE, "FT_BATCH": FT_BATCH, "FT_LR_HEAD": FT_LR_HEAD,
        "FT_LR_BACKBONE": FT_LR_BACKBONE, "FT_WD": FT_WD, "FT_AUG": FT_AUG,
    }

# =========================
# Helpers de salida/archivos
# =========================
def ensure_dir(p: Path):
    """
    Crea directorio si no existe.
    """
    p.mkdir(parents=True, exist_ok=True)
    return p

def make_run_dirs(base_dir: Path, tag: str, fold: int):
    """
    Devuelve (run_dir, fold_dir).
    """
    run_dir = ensure_dir(base_dir / tag)
    fold_dir = ensure_dir(run_dir / f"fold{fold}")
    return run_dir, fold_dir

def save_csv_rows(csv_path: Path, fieldnames: list, rows: list):
    """
    Guarda filas en archivo CSV.
    """
    write_header = not csv_path.exists()
    with open(csv_path, "a", newline="") as f:
        w = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            w.writeheader()
        for r in rows:
            w.writerow(r)

### Función Principal de Entrenamiento

Función que entrena un fold completo del modelo.

In [11]:
# =========================
# TRAIN/EVAL GLOBAL POR FOLD
# =========================
def train_one_fold(
    fold:int,
    device,
    model_factory=None,
    run_tag="base",
    out_dir: Path = None,
    seed=None,
    save_artifacts: bool = True,  
    do_ft: bool = True            
):
    """
    Entrena y evalúa el modelo para un fold específico.
    
    Args:
        fold: Número del fold (1-5)
        device: Dispositivo para entrenamiento (CPU/GPU)
        model_factory: Función que crea instancia del modelo
        run_tag: Identificador de la ejecución
        out_dir: Directorio de salida
        seed: Semilla para reproducibilidad
        save_artifacts: Si guarda artefactos de entrenamiento
        do_ft: Si realiza fine-tuning por sujeto
    """
    # Semilla fija por corrida (justa entre combos/folds)
    base = RANDOM_STATE if seed is None else int(seed)
    seed_everything(base)

    def load_fold_subjects_local(folds_json: Path, fold: int):
        return load_fold_subjects(folds_json, fold)

    # --- directorios de salida ---
    if save_artifacts:
        base_out = Path(".") if out_dir is None else Path(out_dir)
        run_dir, fold_dir = make_run_dirs(base_out, run_tag, fold)
        print(f"[IO] Guardando artefactos en: {fold_dir.resolve()}")
    else:
        run_dir = None
        fold_dir = None
        print("[LITE] GridSearch: no se guardarán artefactos por combinación.")

    # --- split de sujetos ---
    train_sub, test_sub = load_fold_subjects_local(FOLDS_JSON, fold)
    train_sub = [s for s in train_sub if subject_id_to_int(s) not in EXCLUDE_SUBJECTS]
    test_sub  = [s for s in test_sub  if subject_id_to_int(s) not in EXCLUDE_SUBJECTS]

    rng = np.random.RandomState(RANDOM_STATE + fold)
    tr_subjects = sorted(train_sub)

    if VAL_STRAT_SUBJECT and len(tr_subjects) > 1:
        y_dom = build_subject_label_map(tr_subjects)
        if np.any(y_dom < 0):
            mask = y_dom >= 0
            moda = int(np.bincount(y_dom[mask]).argmax()) if mask.sum() > 0 else 0
            y_dom[~mask] = moda
        n_val_subj = max(1, int(round(len(tr_subjects) * VAL_SUBJECT_FRAC)))
        sss = StratifiedShuffleSplit(n_splits=1, test_size=n_val_subj, random_state=RANDOM_STATE + fold)
        idx = np.arange(len(tr_subjects))
        _, val_idx = next(sss.split(idx, y_dom))
        val_subjects = sorted([tr_subjects[i] for i in val_idx])
        train_subjects = [s for s in tr_subjects if s not in val_subjects]
    else:
        tr_subjects_shuf = tr_subjects.copy()
        rng.shuffle(tr_subjects_shuf)
        n_val_subj = max(1, int(round(len(tr_subjects_shuf) * VAL_SUBJECT_FRAC)))
        val_subjects = sorted(tr_subjects_shuf[:n_val_subj])
        train_subjects = sorted(tr_subjects_shuf[n_val_subj:])

    # --- carga de datos ---
    X_tr_list, y_tr_list, sub_tr_list = [], [], []
    X_val_list, y_val_list, sub_val_list = [], [], []
    X_te_list, y_te_list, sub_te_list = [], [], []
    sfreq = None

    for sid in tqdm(train_subjects, desc=f"Cargando train fold{fold}"):
        Xs, ys, sf = load_subject_epochs(sid, RESAMPLE_HZ, DO_NOTCH, DO_BANDPASS, DO_CAR, BP_LO, BP_HI)
        if len(ys) == 0: continue
        X_tr_list.append(Xs); y_tr_list.append(ys)
        sub_tr_list.append(np.full_like(ys, fill_value=subject_id_to_int(sid)))
        sfreq = sf if sfreq is None else sfreq

    for sid in tqdm(val_subjects, desc=f"Cargando val fold{fold}"):
        Xs, ys, sf = load_subject_epochs(sid, RESAMPLE_HZ, DO_NOTCH, DO_BANDPASS, DO_CAR, BP_LO, BP_HI)
        if len(ys) == 0: continue
        X_val_list.append(Xs); y_val_list.append(ys)
        sub_val_list.append(np.full_like(ys, fill_value=subject_id_to_int(sid)))
        sfreq = sf if sfreq is None else sfreq

    for sid in tqdm(test_sub, desc=f"Cargando test fold{fold}"):
        Xs, ys, sf = load_subject_epochs(sid, RESAMPLE_HZ, DO_NOTCH, DO_BANDPASS, DO_CAR, BP_LO, BP_HI)
        if len(ys) == 0: continue
        X_te_list.append(Xs); y_te_list.append(ys)
        sub_te_list.append(np.full_like(ys, fill_value=subject_id_to_int(sid)))
        sfreq = sf if sfreq is None else sfreq

    # Concatenar
    X_tr = np.concatenate(X_tr_list, axis=0); y_tr = np.concatenate(y_tr_list, axis=0)
    sub_tr = np.concatenate(sub_tr_list, axis=0)
    X_val = np.concatenate(X_val_list, axis=0); y_val = np.concatenate(y_val_list, axis=0)
    sub_val = np.concatenate(sub_val_list, axis=0)
    X_te = np.concatenate(X_te_list, axis=0); y_te = np.concatenate(y_te_list, axis=0)
    sub_te = np.concatenate(sub_te_list, axis=0)

    print(f"[Fold {fold}/5] Entrenando modelo global... (n_train={len(y_tr)} | n_val={len(y_val)} | n_test={len(y_te)})")

    # Normalización por canal
    if ZSCORE_PER_EPOCH:
        X_tr_std, X_val_std, X_te_std = X_tr, X_val, X_te
    else:
        X_tr_std, X_val_std = standardize_per_channel(X_tr, X_val)
        _,        X_te_std  = standardize_per_channel(X_tr, X_te)

    # Datasets
    tr_ds  = TensorDataset(torch.tensor(X_tr_std),  torch.tensor(y_tr).long(),  torch.tensor(sub_tr).long())
    val_ds = TensorDataset(torch.tensor(X_val_std), torch.tensor(y_val).long(), torch.tensor(sub_val).long())
    te_ds  = TensorDataset(torch.tensor(X_te_std),  torch.tensor(y_te).long(),  torch.tensor(sub_te).long())

    # Weighted sampler
    def make_weighted_sampler(dataset: TensorDataset):
        _Xb, yb, sb = dataset.tensors
        yb_np = yb.numpy()
        sb_np = sb.numpy()
        uniq_s, cnt_s = np.unique(sb_np, return_counts=True)
        map_s = {s:c for s,c in zip(uniq_s, cnt_s)}
        key = sb_np.astype(np.int64) * 10 + yb_np.astype(np.int64)
        uniq_k, cnt_k = np.unique(key, return_counts=True)
        map_k = {k:c for k,c in zip(uniq_k, cnt_k)}
        a, b = 0.8, 1.0
        w = []
        for s, y in zip(sb_np, yb_np):
            k = int(s)*10 + int(y)
            ws = float(map_s[int(s)])
            wk = float(map_k[k])
            w.append((ws ** (-a)) * (wk ** (-b)))
        w = np.array(w, dtype=np.float64)
        w = w / (w.mean() + 1e-12)
        sampler = WeightedRandomSampler(weights=torch.tensor(w, dtype=torch.double),
                                        num_samples=len(yb_np), replacement=True)
        return sampler

    if USE_WEIGHTED_SAMPLER:
        tr_sampler = make_weighted_sampler(tr_ds)
        tr_ld  = DataLoader(tr_ds, batch_size=BATCH_SIZE, sampler=tr_sampler, drop_last=False, worker_init_fn=seed_worker)
    else:
        tr_ld  = DataLoader(tr_ds, batch_size=BATCH_SIZE, shuffle=True,  drop_last=False, worker_init_fn=seed_worker)

    val_ld = DataLoader(val_ds, batch_size=BATCH_SIZE, shuffle=False, drop_last=False, worker_init_fn=seed_worker)
    te_ld  = DataLoader(te_ds,  batch_size=BATCH_SIZE, shuffle=False, drop_last=False, worker_init_fn=seed_worker)

    # --- Medidores de tiempo/memoria por fase ---
    train_time_s = None
    train_mem_mb = None
    test_time_s  = None
    test_mem_mb  = None

    # Modelo
    if model_factory is None:
        model = EEGCNNTransformer(n_ch=8, n_cls=2, d_model=D_MODEL, n_heads=N_HEADS,
                                  n_layers=N_LAYERS, p_drop=P_DROP, p_drop_encoder=P_DROP_ENCODER,
                                  n_dw_blocks=2, capture_attn=True).to(device)
    else:
        model = model_factory()

    # Opt + Loss
    opt = torch.optim.AdamW(model.parameters(), lr=BASE_LR, weight_decay=1e-2)
    class_counts = np.bincount(y_tr, minlength=2).astype(np.float32)
    inv = class_counts.sum() / (2.0 * np.maximum(class_counts, 1.0))
    alpha = torch.tensor(inv, dtype=torch.float32, device=device)
    crit = FocalLoss(alpha=alpha, gamma=1.5, reduction='mean')

    # LR scheduler Warmup+Cosine
    from torch.optim.lr_scheduler import LambdaLR
    total_epochs = EPOCHS
    warmup_epochs = max(1, int(WARMUP_EPOCHS))
    min_factor = 0.1
    def lr_lambda(current_epoch):
        if current_epoch < warmup_epochs:
            return (current_epoch + 1) / warmup_epochs
        progress = (current_epoch - warmup_epochs) / max(1, (total_epochs - warmup_epochs))
        progress = min(1.0, max(0.0, progress))
        return min_factor + 0.5 * (1.0 - min_factor) * (1.0 + np.cos(np.pi * progress))
    scheduler = LambdaLR(opt, lr_lambda=lr_lambda)

    # EMA
    ema = ModelEMA(model, decay=EMA_DECAY, device=device) if USE_EMA else None

    # Entrenamiento global (con tracking de GAP)
    best_f1, best_state, wait = 0.0, None, 0
    hist = {"ep": [], "tr_loss": [], "tr_acc": [], "val_acc": [], "val_f1m": [], "val_loss": [], "lr": []}

    # NUEVO: métricas en la época de mejor valid F1
    best_val_acc = float('nan')
    best_val_f1  = float('nan')
    best_tr_acc  = float('nan')
    best_tr_f1   = float('nan')

    def evaluate_on(loader, use_ema=True, loss_fn=None):
        mdl = ema.ema if (ema is not None and use_ema) else model
        mdl.eval()
        preds, gts = [], []
        tot_loss, n_seen = 0.0, 0
        with torch.no_grad():
            for xb, yb, _sb in loader:
                xb = xb.to(device); yb = yb.to(device)
                logits = mdl(xb)
                if loss_fn is not None:
                    L = loss_fn(logits, yb).item()
                    tot_loss += L * len(yb); n_seen += len(yb)
                p = logits.argmax(dim=1).cpu().numpy()
                preds.append(p); gts.append(yb.cpu().numpy())
        preds = np.concatenate(preds); gts = np.concatenate(gts)
        acc = accuracy_score(gts, preds)
        f1m = f1_score(gts, preds, average='macro')
        vloss = (tot_loss / max(1,n_seen)) if n_seen>0 else float('nan')
        return acc, f1m, vloss, preds, gts

    # === NUEVO: iniciar cronómetro + memoria de ENTRENAMIENTO ===
    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats(device)
    t_train_start = time.time()

    for ep in range(1, EPOCHS+1):
        model.train()
        tr_loss, n_seen, tr_correct = 0.0, 0, 0
        for xb, yb, _sb in tr_ld:
            xb = xb.to(device); yb = yb.to(device)
            xb = augment_batch(xb)
            opt.zero_grad()
            logits = model(xb)
            loss = crit(logits, yb)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            opt.step()
            if ema is not None:
                ema.update(model)

            tr_loss += loss.item() * len(yb)
            n_seen += len(yb)
            tr_correct += (logits.argmax(1) == yb).sum().item()
        tr_loss /= max(1, n_seen)
        tr_acc = tr_correct / max(1, n_seen)

        acc, f1m, vloss, _, _ = evaluate_on(val_ld, use_ema=True, loss_fn=crit)

        hist["ep"].append(ep)
        hist["tr_loss"].append(tr_loss)
        hist["tr_acc"].append(tr_acc)
        hist["val_acc"].append(acc)
        hist["val_f1m"].append(f1m)
        hist["val_loss"].append(vloss)
        hist["lr"].append(scheduler.get_last_lr()[0])

        print(f"  Época {ep:3d} | train_loss={tr_loss:.4f} | train_acc={tr_acc:.4f} | val_loss={vloss:.4f} | val_acc={acc:.4f} | val_f1m={f1m:.4f} | LR={scheduler.get_last_lr()[0]:.6f}")

        improved = f1m > best_f1 + 1e-4
        if improved:
            best_f1 = f1m
            ref_model = ema.ema if ema is not None else model
            best_state = {k: v.detach().cpu() for k, v in ref_model.state_dict().items()}
            wait = 0

            # NUEVO: computar train en el mismo modelo/época p/ medir gap
            tr_acc_eval, tr_f1_eval, _, _, _ = evaluate_on(tr_ld, use_ema=True, loss_fn=None)
            best_tr_acc = tr_acc_eval
            best_tr_f1  = tr_f1_eval
            best_val_acc = acc
            best_val_f1  = f1m
        else:
            wait += 1

        scheduler.step()
        if wait >= PATIENCE:
            print(f"  Early stopping en época {ep} (mejor val_f1m={best_f1:.4f})")
            break

    # === cerrar cronómetro de ENTRENAMIENTO ===
    t_train_end = time.time()
    train_time_s = t_train_end - t_train_start
    if torch.cuda.is_available():
        train_mem_mb = torch.cuda.max_memory_allocated(device) / (1024.0 ** 2)

    # Restaurar mejor estado
    if best_state is not None:
        (ema.ema if (ema is not None) else model).load_state_dict(best_state)
        model.load_state_dict(best_state)
    for p in model.parameters():
        p.requires_grad_(True)

    if save_artifacts and fold_dir is not None:
        torch.save(best_state, fold_dir / f"model_global_fold{fold}_{run_tag}.pth")
        print(f"↳ Modelo global guardado en: {fold_dir / f'model_global_fold{fold}_{run_tag}.pth'}")

    # ---- Curvas (sólo si se guardan artefactos) ----
    if save_artifacts:
        # ONLY train_loss and val_loss, correct scale
        fig = plt.figure(figsize=(8, 4.5))
        ax1 = plt.gca()
        ax1.plot(hist["ep"], hist["tr_loss"], label="train_loss")
        ax1.plot(hist["ep"], hist["val_loss"], label="val_loss")
        ax1.set_xlabel("Epoch")
        ax1.set_ylabel("Loss")
        ax1.set_title(f"Fold {fold} — Training Curve (2 classes) [{run_tag}]")
        ax1.legend()
        ax1.grid(True, alpha=0.3)
        out_png = fold_dir / f"training_curve_fold{fold}_{run_tag}.png"
        plt.tight_layout()
        plt.savefig(out_png, dpi=140)
        plt.close(fig)
        print(f"↳ Training curve saved: {out_png}")
        # Guardar historial de curvas (epoch, train_loss, val_loss) para curva global
        hist_path = fold_dir / f"history_fold{fold}_{run_tag}.npz"
        np.savez(
            hist_path,
            ep=np.array(hist["ep"], dtype=np.int32),
            tr_loss=np.array(hist["tr_loss"], dtype=np.float32),
            val_loss=np.array(hist["val_loss"], dtype=np.float32),
        )
        print(f"↳ History saved: {hist_path}")

    # ---- Evaluación final en TEST (global) ----
    eval_model = ema.ema if (ema is not None) else model
    eval_model.eval()

    sfreq_used = RESAMPLE_HZ
    if sfreq_used is None:
        sfreq_used = int(round(X_te_std.shape[-1] / (TMAX - TMIN)))

    # === NUEVO: cronómetro + memoria de TEST ===
    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats(device)
    t_test_start = time.time()

    if (not SW_ENABLE) or SW_MODE == 'none':
        preds, gts = [], []
        with torch.no_grad():
            for xb, yb, _sb in te_ld:
                xb = xb.to(device)
                p = eval_model(xb).argmax(dim=1).cpu().numpy()
                preds.append(p); gts.append(yb.numpy())
        preds = np.concatenate(preds); gts = np.concatenate(gts)
    elif SW_MODE in ('subwin', 'tta'):
        logits_tta = None
        logits_sw  = None
        if SW_MODE == 'subwin':
            logits_sw = subwindow_logits(eval_model, X_te_std, sfreq_used, SW_LEN, SW_STRIDE, device)
        elif SW_MODE == 'tta':
            logits_tta = time_shift_tta_logits(eval_model, X_te_std, sfreq_used, TTA_SHIFTS_S, device)
        logits = logits_tta if logits_tta is not None else logits_sw
        preds = logits.argmax(axis=1); gts = y_te
    else:
        raise ValueError(f"SW_MODE desconocido: {SW_MODE}")

    # === NUEVO: cerrar cronómetro de TEST ===
    t_test_end = time.time()
    test_time_s = t_test_end - t_test_start
    if torch.cuda.is_available():
        test_mem_mb = torch.cuda.max_memory_allocated(device) / (1024.0 ** 2)

    acc = accuracy_score(gts, preds)
    f1m = f1_score(gts, preds, average='macro')
    print(f"[Fold {fold}/5] Global acc={acc:.4f} | f1_macro={f1m:.4f}\n")
    print(classification_report(gts, preds, target_names=[c.replace('_',' ') for c in CLASS_NAMES], digits=4))
    print("Confusion matrix (rows=true, cols=pred):")
    cm = confusion_matrix(gts, preds, labels=[0,1])
    print(cm)

    # Métricas extendidas SIEMPRE (aunque no guardes artefactos)
    met = compute_metrics(gts, preds)
    print(
        f"precision_macro={met['precision_macro']:.4f} | "
        f"recall_macro={met['recall_macro']:.4f} | "
        f"specificity_macro={met['specificity_macro']:.4f} | "
        f"sensitivity_macro={met['sensitivity_macro']:.4f}"
    )

    if save_artifacts:
        cm_png = fold_dir / f"confusion_global_fold{fold}_{run_tag}.png"
        plot_confusion_with_text(
            cm=cm,
            class_names=CLASS_NAMES,
            title=f"Confusion — Fold {fold} Global (2 classes) [{run_tag}]",
            out_path=cm_png,
            cmap='Blues'
        )
        print(f"↳ Confusion matrix saved: {cm_png}")

    # =========================
    # INTERPRETABILIDAD
    # =========================
    if save_artifacts:
        try:
            model_to_probe = model
            model_to_probe.eval()

            # Minibatch de validación para ejemplos ilustrativos
            with torch.no_grad():
                xb_val, yb_val, _ = next(iter(val_ld))
            xb_val = xb_val[:8].to(device)

            # --- 1. Mapas de atención (Transformer) ---
            _ = model_to_probe(xb_val)
            attn_list = model_to_probe.get_last_attention_maps()
            if attn_list and len(attn_list) > 0:
                # attn_list[0]: [B, heads, T, T]
                A = attn_list[0][0].mean(dim=0).detach().cpu().numpy()
                plt.figure(figsize=(5, 4))
                plt.imshow(A, aspect="auto")
                plt.title(f"Attention (layer1, mean heads) [{run_tag}]")
                plt.colorbar()
                plt.tight_layout()
                attn_png = fold_dir / f"attention_map_fold{fold}_{run_tag}.png"
                plt.savefig(attn_png, dpi=140)
                plt.close()
                print(f"↳ Mapa de atención: {attn_png}")

            # Tomamos UN ejemplo para todos los mapas
            xb_ex = xb_val[:1]
            pred_cls = int(model_to_probe(xb_ex).argmax(1).item())

            # --- 2. Topomap de saliency por canal ---
            saliency_ch = channel_saliency(model_to_probe, xb_ex, target_class=pred_cls)

            sfreq_topo = sfreq_used if sfreq_used is not None else int(
                round(X_te_std.shape[-1] / (TMAX - TMIN))
            )
            info_topo = make_mne_info_8ch(sfreq_topo)

            topo_sal_png = fold_dir / f"topomap_saliency_fold{fold}_{run_tag}.png"
            plot_topomap_from_values(
                saliency_ch,
                info_topo,
                title=f"Channel saliency — Fold {fold} [{run_tag}]",
                out_path=topo_sal_png,
                cmap="Reds",
                cbar_label="Normalized saliency"
            )
            # Guardar saliency numérica por canal para promedio GLOBAL
            saliency_path = fold_dir / f"saliency_channels_fold{fold}_{run_tag}.npy"
            np.save(saliency_path, saliency_ch)

            # --- 3. Topomap de -log10(p) left vs right (TEST) ---
            pvals = compute_channel_pvalues_lr(X_te_std, y_te)
            pvals_log = -np.log10(pvals + 1e-12)

            topo_p_png = fold_dir / f"topomap_log10p_fold{fold}_{run_tag}.png"
            plot_topomap_from_values(
                pvals_log,
                info_topo,
                title=f"-log10(p) Left vs Right — Fold {fold} [{run_tag}]",
                out_path=topo_p_png,
                cmap="viridis",
                cbar_label="-log10(p)"
            )

            # Guardar valores numéricos por canal para promedio GLOBAL
            pvals_log_path = fold_dir / f"pvals_log_channels_fold{fold}_{run_tag}.npy"
            np.save(pvals_log_path, pvals_log)

            # --- 4. t-SNE de embeddings del modelo ---
            # Usar un subconjunto de TEST para t-SNE (máximo 1000 muestras)
            n_tsne_samples = min(1000, len(X_te_std))
            indices = rng.choice(len(X_te_std), n_tsne_samples, replace=False)
            X_tsne = X_te_std[indices]
            y_tsne = y_te[indices]

            tsne_result, cls_embeddings = compute_tsne_embeddings(
                model_to_probe, X_tsne, DEVICE, n_components=2, perplexity=30
            )
            
            tsne_png = fold_dir / f"tsne_cls_fold{fold}_{run_tag}.png"
            plot_tsne(
                tsne_result, 
                y_tsne,
                title=f"t-SNE CLS embeddings — Fold {fold} [{run_tag}]",
                out_path=tsne_png
            )
            
            # Guardar t-SNE resultados y embeddings CLS
            tsne_path = fold_dir / f"tsne_results_fold{fold}_{run_tag}.npy"
            np.save(tsne_path, tsne_result)
            
            cls_emb_path = fold_dir / f"cls_embeddings_fold{fold}_{run_tag}.npy"
            np.save(cls_emb_path, cls_embeddings)

        except Exception as e:
            print(f"[WARN] Interpretabilidad no generada: {e}")

    # =========================
    # Fine-tuning progresivo por sujeto (opcional)
    # =========================
    if do_ft:
        subjects = np.unique(sub_te)
        subj_to_idx = {s: np.where(sub_te == s)[0] for s in subjects}

        def make_ft_optimizer(model_ft):
            head_params = list(model_ft.head.parameters())
            backbone_params = [p for n,p in model_ft.named_parameters() if not n.startswith('head.')]
            return torch.optim.AdamW([
                {'params': backbone_params, 'lr': FT_LR_BACKBONE},
                {'params': head_params,     'lr': FT_LR_HEAD}
            ], weight_decay=FT_WD)

        def freeze_backbone(model_ft, freeze=True):
            for n,p in model_ft.named_parameters():
                if not n.startswith('head.'):
                    p.requires_grad_(not freeze)

        def ft_make_criterion_from_counts(counts):
            inv = counts.sum() / (2.0 * np.maximum(counts.astype(np.float32), 1.0))
            a = torch.tensor(inv, dtype=torch.float32, device=device)
            return FocalLoss(alpha=a, gamma=1.5, reduction='mean')

        def evaluate_tensor(model_t, X, y):
            model_t.eval()
            with torch.no_grad():
                xb = torch.tensor(X, dtype=torch.float32, device=device)
                p = model_t(xb).argmax(1).cpu().numpy()
            return accuracy_score(y, p), f1_score(y, p, average='macro'), p

        all_true, all_pred = [], []
        base_state = (ema.ema if (ema is not None) else model).state_dict()

        for s in subjects:
            idx = subj_to_idx[s]
            Xs, ys = X_te_std[idx], y_te[idx]
            if len(np.unique(ys)) < 2 or len(ys) < 20:
                eval_model2 = model_factory() if model_factory is not None else EEGCNNTransformer(
                    n_ch=8, n_cls=2, d_model=D_MODEL, n_heads=N_HEADS,
                    n_layers=N_LAYERS, p_drop=P_DROP, p_drop_encoder=P_DROP_ENCODER,
                    n_dw_blocks=2, capture_attn=False).to(device)
                eval_model2.load_state_dict(base_state)
                _, _, pred_s = evaluate_tensor(eval_model2, Xs, ys)
                all_true.append(ys); all_pred.append(pred_s)
                continue

            skf = StratifiedKFold(n_splits=FT_N_FOLDS, shuffle=True, random_state=RANDOM_STATE + fold + int(s))
            for tr_idx, te_idx in skf.split(Xs, ys):
                Xtr, ytr = Xs[tr_idx].copy(), ys[tr_idx].copy()
                Xte_s, yte_s = Xs[te_idx].copy(), ys[te_idx].copy()
                Xtr_std, Xte_std = standardize_per_channel(Xtr, Xte_s)

                ft_model = model_factory() if model_factory is not None else EEGCNNTransformer(
                    n_ch=8, n_cls=2, d_model=D_MODEL, n_heads=N_HEADS,
                    n_layers=N_LAYERS, p_drop=P_DROP, p_drop_encoder=P_DROP_ENCODER,
                    n_dw_blocks=2, capture_attn=False).to(device)
                ft_model.load_state_dict(base_state)

                opt_ft = make_ft_optimizer(ft_model)
                alpha_counts = np.bincount(ytr, minlength=2)
                ft_crit = ft_make_criterion_from_counts(alpha_counts)

                ds_tr = TensorDataset(torch.tensor(Xtr_std), torch.tensor(ytr).long())
                ds_te = TensorDataset(torch.tensor(Xte_std), torch.tensor(yte_s).long())
                ld_tr = DataLoader(ds_tr, batch_size=FT_BATCH, shuffle=True,  drop_last=False, worker_init_fn=seed_worker)
                ld_te = DataLoader(ds_te, batch_size=FT_BATCH, shuffle=False, drop_last=False, worker_init_fn=seed_worker)

                # freeze
                freeze_backbone(ft_model, True)
                best_f1_s, best_state_s, wait_s = -1, None, 0
                for _ep in range(1, FT_FREEZE_EPOCHS+1):
                    ft_model.train()
                    for xb, yb in ld_tr:
                        xb = xb.to(device); yb = yb.to(device)
                        xb = augment_batch_ft(xb)
                        opt_ft.zero_grad()
                        logits = ft_model(xb)
                        loss = ft_crit(logits, yb)
                        loss.backward()
                        torch.nn.utils.clip_grad_norm_(ft_model.parameters(), 1.0)
                        opt_ft.step()
                    acc_s, f1_s, _ = evaluate_tensor(ft_model, Xte_std, yte_s)
                    if f1_s > best_f1_s + 1e-4:
                        best_f1_s = f1_s
                        best_state_s = {k: v.detach().cpu() for k,v in ft_model.state_dict().items()}
                        wait_s = 0
                    else:
                        wait_s += 1
                    if wait_s >= FT_PATIENCE: break
                if best_state_s is not None:
                    ft_model.load_state_dict(best_state_s)

                # unfreeze
                freeze_backbone(ft_model, False)
                best_f1_s2, best_state_s2, wait_s2 = -1, None, 0
                for _ep in range(1, FT_UNFREEZE_EPOCHS+1):
                    ft_model.train()
                    for xb, yb in ld_tr:
                        xb = xb.to(device); yb = yb.to(device)
                        xb = augment_batch_ft(xb)
                        opt_ft.zero_grad()
                        logits = ft_model(xb)
                        loss = ft_crit(logits, yb)
                        loss.backward()
                        torch.nn.utils.clip_grad_norm_(ft_model.parameters(), 1.0)
                        opt_ft.step()
                    acc_s, f1_s, _ = evaluate_tensor(ft_model, Xte_std, yte_s)
                    if f1_s > best_f1_s2 + 1e-4:
                        best_f1_s2 = f1_s
                        best_state_s2 = {k: v.detach().cpu() for k,v in ft_model.state_dict().items()}
                        wait_s2 = 0
                    else:
                        wait_s2 += 1
                    if wait_s2 >= FT_PATIENCE: break
                if best_state_s2 is not None:
                    ft_model.load_state_dict(best_state_s2)

                _, _, pred_s = evaluate_tensor(ft_model, Xte_std, yte_s)
                all_true.append(yte_s); all_pred.append(pred_s)

        all_true = np.concatenate(all_true) if len(all_true) else np.array([], dtype=int)
        all_pred = np.concatenate(all_pred) if len(all_pred) else np.array([], dtype=int)
        if len(all_true) > 0:
            ft_acc = accuracy_score(all_true, all_pred)
            ft_f1m = f1_score(all_true, all_pred, average='macro')
            print(f"  Fine-tuning PROGRESIVO (por sujeto, {FT_N_FOLDS}-fold CV) acc={ft_acc:.4f} | f1_macro={ft_f1m:.4f}")
            if save_artifacts:
                cm_ft = confusion_matrix(all_true, all_pred, labels=[0,1])
                ft_png = fold_dir / f"confusion_ft_fold{fold}_{run_tag}.png"
                plot_confusion_with_text(
                    cm=cm_ft,
                    class_names=CLASS_NAMES,
                    title=f"Confusion — Fold {fold} FT (2 clases) [{run_tag}]",
                    out_path=ft_png,
                    cmap='Greens'
                )
                print(f"↳ Matriz de confusión FT guardada: {ft_png}")
        else:
            ft_acc = float('nan')
            print("  Fine-tuning PROGRESIVO: no se generaron splits válidos.")
    else:
        ft_acc = float('nan')
        print("  [LITE] FT desactivado (do_ft=False).")

    # =========================
    # Consumo (siempre calculado); guardar JSON sólo si save_artifacts
    # =========================
    cons = {}
    try:
        eval_model.eval()
        with torch.no_grad():
            xb_small, _, _ = next(iter(val_ld))
        xb_small = xb_small[:8].to(device)

        with torch.no_grad():
            z = eval_model.conv_t(xb_small)
            z = eval_model.proj(z).transpose(1, 2)  # (B, L, D)
            L, d = z.shape[1], z.shape[2]

        params_total = count_params(eval_model, trainable_only=False)
        params_train = count_params(model, trainable_only=True)
        flops = estimate_transformer_flops(L=L, d=d, n_heads=N_HEADS, n_layers=N_LAYERS)
        lat   = benchmark_latency(eval_model, xb_small, DEVICE)

        cons = dict(
            params_total=params_total,
            params_trainable=params_train,
            flops_est=int(flops),
            latency_s=float(lat),
            L=L, d=d,
            n_heads=N_HEADS, n_layers=N_LAYERS,
            run_tag=run_tag, fold=fold,
            # GAP val-train en la mejor época de valid F1
            val_best_acc=float(best_val_acc),
            val_best_f1=float(best_val_f1),
            tr_at_best_acc=float(best_tr_acc),
            tr_at_best_f1=float(best_tr_f1),
            # === NUEVO: métricas de tiempo/memoria por fase ===
            train_time_s=float(train_time_s) if train_time_s is not None else None,
            train_mem_mb=float(train_mem_mb) if train_mem_mb is not None else None,
            test_time_s=float(test_time_s) if test_time_s is not None else None,
            test_mem_mb=float(test_mem_mb) if test_mem_mb is not None else None,
        )

        if save_artifacts and (fold_dir is not None):
            cons_json = fold_dir / f"consumption_fold{fold}_{run_tag}.json"
            with open(cons_json, "w") as f:
                json.dump(cons, f, indent=2)

        print(f"[Consumo] params_total={params_total:,} | params_trainable={params_train:,} | "
              f"FLOPs~{flops/1e6:.1f}M | latency/batch={lat*1000:.2f} ms (B={xb_small.size(0)})")
    except Exception as e:
        print(f"[WARN] Métricas de consumo no generadas: {e}")

    return acc, f1m, ft_acc, cons, met

### Utilidades de Splits y Configuraciones Adicionales

Funciones auxiliares para manejar splits de datos y configuraciones.

In [12]:
# =========================
# Utilidades splits estratificados por sujeto
# =========================
def build_subject_label_map(subject_ids):
    """
    Construye mapa de etiquetas dominantes por sujeto para estratificación.
    """
    y_dom_list = []
    for sid in subject_ids:
        Xs, ys, _ = load_subject_epochs(sid, RESAMPLE_HZ, DO_NOTCH, DO_BANDPASS, DO_CAR, BP_LO, BP_HI)
        if len(ys) == 0:
            y_dom_list.append(-1)
            continue
        binc = np.bincount(ys, minlength=2)
        y_dom = int(np.argmax(binc))
        y_dom_list.append(y_dom)
    return np.array(y_dom_list, dtype=int)

### Funciones de Barrido de Hiperparámetros

Funciones para realizar búsqueda de hiperparámetros y entrenamiento completo.

In [13]:
# =========================
# Barrido (con carpetas y resumen)
# =========================
def run_arch_sweep(device, base_blocks=2, deltas=(-2,0,+2), head_set=(2,4,6), sweep_fold=1):
    """
    Realiza barrido de hiperparámetros (número de bloques y cabezas de atención).
    """
    base_dir = ensure_dir(Path("Barrido"))
    results = []
    run_id = 0
    for dlt in deltas:
        n_blocks = int(np.clip(base_blocks + dlt, 0, 4))
        for nh in head_set:
            if D_MODEL % nh != 0:
                print(f"[SKIP] n_heads={nh} incompatible con D_MODEL={D_MODEL} (no divisible).")
                continue
            run_id += 1
            run_tag = f"nb{n_blocks}_h{nh}"
            print(f"\n=== RUN {run_id} | n_dw_blocks={n_blocks} | n_heads={nh} ===")

            def make_model_for_fold():
                return EEGCNNTransformer(n_ch=8, n_cls=2, d_model=D_MODEL, n_heads=nh,
                                         n_layers=N_LAYERS, p_drop=P_DROP, p_drop_encoder=P_DROP_ENCODER,
                                         n_dw_blocks=n_blocks, capture_attn=True).to(device)
            acc, f1m, ft_acc, cons, met = train_one_fold(
                fold=sweep_fold, device=device,
                model_factory=make_model_for_fold, run_tag=run_tag, out_dir=base_dir,
                save_artifacts=False, do_ft=False
            )
            results.append(dict(
                run=run_id,
                n_dw_blocks=n_blocks,
                n_heads=nh,
                fold=sweep_fold,
                acc=acc,
                f1_macro=f1m,
                ft_acc=ft_acc,
                # === MÉTRICAS EXTENDIDAS EN TEST ===
                precision_macro=met["precision_macro"],
                recall_macro=met["recall_macro"],
                specificity_macro=met["specificity_macro"],
                sensitivity_macro=met["sensitivity_macro"],
                # === CONSUMO ===
                tag=run_tag,
                params_total=cons.get("params_total"),
                params_trainable=cons.get("params_trainable"),
                flops_est=cons.get("flops_est"),
                latency_s=cons.get("latency_s"),
                L=cons.get("L"),
                d=cons.get("d"),
                enc_heads=cons.get("n_heads"),
                enc_layers=cons.get("n_layers"),
            ))

    if len(results):
        summary_path = Path("Barrido") / "summary_sweep.csv"
        fields = list(results[0].keys())
        save_csv_rows(summary_path, fields, results)
        print(f"↳ Resumen de barrido (con consumo) actualizado: {summary_path.resolve()}")
    return results

# =========================
# 5 Folds del ganador (con carpetas y resumen)
# =========================
def run_full_5fold_for_config(device, n_blocks, n_heads):
    """
    Entrena y evalúa el modelo en los 5 folds para una configuración específica.
    """
    tag = f"nb{n_blocks}_h{n_heads}"
    base_dir = ensure_dir(Path("ModeloW") / tag)

    acc_folds, f1_folds, ft_acc_folds = [], [], []
    rows = []

    # === NUEVO: listas para costo computacional por fold ===
    train_times, train_mems = [], []
    test_times,  test_mems  = [], []
    params_list, flops_list = [], []

    # Acumuladores para mapas GLOBALES
    saliency_list = []   # Para saliency por fold
    pvals_log_list = []  # Para -log10(p) por fold
    tsne_results_list = []  # Para t-SNE por fold (solo guardamos, no promediamos)

    # Acumulador para matriz de confusión global (suma de las de cada fold)
    global_cm = np.zeros((2, 2), dtype=int)
    
    def factory():
        return EEGCNNTransformer(n_ch=8, n_cls=2, d_model=D_MODEL, n_heads=n_heads,
                                 n_layers=N_LAYERS, p_drop=P_DROP, p_drop_encoder=P_DROP_ENCODER,
                                 n_dw_blocks=n_blocks, capture_attn=True).to(device)

    for fold in range(1, 6):
        acc, f1m, ft_acc, cons, met = train_one_fold(
            fold, device, model_factory=factory, run_tag=tag, out_dir=base_dir
        )
        # Acumular matriz de confusión de este fold en la matriz GLOBAL
        if met is not None and "cm" in met:
            global_cm += np.array(met["cm"], dtype=int)
        acc_folds.append(acc); f1_folds.append(f1m)
        ft_acc_folds.append(ft_acc if ft_acc == ft_acc else np.nan)

        # === NUEVO: guardar consumo de este fold ===
        train_times.append(cons.get("train_time_s"))
        train_mems.append(cons.get("train_mem_mb"))
        test_times.append(cons.get("test_time_s"))
        test_mems.append(cons.get("test_mem_mb"))
        params_list.append(cons.get("params_total"))
        flops_list.append(cons.get("flops_est"))

        rows.append(dict(
            tag=tag, fold=fold, acc=acc, f1_macro=f1m, ft_acc=ft_acc,
            params_total=cons.get("params_total"),
            params_trainable=cons.get("params_trainable"),
            flops_est=cons.get("flops_est"),
            latency_s=cons.get("latency_s"),
            L=cons.get("L"), d=cons.get("d"),
            enc_heads=cons.get("n_heads"), enc_layers=cons.get("n_layers")
        ))

        # Cargar mapas de interpretabilidad de este fold para promedio GLOBAL
        try:
            fold_dir = base_dir / tag / f"fold{fold}"
            # Cargar saliency
            sal_path = fold_dir / f"saliency_channels_fold{fold}_{tag}.npy"
            if sal_path.exists():
                saliency_list.append(np.load(sal_path))
            
            # Cargar -log10(p)
            pvals_path = fold_dir / f"pvals_log_channels_fold{fold}_{tag}.npy"
            if pvals_path.exists():
                pvals_log_list.append(np.load(pvals_path))
                
            # Cargar t-SNE (solo para referencia, no promediamos)
            tsne_path = fold_dir / f"tsne_results_fold{fold}_{tag}.npy"
            if tsne_path.exists():
                tsne_results_list.append(np.load(tsne_path))
        except Exception as e:
            print(f"[WARN] No se pudieron cargar mapas para fold {fold}: {e}")

    acc_mean = float(np.nanmean(acc_folds))
    f1_mean  = float(np.nanmean(f1_folds))
    ft_mean  = float(np.nanmean(ft_acc_folds))
    delta_mean = float(ft_mean - acc_mean) if not np.isnan(ft_mean) else np.nan

    summary_path = base_dir / f"summary_{tag}.csv"
    save_csv_rows(summary_path, list(rows[0].keys()), rows)
    print(f"↳ Resumen 5-fold guardado: {summary_path.resolve()}")

    print("\n============================================================")
    print(f"RESULTADOS FINALES [{tag}] (2 clases: left/right)")
    print("============================================================")
    print("Global folds (ACC):", [f"{a:.4f}" for a in acc_folds])
    print("Global mean ACC:   ", f"{acc_mean:.4f}")
    print("F1 folds (MACRO):  ", [f"{f:.4f}" for f in f1_folds])
    print("F1 mean (MACRO):   ", f"{f1_mean:.4f}")
    if not np.isnan(ft_mean):
        print("FT folds (ACC):    ", [("nan" if np.isnan(a) else f"{a:.4f}") for a in ft_acc_folds])
        print("FT mean (ACC):     ", f"{ft_mean:.4f}")
        print("Δ(FT-Global) mean: ", f"{delta_mean:+.4f}")
    else:
        print("FT mean (ACC): N/A")

    # ========================================================
    # Tabla de métricas computacionales tipo paper
    # ========================================================

    tt = np.array([t for t in train_times if t is not None], dtype=float)
    tm = np.array([m for m in train_mems if m is not None], dtype=float)

    train_time_mean = float(tt.mean()) if len(tt) else float('nan')
    train_time_std  = float(tt.std(ddof=0)) if len(tt) else float('nan')
    train_mem_mean  = float(tm.mean()) if len(tm) else float('nan')
    train_mem_std   = float(tm.std(ddof=0)) if len(tm) else float('nan')

    # Para test tomamos el primer fold válido (single run)
    test_time = next((float(t) for t in test_times if t is not None), float('nan'))
    test_mem  = next((float(m) for m in test_mems if m is not None), float('nan'))

    # Modelo (mismo en todos los folds)
    params_total = next((p for p in params_list if p is not None), 0)
    flops_est    = next((f for f in flops_list if f is not None), 0)

    params_m = params_total / 1e6
    flops_g  = flops_est / 1e9

    print("\n================ Computational metrics ================")
    print(f"Model tag: {tag}")
    print("Phase        time (s)           memory (MB)        FLOPs (g)   params (m)")
    print("-----------  -----------------  -----------------  ----------  ----------")
    print(f"model specs  N/A                N/A                 {flops_g:8.3f}   {params_m:8.3f}")
    print(f"train        {train_time_mean:6.2f} ({train_time_std:5.2f})   {train_mem_mean:7.1f} ({train_mem_std:5.1f})     N/A        N/A")
    print(f"test         {test_time:6.2f}              {test_mem:7.1f}              N/A        N/A")

    comp_rows = [
        dict(phase="model_specs",
             time_mean_s=None, time_std_s=None,
             mem_mean_mb=None, mem_std_mb=None,
             flops_g=flops_g, params_m=params_m),
        dict(phase="train",
             time_mean_s=train_time_mean, time_std_s=train_time_std,
             mem_mean_mb=train_mem_mean, mem_std_mb=train_mem_std,
             flops_g=None, params_m=None),
        dict(phase="test",
             time_mean_s=test_time, time_std_s=None,
             mem_mean_mb=test_mem, mem_std_mb=None,
             flops_g=None, params_m=None),
    ]

    comp_path = base_dir / f"computational_{tag}.csv"
    save_csv_rows(comp_path,
                  fieldnames=list(comp_rows[0].keys()),
                  rows=comp_rows)
    print(f"↳ Tabla de métricas computacionales guardada en: {comp_path.resolve()}")
    
    # ========================================================
    # Matriz de confusión GLOBAL (5 folds concatenados)
    # ========================================================
    try:
        cm_global_png = base_dir / f"confusion_global_allfolds_{tag}.png"
        plot_confusion_with_text(
            cm=global_cm,
            class_names=CLASS_NAMES,
            title=f"Confusion — Global (5-fold) [{tag}]",
            out_path=cm_global_png,
            cmap='Blues'
        )
        print(f"↳ Matriz de confusión GLOBAL guardada: {cm_global_png}")
    except Exception as e:
        print(f"[WARN] No se pudo generar la matriz de confusión global: {e}")

    # ========================================================
    # Curva de LOSS GLOBAL (promedio sobre los 5 folds)
    # ========================================================
    try:
        histories = []
        for fold in range(1, 6):
            hist_path = base_dir / tag / f"fold{fold}" / f"history_fold{fold}_{tag}.npz"
            if not hist_path.exists():
                print(f"[WARN] History no encontrado para fold {fold}: {hist_path}")
                continue
            data = np.load(hist_path)
            hist = {
                "ep":      data["ep"],
                "tr_loss": data["tr_loss"],
                "val_loss": data["val_loss"],
            }
            histories.append(hist)

        if len(histories) > 0:
            # Longitud máxima de epochs entre los folds
            max_len = max(h["ep"].shape[0] for h in histories)
            epochs = np.arange(1, max_len + 1)

            mean_tr = []
            mean_val = []

            for i in range(max_len):
                tr_vals = [h["tr_loss"][i] for h in histories if h["tr_loss"].shape[0] > i]
                val_vals = [h["val_loss"][i] for h in histories if h["val_loss"].shape[0] > i]
                if len(tr_vals) == 0 or len(val_vals) == 0:
                    break  # ya no hay más epochs comunes
                mean_tr.append(np.mean(tr_vals))
                mean_val.append(np.mean(val_vals))

            epochs = epochs[:len(mean_tr)]
            mean_tr = np.array(mean_tr)
            mean_val = np.array(mean_val)

            fig = plt.figure(figsize=(8, 4.5))
            ax = plt.gca()
            ax.plot(epochs, mean_tr, label="Train loss (mean across folds)")
            ax.plot(epochs, mean_val, label="Val loss (mean across folds)")
            ax.set_xlabel("Epoch")
            ax.set_ylabel("Loss")
            ax.set_title(f"Global Training Curve — 5-fold mean [{tag}]")
            ax.legend()
            ax.grid(True, alpha=0.3)
            out_png = base_dir / f"training_curve_global_{tag}.png"
            plt.tight_layout()
            plt.savefig(out_png, dpi=140)
            plt.close(fig)
            print(f"↳ Curva de loss GLOBAL guardada: {out_png}")
        else:
            print("[WARN] No se encontraron historiales para construir curva global.")
    except Exception as e:
        print(f"[WARN] No se pudo generar la curva de loss global: {e}")

    # ========================================================
    # MAPAS GLOBALES (promedio de los 5 folds)
    # ========================================================
    try:
        # Crear info para topomaps
        info_topo_global = make_mne_info_8ch(sfreq=160.0)
        
        # 1. Topomap GLOBAL de saliency (promedio 5 folds)
        if len(saliency_list) > 0:
            sal_mean = np.mean(np.stack(saliency_list, axis=0), axis=0)   # (C,)
            topo_sal_global_png = base_dir / f"topomap_saliency_GLOBAL_{tag}.png"
            plot_topomap_from_values(
                sal_mean,
                info_topo_global,
                title=f"Channel saliency — GLOBAL (5 folds) [{tag}]",
                out_path=topo_sal_global_png,
                cmap="Reds",
                cbar_label="Normalized saliency"
            )
            print(f"↳ Topomap GLOBAL de saliency guardado: {topo_sal_global_png}")
        
        # 2. Topomap GLOBAL de -log10(p) (promedio 5 folds)
        if len(pvals_log_list) > 0:
            pvals_log_mean = np.mean(np.stack(pvals_log_list, axis=0), axis=0)   # (C,)
            topo_p_global_png = base_dir / f"topomap_log10p_GLOBAL_{tag}.png"
            plot_topomap_from_values(
                pvals_log_mean,
                info_topo_global,
                title=f"-log10(p) Left vs Right — GLOBAL (5 folds) [{tag}]",
                out_path=topo_p_global_png,
                cmap="viridis",
                cbar_label="-log10(p)"
            )
            print(f"↳ Topomap GLOBAL de -log10(p) guardado: {topo_p_global_png}")
        
        # 3. t-SNE GLOBAL usando CLS embeddings del último fold
        if len(tsne_results_list) > 0:
            # Usar el t-SNE del último fold como representativo
            tsne_representative = tsne_results_list[-1]
            
            # Cargar etiquetas correspondientes (del último fold)
            fold_dir = base_dir / tag / f"fold5"
            
            # Cargar y_te del fold 5 (si existe guardado)
            y_te_fold5_path = fold_dir / "y_te.npy"
            if y_te_fold5_path.exists():
                y_te_fold5 = np.load(y_te_fold5_path)
                # Tomar las mismas muestras que usamos para t-SNE
                rng = np.random.RandomState(RANDOM_STATE)
                n_tsne_samples = min(1000, len(y_te_fold5))
                indices = rng.choice(len(y_te_fold5), n_tsne_samples, replace=False)
                y_tsne_global = y_te_fold5[indices]
                
                tsne_global_png = base_dir / f"tsne_cls_GLOBAL_{tag}.png"
                plot_tsne(
                    tsne_representative,
                    y_tsne_global,
                    title=f"t-SNE CLS embeddings — GLOBAL (Fold 5 representative) [{tag}]",
                    out_path=tsne_global_png
                )
                print(f"↳ t-SNE GLOBAL (CLS embeddings) guardado: {tsne_global_png}")
                
    except Exception as e:
        print(f"[WARN] No se pudieron generar mapas GLOBALES: {e}")

### Main y Ejecución Principal

Función principal para ejecutar el pipeline completo.

In [14]:
# =========================
# MAIN
# =========================
if __name__ == "__main__":
    print("MODELO")
    # 1) Barrido con UN SOLO FOLD (fold=1). Ajusta deltas y head_set si quieres.
    # sweep = run_arch_sweep(DEVICE, base_blocks=2, deltas=(-2,0,+2), head_set=(2,4,6), sweep_fold=1)
    # sweep = run_arch_sweep(DEVICE, base_blocks=2, deltas=(-2,0,+2), head_set=(2,4,6), sweep_fold=2)
    # sweep = run_arch_sweep(DEVICE, base_blocks=2, deltas=(-2,0,+2), head_set=(2,4,6), sweep_fold=3)
    # sweep = run_arch_sweep(DEVICE, base_blocks=2, deltas=(-2,0,+2), head_set=(2,4,6), sweep_fold=4)
    # sweep = run_arch_sweep(DEVICE, base_blocks=2, deltas=(-2,0,+2), head_set=(2,4,6), sweep_fold=5)

    # 2) (Luego) con los GANADORES corre los 5 folds:
    run_full_5fold_for_config(DEVICE, n_blocks=2, n_heads=6)

    # 3) (Opcional) imprimir hiperparámetros contados:
    # hp = current_hparams()
    # print(f"\nHiperparámetros (n={len(hp)}):")
    # for k,v in hp.items():
    #     print(f" - {k}: {v}")

MODELO
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/ModeloW/nb2_h6/nb2_h6/fold1


Cargando train fold1: 100%|██████████| 67/67 [00:04<00:00, 15.64it/s]
Cargando val fold1: 100%|██████████| 15/15 [00:01<00:00, 14.81it/s]
Cargando test fold1: 100%|██████████| 21/21 [00:01<00:00, 15.34it/s]


[Fold 1/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1318 | train_acc=0.5213 | val_loss=0.1249 | val_acc=0.5000 | val_f1m=0.3443 | LR=0.000125
  Época   2 | train_loss=0.1224 | train_acc=0.5729 | val_loss=0.1187 | val_acc=0.6190 | val_f1m=0.6166 | LR=0.000250
  Época   3 | train_loss=0.1102 | train_acc=0.6546 | val_loss=0.1111 | val_acc=0.6508 | val_f1m=0.6402 | LR=0.000375
  Época   4 | train_loss=0.0962 | train_acc=0.7416 | val_loss=0.1044 | val_acc=0.7206 | val_f1m=0.7198 | LR=0.000500
  Época   5 | train_loss=0.0841 | train_acc=0.7839 | val_loss=0.0987 | val_acc=0.7444 | val_f1m=0.7439 | LR=0.000500
  Época   6 | train_loss=0.0764 | train_acc=0.8003 | val_loss=0.0960 | val_acc=0.7556 | val_f1m=0.7539 | LR=0.000500
  Época   7 | train_loss=0.0721 | train_acc=0.8124 | val_loss=0.0919 | val_acc=0.7635 | val_f1m=0.7627 | LR=0.000499
  Época   8 | train_loss=0.0662 | train_acc=0.8348 | val_loss=0.1250 | val_acc=0.7190 | val_f1m=0.7119



↳ t-SNE (CLS embeddings) guardado: ModeloW/nb2_h6/nb2_h6/fold1/tsne_cls_fold1_nb2_h6.png
  Fine-tuning PROGRESIVO (por sujeto, 4-fold CV) acc=0.8526 | f1_macro=0.8526
↳ Matriz de confusión FT guardada: ModeloW/nb2_h6/nb2_h6/fold1/confusion_ft_fold1_nb2_h6.png
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.12 ms (B=8)
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/ModeloW/nb2_h6/nb2_h6/fold2


Cargando train fold2: 100%|██████████| 67/67 [00:03<00:00, 17.24it/s]
Cargando val fold2: 100%|██████████| 15/15 [00:00<00:00, 17.30it/s]
Cargando test fold2: 100%|██████████| 21/21 [00:01<00:00, 17.24it/s]


[Fold 2/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1290 | train_acc=0.5267 | val_loss=0.1202 | val_acc=0.6032 | val_f1m=0.6029 | LR=0.000125
  Época   2 | train_loss=0.1214 | train_acc=0.5839 | val_loss=0.1159 | val_acc=0.6222 | val_f1m=0.6183 | LR=0.000250
  Época   3 | train_loss=0.1168 | train_acc=0.6276 | val_loss=0.1052 | val_acc=0.7095 | val_f1m=0.7095 | LR=0.000375
  Época   4 | train_loss=0.0987 | train_acc=0.7278 | val_loss=0.0958 | val_acc=0.7206 | val_f1m=0.7202 | LR=0.000500
  Época   5 | train_loss=0.0910 | train_acc=0.7594 | val_loss=0.0973 | val_acc=0.7175 | val_f1m=0.7028 | LR=0.000500
  Época   6 | train_loss=0.0808 | train_acc=0.7921 | val_loss=0.0942 | val_acc=0.7492 | val_f1m=0.7492 | LR=0.000500
  Época   7 | train_loss=0.0776 | train_acc=0.8042 | val_loss=0.0910 | val_acc=0.7476 | val_f1m=0.7417 | LR=0.000499
  Época   8 | train_loss=0.0771 | train_acc=0.8056 | val_loss=0.0823 | val_acc=0.7873 | val_f1m=0.7870



↳ t-SNE (CLS embeddings) guardado: ModeloW/nb2_h6/nb2_h6/fold2/tsne_cls_fold2_nb2_h6.png
  Fine-tuning PROGRESIVO (por sujeto, 4-fold CV) acc=0.8878 | f1_macro=0.8877
↳ Matriz de confusión FT guardada: ModeloW/nb2_h6/nb2_h6/fold2/confusion_ft_fold2_nb2_h6.png
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.12 ms (B=8)
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/ModeloW/nb2_h6/nb2_h6/fold3


Cargando train fold3: 100%|██████████| 67/67 [00:03<00:00, 17.21it/s]
Cargando val fold3: 100%|██████████| 15/15 [00:00<00:00, 17.32it/s]
Cargando test fold3: 100%|██████████| 21/21 [00:01<00:00, 17.26it/s]


[Fold 3/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1315 | train_acc=0.5174 | val_loss=0.1267 | val_acc=0.5079 | val_f1m=0.3685 | LR=0.000125
  Época   2 | train_loss=0.1251 | train_acc=0.5402 | val_loss=0.1324 | val_acc=0.5190 | val_f1m=0.3840 | LR=0.000250
  Época   3 | train_loss=0.1242 | train_acc=0.5547 | val_loss=0.1109 | val_acc=0.6810 | val_f1m=0.6809 | LR=0.000375
  Época   4 | train_loss=0.1067 | train_acc=0.6844 | val_loss=0.0947 | val_acc=0.7222 | val_f1m=0.7221 | LR=0.000500
  Época   5 | train_loss=0.0889 | train_acc=0.7633 | val_loss=0.0884 | val_acc=0.7651 | val_f1m=0.7621 | LR=0.000500
  Época   6 | train_loss=0.0765 | train_acc=0.7974 | val_loss=0.0827 | val_acc=0.8016 | val_f1m=0.8015 | LR=0.000500
  Época   7 | train_loss=0.0721 | train_acc=0.8259 | val_loss=0.0834 | val_acc=0.7937 | val_f1m=0.7936 | LR=0.000499
  Época   8 | train_loss=0.0738 | train_acc=0.8223 | val_loss=0.0810 | val_acc=0.8063 | val_f1m=0.8063



↳ t-SNE (CLS embeddings) guardado: ModeloW/nb2_h6/nb2_h6/fold3/tsne_cls_fold3_nb2_h6.png
  Fine-tuning PROGRESIVO (por sujeto, 4-fold CV) acc=0.8469 | f1_macro=0.8469
↳ Matriz de confusión FT guardada: ModeloW/nb2_h6/nb2_h6/fold3/confusion_ft_fold3_nb2_h6.png
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.13 ms (B=8)
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/ModeloW/nb2_h6/nb2_h6/fold4


Cargando train fold4: 100%|██████████| 68/68 [00:03<00:00, 17.07it/s]
Cargando val fold4: 100%|██████████| 15/15 [00:00<00:00, 17.04it/s]
Cargando test fold4: 100%|██████████| 20/20 [00:01<00:00, 16.95it/s]


[Fold 4/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1308 | train_acc=0.4972 | val_loss=0.1204 | val_acc=0.5651 | val_f1m=0.5476 | LR=0.000125
  Época   2 | train_loss=0.1254 | train_acc=0.5501 | val_loss=0.1177 | val_acc=0.6079 | val_f1m=0.6056 | LR=0.000250
  Época   3 | train_loss=0.1174 | train_acc=0.6008 | val_loss=0.1263 | val_acc=0.6508 | val_f1m=0.6418 | LR=0.000375
  Época   4 | train_loss=0.1039 | train_acc=0.6992 | val_loss=0.1061 | val_acc=0.6889 | val_f1m=0.6888 | LR=0.000500
  Época   5 | train_loss=0.0898 | train_acc=0.7626 | val_loss=0.1008 | val_acc=0.7349 | val_f1m=0.7327 | LR=0.000500
  Época   6 | train_loss=0.0802 | train_acc=0.7805 | val_loss=0.0876 | val_acc=0.7635 | val_f1m=0.7622 | LR=0.000500
  Época   7 | train_loss=0.0810 | train_acc=0.7920 | val_loss=0.0990 | val_acc=0.7286 | val_f1m=0.7182 | LR=0.000499
  Época   8 | train_loss=0.0737 | train_acc=0.8099 | val_loss=0.0873 | val_acc=0.7762 | val_f1m=0.7749



↳ t-SNE (CLS embeddings) guardado: ModeloW/nb2_h6/nb2_h6/fold4/tsne_cls_fold4_nb2_h6.png
  Fine-tuning PROGRESIVO (por sujeto, 4-fold CV) acc=0.8738 | f1_macro=0.8738
↳ Matriz de confusión FT guardada: ModeloW/nb2_h6/nb2_h6/fold4/confusion_ft_fold4_nb2_h6.png
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.12 ms (B=8)
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/ModeloW/nb2_h6/nb2_h6/fold5


Cargando train fold5: 100%|██████████| 68/68 [00:03<00:00, 17.30it/s]
Cargando val fold5: 100%|██████████| 15/15 [00:00<00:00, 17.43it/s]
Cargando test fold5: 100%|██████████| 20/20 [00:01<00:00, 17.25it/s]


[Fold 5/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1320 | train_acc=0.5035 | val_loss=0.1220 | val_acc=0.5095 | val_f1m=0.4079 | LR=0.000125
  Época   2 | train_loss=0.1241 | train_acc=0.5445 | val_loss=0.1177 | val_acc=0.5889 | val_f1m=0.5691 | LR=0.000250
  Época   3 | train_loss=0.1225 | train_acc=0.5795 | val_loss=0.1122 | val_acc=0.6619 | val_f1m=0.6540 | LR=0.000375
  Época   4 | train_loss=0.0968 | train_acc=0.7416 | val_loss=0.1055 | val_acc=0.7000 | val_f1m=0.6988 | LR=0.000500
  Época   5 | train_loss=0.0907 | train_acc=0.7560 | val_loss=0.1002 | val_acc=0.7127 | val_f1m=0.7107 | LR=0.000500
  Época   6 | train_loss=0.0747 | train_acc=0.8123 | val_loss=0.0961 | val_acc=0.7190 | val_f1m=0.7162 | LR=0.000500
  Época   7 | train_loss=0.0737 | train_acc=0.8200 | val_loss=0.1029 | val_acc=0.7286 | val_f1m=0.7273 | LR=0.000499
  Época   8 | train_loss=0.0708 | train_acc=0.8232 | val_loss=0.1032 | val_acc=0.7333 | val_f1m=0.7278



↳ t-SNE (CLS embeddings) guardado: ModeloW/nb2_h6/nb2_h6/fold5/tsne_cls_fold5_nb2_h6.png
  Fine-tuning PROGRESIVO (por sujeto, 4-fold CV) acc=0.8786 | f1_macro=0.8786
↳ Matriz de confusión FT guardada: ModeloW/nb2_h6/nb2_h6/fold5/confusion_ft_fold5_nb2_h6.png
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.12 ms (B=8)
↳ Resumen 5-fold guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/ModeloW/nb2_h6/summary_nb2_h6.csv

RESULTADOS FINALES [nb2_h6] (2 clases: left/right)
Global folds (ACC): ['0.8107', '0.8356', '0.8084', '0.8119', '0.8464']
Global mean ACC:    0.8226
F1 folds (MACRO):   ['0.8107', '0.8353', '0.8084', '0.8098', '0.8464']
F1 mean (MACRO):    0.8221
FT folds (ACC):     ['0.8526', '0.8878', '0.8469', '0.8738', '0.8786']
FT mean (ACC):      0.8679
Δ(FT-Global) mean:  +0.0453

Model tag: nb2_h6
Phase        time (s)           memory (MB)        FLOPs (g)   params (m)
-----------  -----------------  -----------------  ----------

---
## ESTUDIO DE ABLACIÓN

Esta sección implementa un estudio de ablación sistemático para evaluar la contribución de cada componente del modelo híbrido CNN-Transformer.

### Variantes evaluadas:
1. **Baseline**: Modelo completo con todos los componentes
2. **Sin Depthwise Blocks**: Solo stem convolucional (n_dw_blocks=0)
3. **Sin Codificación Posicional**: Elimina positional encoding
4. **Sin Token CLS**: Usa average pooling en vez del token de clasificación
5. **Con BatchNorm**: Reemplaza GroupNorm por BatchNorm
6. **Con ReLU**: Reemplaza activaciones ELU/GELU por ReLU

In [15]:
# =========================
# VARIANTES DEL MODELO PARA ABLACIÓN
# =========================

# VARIANTE 1: Sin bloques depthwise (solo stem)
class EEGCNNTransformer_NoDepthwise(nn.Module):
    """Variante sin bloques depthwise separable"""
    def __init__(self, n_ch=8, n_cls=2, d_model=128, n_heads=4, n_layers=2,
                 p_drop=0.2, p_drop_encoder=0.3, capture_attn=False):
        super().__init__()
        self.capture_attn = capture_attn
        self._last_attn = None
        self.pos_encoding = None

        # Solo stem, sin bloques depthwise
        self.conv_t = nn.Sequential(
            nn.Conv1d(n_ch, 32, kernel_size=129, stride=2, padding=64, bias=False),
            make_gn(32),
            nn.ELU(),
            nn.Dropout(p=p_drop),
        )
        
        self.proj = nn.Conv1d(32, d_model, kernel_size=1, bias=False)
        self.dropout = nn.Dropout(p=p_drop_encoder)

        # Encoder Transformer
        enc = _CustomEncoderLayer(d_model=d_model, nhead=n_heads, dim_feedforward=2*d_model,
                                  dropout=0.1, batch_first=True, norm_first=False, return_attn=True)
        self.encoder = _CustomEncoder(enc, num_layers=n_layers)

        # Token CLS y head
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        nn.init.normal_(self.cls, std=0.02)
        self.head = nn.Sequential(nn.LayerNorm(d_model), nn.Linear(d_model, n_cls))

    def _positional_encoding(self, L, d):
        pos = torch.arange(0, L, dtype=torch.float32).unsqueeze(1)
        i   = torch.arange(0, d, dtype=torch.float32).unsqueeze(0)
        angle = pos / torch.pow(10000, (2 * (i//2)) / d)
        pe = torch.zeros(L, d, dtype=torch.float32)
        pe[:, 0::2] = torch.sin(angle[:, 0::2])
        pe[:, 1::2] = torch.cos(angle[:, 1::2])
        return pe

    def forward(self, x):
        z = self.conv_t(x)
        z = self.proj(z)
        z = self.dropout(z)
        z = z.transpose(1, 2)
        
        # Positional encoding
        B, L, D = z.shape
        if (self.pos_encoding is None) or (self.pos_encoding.shape[0] != L) or (self.pos_encoding.shape[1] != D):
            self.pos_encoding = self._positional_encoding(L, D).to(z.device)
        z = z + self.pos_encoding[None, :, :]
        
        # CLS token
        cls_tok = self.cls.expand(B, -1, -1)
        z = torch.cat([cls_tok, z], dim=1)

        z, attn_list = self.encoder(z)
        if self.capture_attn:
            self._last_attn = attn_list

        cls = z[:, 0, :]
        return self.head(cls)

    def get_last_attention_maps(self):
        return self._last_attn


# VARIANTE 2: Sin codificación posicional
class EEGCNNTransformer_NoPosEncoding(nn.Module):
    """Variante sin positional encoding"""
    def __init__(self, n_ch=8, n_cls=2, d_model=128, n_heads=4, n_layers=2,
                 p_drop=0.2, p_drop_encoder=0.3, n_dw_blocks=2,
                 k_list=(31,15,7), s_list=(2,2,2), p_list=(15,7,3), capture_attn=False):
        super().__init__()
        self.capture_attn = capture_attn
        self._last_attn = None

        # Stem convolucional
        stem = [
            nn.Conv1d(n_ch, 32, kernel_size=129, stride=2, padding=64, bias=False),
            make_gn(32),
            nn.ELU(),
            nn.Dropout(p=p_drop),
        ]
        
        # Bloques depthwise
        blocks = []
        in_c, out_cs = 32, [64, 128, 256, 256, 256]
        n_dw_blocks = int(np.clip(n_dw_blocks, 0, len(out_cs)))
        for i in range(n_dw_blocks):
            k = k_list[i] if i < len(k_list) else 7
            s = s_list[i] if i < len(s_list) else 2
            p = p_list[i] if i < len(p_list) else k//2
            blocks.append(DepthwiseSeparableConv(in_c, out_cs[i], k=k, s=s, p=p, p_drop=p_drop))
            in_c = out_cs[i]

        self.conv_t = nn.Sequential(*stem, *blocks)
        self.proj = nn.Conv1d(in_c if n_dw_blocks>0 else 32, d_model, kernel_size=1, bias=False)
        self.dropout = nn.Dropout(p=p_drop_encoder)

        # Encoder Transformer
        enc = _CustomEncoderLayer(d_model=d_model, nhead=n_heads, dim_feedforward=2*d_model,
                                  dropout=0.1, batch_first=True, norm_first=False, return_attn=True)
        self.encoder = _CustomEncoder(enc, num_layers=n_layers)

        # Token CLS y head
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        nn.init.normal_(self.cls, std=0.02)
        self.head = nn.Sequential(nn.LayerNorm(d_model), nn.Linear(d_model, n_cls))

    def forward(self, x):
        z = self.conv_t(x)
        z = self.proj(z)
        z = self.dropout(z)
        z = z.transpose(1, 2)
        
        # SIN positional encoding
        
        # CLS token
        B = z.shape[0]
        cls_tok = self.cls.expand(B, -1, -1)
        z = torch.cat([cls_tok, z], dim=1)

        z, attn_list = self.encoder(z)
        if self.capture_attn:
            self._last_attn = attn_list

        cls = z[:, 0, :]
        return self.head(cls)

    def get_last_attention_maps(self):
        return self._last_attn


# VARIANTE 3: Sin token CLS (average pooling)
class EEGCNNTransformer_NoCLS(nn.Module):
    """Variante sin token CLS, usa average pooling"""
    def __init__(self, n_ch=8, n_cls=2, d_model=128, n_heads=4, n_layers=2,
                 p_drop=0.2, p_drop_encoder=0.3, n_dw_blocks=2,
                 k_list=(31,15,7), s_list=(2,2,2), p_list=(15,7,3), capture_attn=False):
        super().__init__()
        self.capture_attn = capture_attn
        self._last_attn = None
        self.pos_encoding = None

        # Stem convolucional
        stem = [
            nn.Conv1d(n_ch, 32, kernel_size=129, stride=2, padding=64, bias=False),
            make_gn(32),
            nn.ELU(),
            nn.Dropout(p=p_drop),
        ]
        
        # Bloques depthwise
        blocks = []
        in_c, out_cs = 32, [64, 128, 256, 256, 256]
        n_dw_blocks = int(np.clip(n_dw_blocks, 0, len(out_cs)))
        for i in range(n_dw_blocks):
            k = k_list[i] if i < len(k_list) else 7
            s = s_list[i] if i < len(s_list) else 2
            p = p_list[i] if i < len(p_list) else k//2
            blocks.append(DepthwiseSeparableConv(in_c, out_cs[i], k=k, s=s, p=p, p_drop=p_drop))
            in_c = out_cs[i]

        self.conv_t = nn.Sequential(*stem, *blocks)
        self.proj = nn.Conv1d(in_c if n_dw_blocks>0 else 32, d_model, kernel_size=1, bias=False)
        self.dropout = nn.Dropout(p=p_drop_encoder)

        # Encoder Transformer
        enc = _CustomEncoderLayer(d_model=d_model, nhead=n_heads, dim_feedforward=2*d_model,
                                  dropout=0.1, batch_first=True, norm_first=False, return_attn=True)
        self.encoder = _CustomEncoder(enc, num_layers=n_layers)

        # Head (sin CLS token)
        self.head = nn.Sequential(nn.LayerNorm(d_model), nn.Linear(d_model, n_cls))

    def _positional_encoding(self, L, d):
        pos = torch.arange(0, L, dtype=torch.float32).unsqueeze(1)
        i   = torch.arange(0, d, dtype=torch.float32).unsqueeze(0)
        angle = pos / torch.pow(10000, (2 * (i//2)) / d)
        pe = torch.zeros(L, d, dtype=torch.float32)
        pe[:, 0::2] = torch.sin(angle[:, 0::2])
        pe[:, 1::2] = torch.cos(angle[:, 1::2])
        return pe

    def forward(self, x):
        z = self.conv_t(x)
        z = self.proj(z)
        z = self.dropout(z)
        z = z.transpose(1, 2)
        
        # Positional encoding
        B, L, D = z.shape
        if (self.pos_encoding is None) or (self.pos_encoding.shape[0] != L) or (self.pos_encoding.shape[1] != D):
            self.pos_encoding = self._positional_encoding(L, D).to(z.device)
        z = z + self.pos_encoding[None, :, :]
        
        # SIN CLS token

        z, attn_list = self.encoder(z)
        if self.capture_attn:
            self._last_attn = attn_list

        # Average pooling en lugar de CLS
        pooled = z.mean(dim=1)  # (B, D)
        return self.head(pooled)

    def get_last_attention_maps(self):
        return self._last_attn


# VARIANTE 4: Con BatchNorm en vez de GroupNorm
def make_bn(num_channels):
    """Crea capa BatchNorm1d"""
    return nn.BatchNorm1d(num_channels)

class DepthwiseSeparableConv_BN(nn.Module):
    """Convolución depthwise separable con BatchNorm"""
    def __init__(self, in_ch, out_ch, k, s=1, p=0, p_drop=0.2):
        super().__init__()
        self.dw = nn.Conv1d(in_ch, in_ch, kernel_size=k, stride=s, padding=p, groups=in_ch, bias=False)
        self.pw = nn.Conv1d(in_ch, out_ch, kernel_size=1, bias=False)
        self.norm = make_bn(out_ch)
        self.act = nn.ELU()
        self.dropout = nn.Dropout(p=p_drop)

    def forward(self, x):
        x = self.dw(x); x = self.pw(x); x = self.norm(x)
        x = self.act(x); x = self.dropout(x)
        return x

class EEGCNNTransformer_BatchNorm(nn.Module):
    """Variante con BatchNorm en vez de GroupNorm"""
    def __init__(self, n_ch=8, n_cls=2, d_model=128, n_heads=4, n_layers=2,
                 p_drop=0.2, p_drop_encoder=0.3, n_dw_blocks=2,
                 k_list=(31,15,7), s_list=(2,2,2), p_list=(15,7,3), capture_attn=False):
        super().__init__()
        self.capture_attn = capture_attn
        self._last_attn = None
        self.pos_encoding = None

        # Stem con BatchNorm
        stem = [
            nn.Conv1d(n_ch, 32, kernel_size=129, stride=2, padding=64, bias=False),
            make_bn(32),
            nn.ELU(),
            nn.Dropout(p=p_drop),
        ]
        
        # Bloques depthwise con BatchNorm
        blocks = []
        in_c, out_cs = 32, [64, 128, 256, 256, 256]
        n_dw_blocks = int(np.clip(n_dw_blocks, 0, len(out_cs)))
        for i in range(n_dw_blocks):
            k = k_list[i] if i < len(k_list) else 7
            s = s_list[i] if i < len(s_list) else 2
            p = p_list[i] if i < len(p_list) else k//2
            blocks.append(DepthwiseSeparableConv_BN(in_c, out_cs[i], k=k, s=s, p=p, p_drop=p_drop))
            in_c = out_cs[i]

        self.conv_t = nn.Sequential(*stem, *blocks)
        self.proj = nn.Conv1d(in_c if n_dw_blocks>0 else 32, d_model, kernel_size=1, bias=False)
        self.dropout = nn.Dropout(p=p_drop_encoder)

        # Encoder Transformer
        enc = _CustomEncoderLayer(d_model=d_model, nhead=n_heads, dim_feedforward=2*d_model,
                                  dropout=0.1, batch_first=True, norm_first=False, return_attn=True)
        self.encoder = _CustomEncoder(enc, num_layers=n_layers)

        # Token CLS y head
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        nn.init.normal_(self.cls, std=0.02)
        self.head = nn.Sequential(nn.LayerNorm(d_model), nn.Linear(d_model, n_cls))

    def _positional_encoding(self, L, d):
        pos = torch.arange(0, L, dtype=torch.float32).unsqueeze(1)
        i   = torch.arange(0, d, dtype=torch.float32).unsqueeze(0)
        angle = pos / torch.pow(10000, (2 * (i//2)) / d)
        pe = torch.zeros(L, d, dtype=torch.float32)
        pe[:, 0::2] = torch.sin(angle[:, 0::2])
        pe[:, 1::2] = torch.cos(angle[:, 1::2])
        return pe

    def forward(self, x):
        z = self.conv_t(x)
        z = self.proj(z)
        z = self.dropout(z)
        z = z.transpose(1, 2)
        
        # Positional encoding
        B, L, D = z.shape
        if (self.pos_encoding is None) or (self.pos_encoding.shape[0] != L) or (self.pos_encoding.shape[1] != D):
            self.pos_encoding = self._positional_encoding(L, D).to(z.device)
        z = z + self.pos_encoding[None, :, :]
        
        # CLS token
        cls_tok = self.cls.expand(B, -1, -1)
        z = torch.cat([cls_tok, z], dim=1)

        z, attn_list = self.encoder(z)
        if self.capture_attn:
            self._last_attn = attn_list

        cls = z[:, 0, :]
        return self.head(cls)

    def get_last_attention_maps(self):
        return self._last_attn


# VARIANTE 5: Con ReLU en vez de ELU/GELU
class DepthwiseSeparableConv_ReLU(nn.Module):
    """Convolución depthwise separable con ReLU"""
    def __init__(self, in_ch, out_ch, k, s=1, p=0, p_drop=0.2):
        super().__init__()
        self.dw = nn.Conv1d(in_ch, in_ch, kernel_size=k, stride=s, padding=p, groups=in_ch, bias=False)
        self.pw = nn.Conv1d(in_ch, out_ch, kernel_size=1, bias=False)
        self.norm = make_gn(out_ch)
        self.act = nn.ReLU()  # ReLU en vez de ELU
        self.dropout = nn.Dropout(p=p_drop)

    def forward(self, x):
        x = self.dw(x); x = self.pw(x); x = self.norm(x)
        x = self.act(x); x = self.dropout(x)
        return x

class _CustomEncoderLayer_ReLU(nn.Module):
    """Encoder Transformer con ReLU en FFN"""
    def __init__(self, d_model, nhead, dim_feedforward, dropout, batch_first, norm_first, return_attn=True):
        super().__init__()
        self.return_attn = return_attn
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=batch_first)
        self.linear1 = nn.Linear(d_model, dim_feedforward)
        self.dropout = nn.Dropout(dropout)
        self.linear2 = nn.Linear(dim_feedforward, d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout1 = nn.Dropout(dropout)
        self.dropout2 = nn.Dropout(dropout)
        self.activation = nn.ReLU()  # ReLU en vez de GELU
        self.norm_first = norm_first

    def forward(self, src):
        sa_out, attn_weights = self.self_attn(src, src, src, need_weights=True, average_attn_weights=False)
        src = self.norm1(src + self.dropout1(sa_out))
        ff = self.linear2(self.dropout(self.activation(self.linear1(src))))
        src = self.norm2(src + self.dropout2(ff))
        return src, attn_weights

class EEGCNNTransformer_ReLU(nn.Module):
    """Variante con ReLU en vez de ELU/GELU"""
    def __init__(self, n_ch=8, n_cls=2, d_model=128, n_heads=4, n_layers=2,
                 p_drop=0.2, p_drop_encoder=0.3, n_dw_blocks=2,
                 k_list=(31,15,7), s_list=(2,2,2), p_list=(15,7,3), capture_attn=False):
        super().__init__()
        self.capture_attn = capture_attn
        self._last_attn = None
        self.pos_encoding = None

        # Stem con ReLU
        stem = [
            nn.Conv1d(n_ch, 32, kernel_size=129, stride=2, padding=64, bias=False),
            make_gn(32),
            nn.ReLU(),  # ReLU en vez de ELU
            nn.Dropout(p=p_drop),
        ]
        
        # Bloques depthwise con ReLU
        blocks = []
        in_c, out_cs = 32, [64, 128, 256, 256, 256]
        n_dw_blocks = int(np.clip(n_dw_blocks, 0, len(out_cs)))
        for i in range(n_dw_blocks):
            k = k_list[i] if i < len(k_list) else 7
            s = s_list[i] if i < len(s_list) else 2
            p = p_list[i] if i < len(p_list) else k//2
            blocks.append(DepthwiseSeparableConv_ReLU(in_c, out_cs[i], k=k, s=s, p=p, p_drop=p_drop))
            in_c = out_cs[i]

        self.conv_t = nn.Sequential(*stem, *blocks)
        self.proj = nn.Conv1d(in_c if n_dw_blocks>0 else 32, d_model, kernel_size=1, bias=False)
        self.dropout = nn.Dropout(p=p_drop_encoder)

        # Encoder Transformer con ReLU
        enc = _CustomEncoderLayer_ReLU(d_model=d_model, nhead=n_heads, dim_feedforward=2*d_model,
                                       dropout=0.1, batch_first=True, norm_first=False, return_attn=True)
        self.encoder = _CustomEncoder(enc, num_layers=n_layers)

        # Token CLS y head
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        nn.init.normal_(self.cls, std=0.02)
        self.head = nn.Sequential(nn.LayerNorm(d_model), nn.Linear(d_model, n_cls))

    def _positional_encoding(self, L, d):
        pos = torch.arange(0, L, dtype=torch.float32).unsqueeze(1)
        i   = torch.arange(0, d, dtype=torch.float32).unsqueeze(0)
        angle = pos / torch.pow(10000, (2 * (i//2)) / d)
        pe = torch.zeros(L, d, dtype=torch.float32)
        pe[:, 0::2] = torch.sin(angle[:, 0::2])
        pe[:, 1::2] = torch.cos(angle[:, 1::2])
        return pe

    def forward(self, x):
        z = self.conv_t(x)
        z = self.proj(z)
        z = self.dropout(z)
        z = z.transpose(1, 2)
        
        # Positional encoding
        B, L, D = z.shape
        if (self.pos_encoding is None) or (self.pos_encoding.shape[0] != L) or (self.pos_encoding.shape[1] != D):
            self.pos_encoding = self._positional_encoding(L, D).to(z.device)
        z = z + self.pos_encoding[None, :, :]
        
        # CLS token
        cls_tok = self.cls.expand(B, -1, -1)
        z = torch.cat([cls_tok, z], dim=1)

        z, attn_list = self.encoder(z)
        if self.capture_attn:
            self._last_attn = attn_list

        cls = z[:, 0, :]
        return self.head(cls)

    def get_last_attention_maps(self):
        return self._last_attn

print("✓ Variantes del modelo definidas para estudio de ablación")

✓ Variantes del modelo definidas para estudio de ablación


### Función de Entrenamiento del Estudio de Ablación

Esta función entrena cada variante del modelo y recopila métricas para comparación.

In [16]:
# =========================
# ESTUDIO DE ABLACIÓN - FUNCIÓN PRINCIPAL
# =========================

def run_ablation_study(device, fold_list=[1], base_dir=None):
    """
    Ejecuta estudio de ablación entrenando todas las variantes del modelo.
    
    Args:
        device: Dispositivo para entrenamiento (cuda/cpu)
        fold_list: Lista de folds a evaluar (default: [1])
        base_dir: Directorio base para guardar resultados
    
    Returns:
        DataFrame con resultados comparativos
    """
    if base_dir is None:
        base_dir = PROJ / 'models' / '04_hybrid' / 'ablation_study'
    else:
        base_dir = Path(base_dir)
    
    ensure_dir(base_dir)
    
    # Definir variantes del modelo
    variants = {
        'baseline': {
            'name': 'Baseline (Completo)',
            'factory': lambda: EEGCNNTransformer(
                n_ch=8, n_cls=2, d_model=D_MODEL, n_heads=N_HEADS, 
                n_layers=N_LAYERS, p_drop=P_DROP, p_drop_encoder=P_DROP_ENCODER,
                n_dw_blocks=2, capture_attn=False
            ),
            'description': 'Modelo completo con todos los componentes'
        },
        'no_depthwise': {
            'name': 'Sin Depthwise Blocks',
            'factory': lambda: EEGCNNTransformer_NoDepthwise(
                n_ch=8, n_cls=2, d_model=D_MODEL, n_heads=N_HEADS,
                n_layers=N_LAYERS, p_drop=P_DROP, p_drop_encoder=P_DROP_ENCODER,
                capture_attn=False
            ),
            'description': 'Solo stem convolucional, sin bloques depthwise'
        },
        'no_posenc': {
            'name': 'Sin Codificación Posicional',
            'factory': lambda: EEGCNNTransformer_NoPosEncoding(
                n_ch=8, n_cls=2, d_model=D_MODEL, n_heads=N_HEADS,
                n_layers=N_LAYERS, p_drop=P_DROP, p_drop_encoder=P_DROP_ENCODER,
                n_dw_blocks=2, capture_attn=False
            ),
            'description': 'Sin positional encoding en Transformer'
        },
        'no_cls': {
            'name': 'Sin Token CLS',
            'factory': lambda: EEGCNNTransformer_NoCLS(
                n_ch=8, n_cls=2, d_model=D_MODEL, n_heads=N_HEADS,
                n_layers=N_LAYERS, p_drop=P_DROP, p_drop_encoder=P_DROP_ENCODER,
                n_dw_blocks=2, capture_attn=False
            ),
            'description': 'Usa average pooling en vez de token CLS'
        },
        'batchnorm': {
            'name': 'Con BatchNorm',
            'factory': lambda: EEGCNNTransformer_BatchNorm(
                n_ch=8, n_cls=2, d_model=D_MODEL, n_heads=N_HEADS,
                n_layers=N_LAYERS, p_drop=P_DROP, p_drop_encoder=P_DROP_ENCODER,
                n_dw_blocks=2, capture_attn=False
            ),
            'description': 'BatchNorm en vez de GroupNorm'
        },
        'relu': {
            'name': 'Con ReLU',
            'factory': lambda: EEGCNNTransformer_ReLU(
                n_ch=8, n_cls=2, d_model=D_MODEL, n_heads=N_HEADS,
                n_layers=N_LAYERS, p_drop=P_DROP, p_drop_encoder=P_DROP_ENCODER,
                n_dw_blocks=2, capture_attn=False
            ),
            'description': 'ReLU en vez de ELU/GELU'
        }
    }
    
    # Almacenar resultados
    results = []
    
    print("="*80)
    print("INICIANDO ESTUDIO DE ABLACIÓN")
    print("="*80)
    print(f"Variantes a evaluar: {len(variants)}")
    print(f"Folds: {fold_list}")
    print(f"Directorio de salida: {base_dir}")
    print("="*80)
    
    # Entrenar cada variante
    for variant_key, variant_info in variants.items():
        print(f"\n{'='*80}")
        print(f"VARIANTE: {variant_info['name']}")
        print(f"Descripción: {variant_info['description']}")
        print(f"{'='*80}")
        
        variant_dir = base_dir / variant_key
        ensure_dir(variant_dir)
        
        # Resultados de esta variante
        variant_results = {
            'variant': variant_key,
            'name': variant_info['name'],
            'description': variant_info['description']
        }
        
        # Métricas acumuladas por fold
        fold_accs = []
        fold_f1s = []
        fold_params = []
        
        for fold in fold_list:
            print(f"\n--- Fold {fold} ---")
            seed_everything(RANDOM_STATE + fold)
            
            try:
                # Cargar datos
                train_subs, test_subs = load_fold_subjects(FOLDS_JSON, fold)
                print(f"Train: {len(train_subs)} sujetos, Test: {len(test_subs)} sujetos")
                
                # Cargar épocas
                X_train_list, y_train_list = [], []
                for sid in tqdm(train_subs, desc="Cargando train"):
                    X, y, _ = load_subject_epochs(
                        sid, RESAMPLE_HZ, DO_NOTCH, DO_BANDPASS, 
                        DO_CAR, BP_LO, BP_HI
                    )
                    if X.shape[0] > 0:
                        X_train_list.append(X)
                        y_train_list.append(y)
                
                X_test_list, y_test_list = [], []
                for sid in tqdm(test_subs, desc="Cargando test"):
                    X, y, _ = load_subject_epochs(
                        sid, RESAMPLE_HZ, DO_NOTCH, DO_BANDPASS,
                        DO_CAR, BP_LO, BP_HI
                    )
                    if X.shape[0] > 0:
                        X_test_list.append(X)
                        y_test_list.append(y)
                
                X_train = np.concatenate(X_train_list, axis=0)
                y_train = np.concatenate(y_train_list, axis=0)
                X_test = np.concatenate(X_test_list, axis=0)
                y_test = np.concatenate(y_test_list, axis=0)
                
                # Estandarizar
                X_train, X_test = standardize_per_channel(X_train, X_test)
                
                print(f"Train: {X_train.shape}, Test: {X_test.shape}")
                
                # Crear modelo
                model = variant_info['factory']().to(device)
                n_params = count_params(model)
                fold_params.append(n_params)
                print(f"Parámetros: {n_params:,}")
                
                # Preparar DataLoader
                train_dataset = TensorDataset(
                    torch.tensor(X_train, dtype=torch.float32),
                    torch.tensor(y_train, dtype=torch.long)
                )
                
                if USE_WEIGHTED_SAMPLER:
                    class_counts = np.bincount(y_train)
                    class_weights = 1.0 / (class_counts + 1e-6)
                    sample_weights = class_weights[y_train]
                    sampler = WeightedRandomSampler(
                        weights=sample_weights,
                        num_samples=len(sample_weights),
                        replacement=True
                    )
                    train_loader = DataLoader(
                        train_dataset, batch_size=BATCH_SIZE,
                        sampler=sampler, worker_init_fn=seed_worker,
                        generator=torch.Generator().manual_seed(RANDOM_STATE)
                    )
                else:
                    train_loader = DataLoader(
                        train_dataset, batch_size=BATCH_SIZE, shuffle=True,
                        worker_init_fn=seed_worker,
                        generator=torch.Generator().manual_seed(RANDOM_STATE)
                    )
                
                # Pérdida y optimizador
                alpha = torch.tensor([1.0, 1.0], device=device)
                criterion = FocalLoss(alpha=alpha, gamma=1.5)
                optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR, weight_decay=1e-4)
                scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
                    optimizer, T_max=EPOCHS, eta_min=1e-6
                )
                
                # EMA
                ema = ModelEMA(model, decay=EMA_DECAY, device=device) if USE_EMA else None
                
                # Entrenamiento
                best_val_acc = 0.0
                patience_counter = 0
                
                for epoch in range(EPOCHS):
                    model.train()
                    train_loss = 0.0
                    
                    for xb, yb in train_loader:
                        xb, yb = xb.to(device), yb.to(device)
                        
                        # Augmentación
                        if epoch > WARMUP_EPOCHS:
                            xb = augment_batch(xb)
                        
                        optimizer.zero_grad()
                        logits = model(xb)
                        loss = criterion(logits, yb)
                        loss.backward()
                        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
                        optimizer.step()
                        
                        if ema is not None:
                            ema.update(model)
                        
                        train_loss += loss.item()
                    
                    scheduler.step()
                    
                    # Validación en test set (simplificado)
                    model.eval()
                    with torch.no_grad():
                        X_test_t = torch.tensor(X_test, dtype=torch.float32, device=device)
                        logits = model(X_test_t)
                        y_pred = logits.argmax(dim=1).cpu().numpy()
                        val_acc = accuracy_score(y_test, y_pred)
                    
                    if (epoch + 1) % 10 == 0:
                        print(f"Epoch {epoch+1}/{EPOCHS} - Loss: {train_loss/len(train_loader):.4f}, Val Acc: {val_acc:.4f}")
                    
                    # Early stopping
                    if val_acc > best_val_acc:
                        best_val_acc = val_acc
                        patience_counter = 0
                        # Guardar mejor modelo
                        torch.save(model.state_dict(), variant_dir / f"best_fold{fold}.pt")
                    else:
                        patience_counter += 1
                        if patience_counter >= PATIENCE:
                            print(f"Early stopping en epoch {epoch+1}")
                            break
                
                # Evaluación final
                model.load_state_dict(torch.load(variant_dir / f"best_fold{fold}.pt"))
                model.eval()
                
                with torch.no_grad():
                    X_test_t = torch.tensor(X_test, dtype=torch.float32, device=device)
                    logits = model(X_test_t)
                    y_pred = logits.argmax(dim=1).cpu().numpy()
                
                # Métricas
                acc = accuracy_score(y_test, y_pred)
                f1 = f1_score(y_test, y_pred, average='macro')
                metrics = compute_metrics(y_test, y_pred)
                
                fold_accs.append(acc)
                fold_f1s.append(f1)
                
                print(f"Fold {fold} - Acc: {acc:.4f}, F1: {f1:.4f}")
                
                # Limpiar memoria
                del model, optimizer, scheduler, train_loader, X_train, y_train
                torch.cuda.empty_cache()
                
            except Exception as e:
                print(f"ERROR en fold {fold}: {e}")
                import traceback
                traceback.print_exc()
                fold_accs.append(0.0)
                fold_f1s.append(0.0)
                fold_params.append(0)
        
        # Compilar resultados de la variante
        variant_results['mean_accuracy'] = np.mean(fold_accs) if fold_accs else 0.0
        variant_results['std_accuracy'] = np.std(fold_accs) if fold_accs else 0.0
        variant_results['mean_f1'] = np.mean(fold_f1s) if fold_f1s else 0.0
        variant_results['std_f1'] = np.std(fold_f1s) if fold_f1s else 0.0
        variant_results['n_params'] = fold_params[0] if fold_params else 0
        variant_results['folds'] = fold_list
        variant_results['fold_accs'] = fold_accs
        variant_results['fold_f1s'] = fold_f1s
        
        results.append(variant_results)
        
        print(f"\n{variant_info['name']} - Resumen:")
        print(f"  Accuracy: {variant_results['mean_accuracy']:.4f} ± {variant_results['std_accuracy']:.4f}")
        print(f"  F1-Score: {variant_results['mean_f1']:.4f} ± {variant_results['std_f1']:.4f}")
        print(f"  Parámetros: {variant_results['n_params']:,}")
    
    # Crear DataFrame con resultados
    import pandas as pd
    df_results = pd.DataFrame(results)
    
    # Guardar resultados
    results_csv = base_dir / 'ablation_results.csv'
    df_results.to_csv(results_csv, index=False)
    print(f"\n{'='*80}")
    print(f"Resultados guardados en: {results_csv}")
    print(f"{'='*80}")
    
    return df_results


def plot_ablation_results(df_results, output_dir=None):
    """
    Genera visualizaciones de los resultados del estudio de ablación.
    
    Args:
        df_results: DataFrame con resultados
        output_dir: Directorio para guardar gráficos
    """
    if output_dir is None:
        output_dir = PROJ / 'models' / '04_hybrid' / 'ablation_study'
    else:
        output_dir = Path(output_dir)
    
    ensure_dir(output_dir)
    
    # Ordenar por accuracy
    df_sorted = df_results.sort_values('mean_accuracy', ascending=False)
    
    # Gráfico de barras con accuracy y F1
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))
    
    # Accuracy
    ax1.barh(df_sorted['name'], df_sorted['mean_accuracy'], 
             xerr=df_sorted['std_accuracy'], capsize=5, color='steelblue')
    ax1.set_xlabel('Accuracy')
    ax1.set_title('Comparación de Accuracy por Variante')
    ax1.set_xlim([0, 1.0])
    ax1.axvline(x=df_sorted.iloc[0]['mean_accuracy'], color='red', 
                linestyle='--', alpha=0.5, label='Mejor')
    ax1.legend()
    ax1.grid(axis='x', alpha=0.3)
    
    # F1-Score
    ax2.barh(df_sorted['name'], df_sorted['mean_f1'], 
             xerr=df_sorted['std_f1'], capsize=5, color='coral')
    ax2.set_xlabel('F1-Score (Macro)')
    ax2.set_title('Comparación de F1-Score por Variante')
    ax2.set_xlim([0, 1.0])
    ax2.axvline(x=df_sorted.iloc[0]['mean_f1'], color='red', 
                linestyle='--', alpha=0.5, label='Mejor')
    ax2.legend()
    ax2.grid(axis='x', alpha=0.3)
    
    plt.tight_layout()
    fig_path = output_dir / 'ablation_comparison.png'
    plt.savefig(fig_path, dpi=150, bbox_inches='tight')
    print(f"Gráfico guardado en: {fig_path}")
    plt.close()
    
    # Tabla de comparación
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.axis('tight')
    ax.axis('off')
    
    table_data = []
    for _, row in df_sorted.iterrows():
        table_data.append([
            row['name'],
            f"{row['mean_accuracy']:.4f} ± {row['std_accuracy']:.4f}",
            f"{row['mean_f1']:.4f} ± {row['std_f1']:.4f}",
            f"{row['n_params']:,}"
        ])
    
    table = ax.table(cellText=table_data,
                     colLabels=['Variante', 'Accuracy', 'F1-Score', 'Parámetros'],
                     cellLoc='left',
                     loc='center')
    table.auto_set_font_size(False)
    table.set_fontsize(9)
    table.scale(1, 2)
    
    # Colorear la mejor fila
    for i in range(4):
        table[(1, i)].set_facecolor('#90EE90')
    
    plt.title('Resumen de Resultados - Estudio de Ablación', 
              fontsize=12, fontweight='bold', pad=20)
    
    table_path = output_dir / 'ablation_table.png'
    plt.savefig(table_path, dpi=150, bbox_inches='tight')
    print(f"Tabla guardada en: {table_path}")
    plt.close()
    
    return df_sorted

print("✓ Funciones de estudio de ablación definidas")

✓ Funciones de estudio de ablación definidas


### Ejecución del Estudio de Ablación

Ejecuta el estudio de ablación y genera visualizaciones comparativas.

In [19]:
# =========================
# EJECUTAR ESTUDIO DE ABLACIÓN
# =========================

if __name__ == "__main__":
    # Configuración del estudio
    # Puedes cambiar fold_list para evaluar en múltiples folds
    # Por defecto usa solo fold 1 para rapidez (para pruebas)
    # Para resultados finales, usar fold_list=[1, 2, 3, 4, 5]
    
    print("\n" + "="*80)
    print("ESTUDIO DE ABLACIÓN - CNN+Transformer para EEG Motor Imagery")
    print("="*80 + "\n")
    
    # OPCIÓN 1: Prueba rápida con 1 fold
    # fold_list = [1]
    
    # OPCIÓN 2: Evaluación completa con 5 folds (descomenta para usar)
    fold_list = [1, 2, 3, 4, 5]
    
    # Ejecutar estudio de ablación
    print(f"Ejecutando estudio con {len(fold_list)} fold(s): {fold_list}")
    print("Esto puede tardar varios minutos dependiendo del número de folds...\n")
    
    df_results = run_ablation_study(
        device=DEVICE,
        fold_list=fold_list,
        base_dir=PROJ / 'models' / '04_hybrid' / 'ablation_study'
    )
    
    # Mostrar resultados
    print("\n" + "="*80)
    print("RESULTADOS DEL ESTUDIO DE ABLACIÓN")
    print("="*80)
    print(df_results[['name', 'mean_accuracy', 'std_accuracy', 'mean_f1', 'std_f1', 'n_params']].to_string(index=False))
    
    # Generar gráficos
    print("\nGenerando visualizaciones...")
    df_sorted = plot_ablation_results(
        df_results,
        output_dir=PROJ / 'models' / '04_hybrid' / 'ablation_study'
    )
    
    # Análisis de contribución relativa (vs baseline)
    print("\n" + "="*80)
    print("ANÁLISIS DE CONTRIBUCIÓN RELATIVA")
    print("="*80)
    
    baseline_acc = df_results[df_results['variant'] == 'baseline']['mean_accuracy'].values[0]
    
    print(f"\nBaseline Accuracy: {baseline_acc:.4f}\n")
    print("Variante                      | Acc ± Std          | Δ vs Baseline | % Change")
    print("-" * 80)
    
    for _, row in df_sorted.iterrows():
        delta = row['mean_accuracy'] - baseline_acc
        pct_change = (delta / baseline_acc) * 100 if baseline_acc > 0 else 0
        sign = "+" if delta >= 0 else ""
        
        print(f"{row['name']:30s} | {row['mean_accuracy']:.4f} ± {row['std_accuracy']:.4f} | "
              f"{sign}{delta:+.4f}    | {sign}{pct_change:+.2f}%")
    
    print("\n" + "="*80)
    print("INTERPRETACIÓN:")
    print("="*80)
    print("• Δ positivo: El componente REDUCE el rendimiento (su eliminación mejora)")
    print("• Δ negativo: El componente MEJORA el rendimiento (su eliminación empeora)")
    print("• Δ cercano a 0: El componente tiene impacto mínimo")
    print("\n")
    
    # Identificar componentes más importantes
    contributions = []
    for _, row in df_results.iterrows():
        if row['variant'] != 'baseline':
            delta = baseline_acc - row['mean_accuracy']  # Invertido: pérdida de rendimiento
            contributions.append({
                'component': row['name'],
                'contribution': delta,
                'abs_contribution': abs(delta)
            })
    
    contributions_sorted = sorted(contributions, key=lambda x: x['abs_contribution'], reverse=True)
    
    print("Ranking de componentes por impacto (mayor a menor):")
    print("-" * 80)
    for i, comp in enumerate(contributions_sorted, 1):
        impact = "BENEFICIOSO" if comp['contribution'] > 0 else "PERJUDICIAL" if comp['contribution'] < 0 else "NEUTRAL"
        print(f"{i}. {comp['component']:30s} | Contribución: {comp['contribution']:+.4f} | {impact}")
    
    print("\n" + "="*80)
    print("ESTUDIO COMPLETADO")
    print("="*80)
    print(f"Resultados guardados en: {PROJ / 'models' / '04_hybrid' / 'ablation_study'}")
    print("="*80 + "\n")


ESTUDIO DE ABLACIÓN - CNN+Transformer para EEG Motor Imagery

Ejecutando estudio con 5 fold(s): [1, 2, 3, 4, 5]
Esto puede tardar varios minutos dependiendo del número de folds...

INICIANDO ESTUDIO DE ABLACIÓN
Variantes a evaluar: 6
Folds: [1, 2, 3, 4, 5]
Directorio de salida: /root/Proyecto/EEG_Clasificador/models/04_hybrid/ablation_study

VARIANTE: Baseline (Completo)
Descripción: Modelo completo con todos los componentes

--- Fold 1 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 17.36it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 17.42it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0648, Val Acc: 0.7778
Early stopping en epoch 13
Fold 1 - Acc: 0.7971, F1: 0.7967

--- Fold 2 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 17.43it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 17.41it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0672, Val Acc: 0.8141
Epoch 20/60 - Loss: 0.0544, Val Acc: 0.8231
Early stopping en epoch 21
Fold 2 - Acc: 0.8469, F1: 0.8466

--- Fold 3 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 17.30it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 17.35it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0702, Val Acc: 0.7868
Epoch 20/60 - Loss: 0.0607, Val Acc: 0.7846
Early stopping en epoch 25
Fold 3 - Acc: 0.8107, F1: 0.8106

--- Fold 4 ---
Train: 83 sujetos, Test: 20 sujetos


Cargando train: 100%|██████████| 83/83 [00:04<00:00, 17.32it/s]
Cargando test: 100%|██████████| 20/20 [00:01<00:00, 17.27it/s]


Train: (3486, 8, 961), Test: (840, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0681, Val Acc: 0.8298
Early stopping en epoch 15
Fold 4 - Acc: 0.8476, F1: 0.8476

--- Fold 5 ---
Train: 83 sujetos, Test: 20 sujetos


Cargando train: 100%|██████████| 83/83 [00:04<00:00, 17.30it/s]
Cargando test: 100%|██████████| 20/20 [00:01<00:00, 17.50it/s]


Train: (3486, 8, 961), Test: (840, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0691, Val Acc: 0.8202
Epoch 20/60 - Loss: 0.0598, Val Acc: 0.8333
Early stopping en epoch 23
Fold 5 - Acc: 0.8583, F1: 0.8582

Baseline (Completo) - Resumen:
  Accuracy: 0.8321 ± 0.0238
  F1-Score: 0.8319 ± 0.0239
  Parámetros: 232,290

VARIANTE: Sin Depthwise Blocks
Descripción: Solo stem convolucional, sin bloques depthwise

--- Fold 1 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 16.48it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 17.49it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 205,890


Traceback (most recent call last):
  File "<ipython-input-16-8a02d1bef429>", line 232, in run_ablation_study
    logits = model(X_test_t)
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<ipython-input-15-6102edd9c698>", line 61, in forward
    z, attn_list = self.encoder(z)
                   ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_

ERROR en fold 1: CUDA out of memory. Tried to allocate 3.05 GiB. GPU 0 has a total capacity of 139.81 GiB of which 463.25 MiB is free. Process 2382944 has 522.00 MiB memory in use. Process 735135 has 97.92 GiB memory in use. Process 894919 has 37.12 GiB memory in use. Including non-PyTorch memory, this process has 3.78 GiB memory in use. Of the allocated memory 1.01 GiB is allocated by PyTorch, and 2.09 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

--- Fold 2 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 17.31it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 16.88it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 205,890


Traceback (most recent call last):
  File "<ipython-input-16-8a02d1bef429>", line 232, in run_ablation_study
    logits = model(X_test_t)
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<ipython-input-15-6102edd9c698>", line 61, in forward
    z, attn_list = self.encoder(z)
                   ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_

ERROR en fold 2: CUDA out of memory. Tried to allocate 3.05 GiB. GPU 0 has a total capacity of 139.81 GiB of which 465.25 MiB is free. Process 2382944 has 522.00 MiB memory in use. Process 735135 has 97.92 GiB memory in use. Process 894919 has 37.12 GiB memory in use. Including non-PyTorch memory, this process has 3.78 GiB memory in use. Of the allocated memory 1.01 GiB is allocated by PyTorch, and 2.09 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

--- Fold 3 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 17.24it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 17.20it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 205,890


Traceback (most recent call last):
  File "<ipython-input-16-8a02d1bef429>", line 232, in run_ablation_study
    logits = model(X_test_t)
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<ipython-input-15-6102edd9c698>", line 61, in forward
    z, attn_list = self.encoder(z)
                   ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_

ERROR en fold 3: CUDA out of memory. Tried to allocate 3.05 GiB. GPU 0 has a total capacity of 139.81 GiB of which 467.25 MiB is free. Process 2382944 has 522.00 MiB memory in use. Process 735135 has 97.92 GiB memory in use. Process 894919 has 37.12 GiB memory in use. Including non-PyTorch memory, this process has 3.78 GiB memory in use. Of the allocated memory 1.01 GiB is allocated by PyTorch, and 2.09 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

--- Fold 4 ---
Train: 83 sujetos, Test: 20 sujetos


Cargando train: 100%|██████████| 83/83 [00:04<00:00, 17.36it/s]
Cargando test: 100%|██████████| 20/20 [00:01<00:00, 17.41it/s]


Train: (3486, 8, 961), Test: (840, 8, 961)
Parámetros: 205,890


Traceback (most recent call last):
  File "<ipython-input-16-8a02d1bef429>", line 232, in run_ablation_study
    logits = model(X_test_t)
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<ipython-input-15-6102edd9c698>", line 61, in forward
    z, attn_list = self.encoder(z)
                   ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_

ERROR en fold 4: CUDA out of memory. Tried to allocate 2.91 GiB. GPU 0 has a total capacity of 139.81 GiB of which 467.25 MiB is free. Process 2382944 has 522.00 MiB memory in use. Process 735135 has 97.92 GiB memory in use. Process 894919 has 37.12 GiB memory in use. Including non-PyTorch memory, this process has 3.78 GiB memory in use. Of the allocated memory 983.37 MiB is allocated by PyTorch, and 2.14 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

--- Fold 5 ---
Train: 83 sujetos, Test: 20 sujetos


Cargando train: 100%|██████████| 83/83 [00:04<00:00, 17.28it/s]
Cargando test: 100%|██████████| 20/20 [00:01<00:00, 17.50it/s]


Train: (3486, 8, 961), Test: (840, 8, 961)
Parámetros: 205,890


Traceback (most recent call last):
  File "<ipython-input-16-8a02d1bef429>", line 232, in run_ablation_study
    logits = model(X_test_t)
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<ipython-input-15-6102edd9c698>", line 61, in forward
    z, attn_list = self.encoder(z)
                   ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_

ERROR en fold 5: CUDA out of memory. Tried to allocate 2.91 GiB. GPU 0 has a total capacity of 139.81 GiB of which 465.25 MiB is free. Process 2382944 has 522.00 MiB memory in use. Process 735135 has 97.92 GiB memory in use. Process 894919 has 37.12 GiB memory in use. Including non-PyTorch memory, this process has 3.78 GiB memory in use. Of the allocated memory 983.37 MiB is allocated by PyTorch, and 2.14 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Sin Depthwise Blocks - Resumen:
  Accuracy: 0.0000 ± 0.0000
  F1-Score: 0.0000 ± 0.0000
  Parámetros: 205,890

VARIANTE: Sin Codificación Posicional
Descripción: Sin positional encoding en Transformer

--- Fold 1 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 17.30it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 17.42it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0913, Val Acc: 0.6825
Early stopping en epoch 19
Fold 1 - Acc: 0.7177, F1: 0.7176

--- Fold 2 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 17.10it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 17.33it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0972, Val Acc: 0.6893
Epoch 20/60 - Loss: 0.0839, Val Acc: 0.7392
Epoch 30/60 - Loss: 0.0833, Val Acc: 0.7630
Early stopping en epoch 35
Fold 2 - Acc: 0.7642, F1: 0.7618

--- Fold 3 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 17.21it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 17.41it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0961, Val Acc: 0.7404
Epoch 20/60 - Loss: 0.0845, Val Acc: 0.7540
Early stopping en epoch 27
Fold 3 - Acc: 0.7630, F1: 0.7629

--- Fold 4 ---
Train: 83 sujetos, Test: 20 sujetos


Cargando train: 100%|██████████| 83/83 [00:04<00:00, 17.22it/s]
Cargando test: 100%|██████████| 20/20 [00:01<00:00, 16.85it/s]


Train: (3486, 8, 961), Test: (840, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0951, Val Acc: 0.7738
Epoch 20/60 - Loss: 0.0857, Val Acc: 0.7869
Epoch 30/60 - Loss: 0.0783, Val Acc: 0.7524
Early stopping en epoch 31
Fold 4 - Acc: 0.7905, F1: 0.7901

--- Fold 5 ---
Train: 83 sujetos, Test: 20 sujetos


Cargando train: 100%|██████████| 83/83 [00:04<00:00, 17.11it/s]
Cargando test: 100%|██████████| 20/20 [00:01<00:00, 16.90it/s]


Train: (3486, 8, 961), Test: (840, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0994, Val Acc: 0.7345
Epoch 20/60 - Loss: 0.0872, Val Acc: 0.7512
Early stopping en epoch 27
Fold 5 - Acc: 0.7738, F1: 0.7735

Sin Codificación Posicional - Resumen:
  Accuracy: 0.7618 ± 0.0242
  F1-Score: 0.7612 ± 0.0240
  Parámetros: 232,290

VARIANTE: Sin Token CLS
Descripción: Usa average pooling en vez de token CLS

--- Fold 1 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 16.91it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 16.95it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 232,146
Epoch 10/60 - Loss: 0.0622, Val Acc: 0.7914
Epoch 20/60 - Loss: 0.0532, Val Acc: 0.7959
Early stopping en epoch 27
Fold 1 - Acc: 0.8016, F1: 0.8013

--- Fold 2 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 17.07it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 17.54it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 232,146
Epoch 10/60 - Loss: 0.0629, Val Acc: 0.8265
Early stopping en epoch 19
Fold 2 - Acc: 0.8311, F1: 0.8309

--- Fold 3 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 17.16it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 17.02it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 232,146
Epoch 10/60 - Loss: 0.0656, Val Acc: 0.7993
Epoch 20/60 - Loss: 0.0517, Val Acc: 0.8016
Early stopping en epoch 24
Fold 3 - Acc: 0.8095, F1: 0.8095

--- Fold 4 ---
Train: 83 sujetos, Test: 20 sujetos


Cargando train: 100%|██████████| 83/83 [00:04<00:00, 17.11it/s]
Cargando test: 100%|██████████| 20/20 [00:01<00:00, 17.29it/s]


Train: (3486, 8, 961), Test: (840, 8, 961)
Parámetros: 232,146
Epoch 10/60 - Loss: 0.0667, Val Acc: 0.8048
Early stopping en epoch 13
Fold 4 - Acc: 0.8298, F1: 0.8296

--- Fold 5 ---
Train: 83 sujetos, Test: 20 sujetos


Cargando train: 100%|██████████| 83/83 [00:04<00:00, 17.48it/s]
Cargando test: 100%|██████████| 20/20 [00:01<00:00, 17.50it/s]


Train: (3486, 8, 961), Test: (840, 8, 961)
Parámetros: 232,146
Epoch 10/60 - Loss: 0.0634, Val Acc: 0.8024
Early stopping en epoch 19
Fold 5 - Acc: 0.8571, F1: 0.8571

Sin Token CLS - Resumen:
  Accuracy: 0.8258 ± 0.0194
  F1-Score: 0.8257 ± 0.0194
  Parámetros: 232,146

VARIANTE: Con BatchNorm
Descripción: BatchNorm en vez de GroupNorm

--- Fold 1 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 17.50it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 17.54it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0676, Val Acc: 0.7834
Early stopping en epoch 17
Fold 1 - Acc: 0.7857, F1: 0.7857

--- Fold 2 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 17.38it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 17.51it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0733, Val Acc: 0.8209
Epoch 20/60 - Loss: 0.0642, Val Acc: 0.8288
Early stopping en epoch 27
Fold 2 - Acc: 0.8537, F1: 0.8537

--- Fold 3 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 17.18it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 17.42it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0741, Val Acc: 0.7902
Early stopping en epoch 19
Fold 3 - Acc: 0.7993, F1: 0.7993

--- Fold 4 ---
Train: 83 sujetos, Test: 20 sujetos


Cargando train: 100%|██████████| 83/83 [00:04<00:00, 17.27it/s]
Cargando test: 100%|██████████| 20/20 [00:01<00:00, 17.35it/s]


Train: (3486, 8, 961), Test: (840, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0743, Val Acc: 0.8155
Epoch 20/60 - Loss: 0.0642, Val Acc: 0.8214
Early stopping en epoch 20
Fold 4 - Acc: 0.8298, F1: 0.8296

--- Fold 5 ---
Train: 83 sujetos, Test: 20 sujetos


Cargando train: 100%|██████████| 83/83 [00:04<00:00, 17.07it/s]
Cargando test: 100%|██████████| 20/20 [00:01<00:00, 17.31it/s]


Train: (3486, 8, 961), Test: (840, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0744, Val Acc: 0.8083
Early stopping en epoch 15
Fold 5 - Acc: 0.8417, F1: 0.8416

Con BatchNorm - Resumen:
  Accuracy: 0.8220 ± 0.0256
  F1-Score: 0.8220 ± 0.0256
  Parámetros: 232,290

VARIANTE: Con ReLU
Descripción: ReLU en vez de ELU/GELU

--- Fold 1 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 17.09it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 17.37it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0651, Val Acc: 0.8039
Epoch 20/60 - Loss: 0.0539, Val Acc: 0.7959
Early stopping en epoch 23
Fold 1 - Acc: 0.8095, F1: 0.8095

--- Fold 2 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 17.22it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 17.54it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0681, Val Acc: 0.8209
Early stopping en epoch 19
Fold 2 - Acc: 0.8390, F1: 0.8389

--- Fold 3 ---
Train: 82 sujetos, Test: 21 sujetos


Cargando train: 100%|██████████| 82/82 [00:04<00:00, 17.49it/s]
Cargando test: 100%|██████████| 21/21 [00:01<00:00, 17.33it/s]


Train: (3444, 8, 961), Test: (882, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0672, Val Acc: 0.7880
Epoch 20/60 - Loss: 0.0608, Val Acc: 0.7948
Early stopping en epoch 24
Fold 3 - Acc: 0.7971, F1: 0.7970

--- Fold 4 ---
Train: 83 sujetos, Test: 20 sujetos


Cargando train: 100%|██████████| 83/83 [00:04<00:00, 17.43it/s]
Cargando test: 100%|██████████| 20/20 [00:01<00:00, 17.32it/s]


Train: (3486, 8, 961), Test: (840, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0703, Val Acc: 0.8238
Early stopping en epoch 15
Fold 4 - Acc: 0.8417, F1: 0.8416

--- Fold 5 ---
Train: 83 sujetos, Test: 20 sujetos


Cargando train: 100%|██████████| 83/83 [00:04<00:00, 16.68it/s]
Cargando test: 100%|██████████| 20/20 [00:01<00:00, 17.22it/s]


Train: (3486, 8, 961), Test: (840, 8, 961)
Parámetros: 232,290
Epoch 10/60 - Loss: 0.0710, Val Acc: 0.8262
Early stopping en epoch 13
Fold 5 - Acc: 0.8381, F1: 0.8377

Con ReLU - Resumen:
  Accuracy: 0.8251 ± 0.0183
  F1-Score: 0.8250 ± 0.0182
  Parámetros: 232,290

Resultados guardados en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/ablation_study/ablation_results.csv

RESULTADOS DEL ESTUDIO DE ABLACIÓN
                       name  mean_accuracy  std_accuracy  mean_f1   std_f1  n_params
        Baseline (Completo)       0.832120      0.023822 0.831932 0.023861    232290
       Sin Depthwise Blocks       0.000000      0.000000 0.000000 0.000000    205890
Sin Codificación Posicional       0.761837      0.024165 0.761189 0.024025    232290
              Sin Token CLS       0.825816      0.019375 0.825687 0.019420    232146
              Con BatchNorm       0.822041      0.025634 0.821963 0.025619    232290
                   Con ReLU       0.825068      0.018253 0.824956 0.018186   

---
## Instrucciones de Uso

### Configuración rápida:
1. **Prueba rápida** (1 fold): Deja `fold_list = [1]` en la celda anterior
2. **Evaluación completa** (5 folds): Cambia a `fold_list = [1, 2, 3, 4, 5]`

### Tiempo estimado:
- **1 fold**: ~10-15 minutos por variante (total ~60-90 minutos para 6 variantes)
- **5 folds**: ~50-75 minutos por variante (total ~5-7.5 horas para 6 variantes)

### Resultados generados:
- **CSV**: `ablation_results.csv` - Tabla con todas las métricas
- **Gráficos**: 
  - `ablation_comparison.png` - Comparación visual de accuracy y F1
  - `ablation_table.png` - Tabla formateada con resultados
- **Modelos**: Guardados en `ablation_study/<variante>/best_fold{X}.pt`

### Interpretación de resultados:
- **Baseline** es tu modelo completo (referencia)
- Si al **eliminar un componente** el accuracy **baja** → ese componente es **beneficioso**
- Si al **eliminar un componente** el accuracy **sube** → ese componente puede estar **perjudicando**
- Diferencias < 1-2% pueden ser ruido estadístico

### Para tu tesis:
Puedes usar los gráficos y tablas generadas directamente en tu documento. La tabla de "Contribución Relativa" te ayudará a argumentar qué componentes son esenciales para tu modelo.

---
## ANÁLISIS ESTADÍSTICO DE VARIACIONES ARQUITECTÓNICAS

Esta sección implementa un análisis estadístico riguroso de diferentes configuraciones arquitectónicas del modelo híbrido CNN-Transformer, evaluando variaciones en el número de bloques depthwise y cabezas de atención.

### Objetivo:
Evaluar sistemáticamente diferentes arquitecturas usando 5-fold CV y determinar mediante tests estadísticos (Wilcoxon/Friedman) si las diferencias son significativas.

In [23]:
# =========================
# SWEEP ARQUITECTÓNICO CON 5 FOLDS
# =========================

def run_arch_sweep_5folds(device, base_blocks=2, deltas=(-2, 0, +2), head_set=(2, 4, 6), output_dir=None):
    """
    Ejecuta barrido arquitectónico completo con 5-fold Cross-Validation.
    
    Evalúa todas las combinaciones de:
    - Número de bloques depthwise: base_blocks + deltas
    - Número de attention heads: head_set
    
    Args:
        device: torch.device - Dispositivo para entrenamiento
        base_blocks: int - Número base de bloques depthwise (default: 2)
        deltas: tuple - Variaciones del número de bloques (default: (-2, 0, +2))
        head_set: tuple - Conjunto de attention heads a probar (default: (2, 4, 6))
        output_dir: Path - Directorio para guardar resultados
    
    Returns:
        pd.DataFrame - Resultados con columnas:
            - arch_id: ID de la arquitectura
            - n_dw_blocks: Número de bloques depthwise
            - n_heads: Número de attention heads
            - fold_1 a fold_5: Resultados de F1-macro por fold
            - mean_f1: Media de F1
            - std_f1: Desviación estándar de F1
            - mean_acc: Media de Accuracy
            - std_acc: Desviación estándar de Accuracy
            - n_params: Número de parámetros
    """
    import pandas as pd
    from itertools import product
    
    if output_dir is None:
        output_dir = PROJ / 'models' / '04_hybrid' / 'arch_sweep_5fold'
    else:
        output_dir = Path(output_dir)
    
    ensure_dir(output_dir)
    
    # Generar todas las combinaciones arquitectónicas
    arch_configs = []
    for dlt in deltas:
        n_blocks = int(np.clip(base_blocks + dlt, 0, 5))
        for n_heads in head_set:
            # Validar que d_model sea divisible por n_heads
            if D_MODEL % n_heads != 0:
                print(f"[SKIP] n_heads={n_heads} no es divisible con D_MODEL={D_MODEL}")
                continue
            arch_configs.append({
                'n_dw_blocks': n_blocks,
                'n_heads': n_heads,
                'arch_id': f'nb{n_blocks}_h{n_heads}'
            })
    
    total_archs = len(arch_configs)
    total_runs = total_archs * 5
    
    print("="*80)
    print("SWEEP ARQUITECTÓNICO CON 5-FOLD CROSS-VALIDATION")
    print("="*80)
    print(f"\nParámetros del sweep:")
    print(f"  • Base blocks: {base_blocks}")
    print(f"  • Deltas: {deltas}")
    print(f"  • Head set: {head_set}")
    print(f"  • D_MODEL: {D_MODEL}")
    
    print(f"\nArquitecturas a evaluar: {total_archs}")
    print(f"Total de entrenamientos: {total_runs} ({total_archs} archs × 5 folds)")
    print(f"Directorio de salida: {output_dir}")
    print("="*80 + "\n")
    
    # Almacenar resultados
    results = []
    
    # ========================================
    # LOOP PRINCIPAL: ARQUITECTURAS × FOLDS
    # ========================================
    for arch_idx, config in enumerate(arch_configs, 1):
        arch_id = config['arch_id']
        n_blocks = config['n_dw_blocks']
        n_heads = config['n_heads']
        
        print(f"\n{'='*80}")
        print(f"ARQUITECTURA {arch_idx}/{total_archs}: {arch_id}")
        print(f"{'='*80}")
        print(f"  • Depthwise Blocks: {n_blocks}")
        print(f"  • Attention Heads: {n_heads}")
        print("-"*80)
        
        arch_dir = output_dir / arch_id
        ensure_dir(arch_dir)
        
        # Resultados de los 5 folds para esta arquitectura
        fold_f1s = []
        fold_accs = []
        n_params = None
        
        # ========================================
        # 5-FOLD LOOP
        # ========================================
        for fold_idx in range(1, 6):
            print(f"\n--- Fold {fold_idx}/5 ---")
            seed_everything(RANDOM_STATE + fold_idx)
            
            try:
                # Factory del modelo con esta arquitectura
                def make_model():
                    return EEGCNNTransformer(
                        n_ch=8,
                        n_cls=2,
                        d_model=D_MODEL,
                        n_heads=n_heads,
                        n_layers=N_LAYERS,
                        p_drop=P_DROP,
                        p_drop_encoder=P_DROP_ENCODER,
                        n_dw_blocks=n_blocks,
                        capture_attn=False
                    ).to(device)
                
                # Entrenar este fold con train_one_fold
                acc, f1_macro, ft_acc, consumption, metrics = train_one_fold(
                    fold=fold_idx,
                    device=device,
                    model_factory=make_model,
                    run_tag=arch_id,
                    out_dir=arch_dir,
                    save_artifacts=True,
                    do_ft=False
                )
                
                fold_accs.append(acc)
                fold_f1s.append(f1_macro)
                
                if n_params is None:
                    n_params = consumption.get('params_total', 0)
                
                print(f"✓ Fold {fold_idx} completado: ACC={acc:.4f}, F1={f1_macro:.4f}")
                
            except Exception as e:
                print(f"✗ ERROR en Fold {fold_idx}: {e}")
                import traceback
                traceback.print_exc()
                fold_accs.append(0.0)
                fold_f1s.append(0.0)
        
        # ========================================
        # COMPILAR RESULTADOS DE ESTA ARQUITECTURA
        # ========================================
        result_row = {
            'arch_id': arch_id,
            'n_dw_blocks': n_blocks,
            'n_heads': n_heads,
            'fold_1_f1': fold_f1s[0] if len(fold_f1s) > 0 else 0.0,
            'fold_2_f1': fold_f1s[1] if len(fold_f1s) > 1 else 0.0,
            'fold_3_f1': fold_f1s[2] if len(fold_f1s) > 2 else 0.0,
            'fold_4_f1': fold_f1s[3] if len(fold_f1s) > 3 else 0.0,
            'fold_5_f1': fold_f1s[4] if len(fold_f1s) > 4 else 0.0,
            'fold_1_acc': fold_accs[0] if len(fold_accs) > 0 else 0.0,
            'fold_2_acc': fold_accs[1] if len(fold_accs) > 1 else 0.0,
            'fold_3_acc': fold_accs[2] if len(fold_accs) > 2 else 0.0,
            'fold_4_acc': fold_accs[3] if len(fold_accs) > 3 else 0.0,
            'fold_5_acc': fold_accs[4] if len(fold_accs) > 4 else 0.0,
            'mean_f1': np.mean(fold_f1s) if fold_f1s else 0.0,
            'std_f1': np.std(fold_f1s) if fold_f1s else 0.0,
            'mean_acc': np.mean(fold_accs) if fold_accs else 0.0,
            'std_acc': np.std(fold_accs) if fold_accs else 0.0,
            'n_params': n_params if n_params else 0
        }
        
        results.append(result_row)
        
        print(f"\n{'─'*80}")
        print(f"RESUMEN {arch_id}:")
        print(f"  F1 por fold: {[f'{f:.4f}' for f in fold_f1s]}")
        print(f"  F1 Media ± Std: {result_row['mean_f1']:.4f} ± {result_row['std_f1']:.4f}")
        print(f"  ACC Media ± Std: {result_row['mean_acc']:.4f} ± {result_row['std_acc']:.4f}")
        print(f"  Parámetros: {result_row['n_params']:,}")
        print(f"{'─'*80}")
    
    # ========================================
    # COMPILAR RESULTADOS FINALES
    # ========================================
    df_results = pd.DataFrame(results)
    
    # Ordenar por mean_f1 descendente
    df_results = df_results.sort_values('mean_f1', ascending=False).reset_index(drop=True)
    
    # Guardar resultados
    csv_path = output_dir / 'arch_sweep_5fold_results.csv'
    df_results.to_csv(csv_path, index=False)
    
    print("\n" + "="*80)
    print("SWEEP ARQUITECTÓNICO COMPLETADO")
    print("="*80)
    print(f"\nResultados guardados en: {csv_path}")
    print(f"\nMejor arquitectura:")
    best = df_results.iloc[0]
    print(f"  • Arch ID: {best['arch_id']}")
    print(f"  • Blocks: {int(best['n_dw_blocks'])}, Heads: {int(best['n_heads'])}")
    print(f"  • F1 Score: {best['mean_f1']:.4f} ± {best['std_f1']:.4f}")
    print(f"  • Accuracy: {best['mean_acc']:.4f} ± {best['std_acc']:.4f}")
    print(f"  • Parámetros: {int(best['n_params']):,}")
    print("="*80 + "\n")
    
    return df_results


def analyze_arch_sweep_statistical(df_results, baseline_arch='nb2_h6', output_dir=None, alpha=0.05):
    """
    Realiza análisis estadístico de los resultados del sweep arquitectónico.
    
    Incluye:
    - Test de Wilcoxon pareado (cada arquitectura vs baseline)
    - Test de Friedman (comparación múltiple)
    - Post-hoc tests si Friedman es significativo
    
    Args:
        df_results: DataFrame con resultados del sweep
        baseline_arch: str - ID de la arquitectura baseline para comparación
        output_dir: Path - Directorio para guardar resultados
        alpha: float - Nivel de significancia (default: 0.05)
    
    Returns:
        dict - Resultados del análisis estadístico
    """
    from scipy.stats import wilcoxon, friedmanchisquare
    import pandas as pd
    
    if output_dir is None:
        output_dir = PROJ / 'models' / '04_hybrid' / 'arch_sweep_5fold'
    else:
        output_dir = Path(output_dir)
    
    ensure_dir(output_dir)
    
    print("\n" + "="*80)
    print("ANÁLISIS ESTADÍSTICO DEL SWEEP ARQUITECTÓNICO")
    print("="*80 + "\n")
    
    # Verificar que existe el baseline
    if baseline_arch not in df_results['arch_id'].values:
        print(f"[WARN] Baseline '{baseline_arch}' no encontrado. Usando primera arquitectura.")
        baseline_arch = df_results.iloc[0]['arch_id']
    
    baseline_row = df_results[df_results['arch_id'] == baseline_arch].iloc[0]
    baseline_f1s = [baseline_row[f'fold_{i}_f1'] for i in range(1, 6)]
    
    print(f"Arquitectura Baseline: {baseline_arch}")
    print(f"  F1 Scores: {[f'{f:.4f}' for f in baseline_f1s]}")
    print(f"  Media: {np.mean(baseline_f1s):.4f} ± {np.std(baseline_f1s):.4f}\n")
    
    # ========================================
    # TEST DE WILCOXON (vs Baseline)
    # ========================================
    print("-"*80)
    print("TEST DE WILCOXON PAREADO (cada arquitectura vs baseline)")
    print("-"*80 + "\n")
    
    wilcoxon_results = []
    
    for _, row in df_results.iterrows():
        arch_id = row['arch_id']
        
        if arch_id == baseline_arch:
            continue  # No comparar baseline consigo mismo
        
        arch_f1s = [row[f'fold_{i}_f1'] for i in range(1, 6)]
        
        try:
            # Test de Wilcoxon pareado
            stat, p_value = wilcoxon(arch_f1s, baseline_f1s, alternative='two-sided')
            
            # Calcular diferencia media
            mean_diff = np.mean(arch_f1s) - np.mean(baseline_f1s)
            
            # Determinar significancia
            is_significant = p_value < alpha
            
            wilcoxon_results.append({
                'arch_id': arch_id,
                'n_dw_blocks': int(row['n_dw_blocks']),
                'n_heads': int(row['n_heads']),
                'mean_f1': row['mean_f1'],
                'mean_diff_vs_baseline': mean_diff,
                'wilcoxon_stat': stat,
                'p_value': p_value,
                'significant': is_significant,
                'interpretation': 'SIGNIFICATIVO' if is_significant else 'No significativo'
            })
            
            print(f"{arch_id}:")
            print(f"  F1: {row['mean_f1']:.4f} ± {row['std_f1']:.4f}")
            print(f"  Δ vs baseline: {mean_diff:+.4f}")
            print(f"  Wilcoxon stat: {stat:.4f}, p-value: {p_value:.4f} {'***' if is_significant else ''}")
            print(f"  {wilcoxon_results[-1]['interpretation']}\n")
            
        except Exception as e:
            print(f"{arch_id}: ERROR - {e}\n")
            wilcoxon_results.append({
                'arch_id': arch_id,
                'n_dw_blocks': int(row['n_dw_blocks']),
                'n_heads': int(row['n_heads']),
                'mean_f1': row['mean_f1'],
                'mean_diff_vs_baseline': 0.0,
                'wilcoxon_stat': np.nan,
                'p_value': np.nan,
                'significant': False,
                'interpretation': 'ERROR'
            })
    
    df_wilcoxon = pd.DataFrame(wilcoxon_results)
    
    # Guardar resultados de Wilcoxon
    wilcoxon_path = output_dir / 'wilcoxon_results.csv'
    df_wilcoxon.to_csv(wilcoxon_path, index=False)
    print(f"Resultados de Wilcoxon guardados en: {wilcoxon_path}\n")
    
    # ========================================
    # TEST DE FRIEDMAN (Comparación múltiple)
    # ========================================
    print("-"*80)
    print("TEST DE FRIEDMAN (comparación múltiple)")
    print("-"*80 + "\n")
    
    # Preparar datos para Friedman: matriz (n_folds × n_arquitecturas)
    all_f1s = []
    arch_labels = []
    
    for _, row in df_results.iterrows():
        arch_f1s = [row[f'fold_{i}_f1'] for i in range(1, 6)]
        all_f1s.append(arch_f1s)
        arch_labels.append(row['arch_id'])
    
    try:
        # Test de Friedman
        friedman_stat, friedman_p = friedmanchisquare(*all_f1s)
        
        print(f"Friedman χ² statistic: {friedman_stat:.4f}")
        print(f"p-value: {friedman_p:.6f}")
        
        if friedman_p < alpha:
            print(f"\n✓ SIGNIFICATIVO (p < {alpha})")
            print("Conclusión: Existen diferencias significativas entre las arquitecturas.")
        else:
            print(f"\n✗ NO SIGNIFICATIVO (p ≥ {alpha})")
            print("Conclusión: No hay evidencia de diferencias significativas entre arquitecturas.")
    
    except Exception as e:
        print(f"ERROR en test de Friedman: {e}")
        friedman_stat = np.nan
        friedman_p = np.nan
    
    # ========================================
    # COMPILAR RESULTADOS
    # ========================================
    statistical_summary = {
        'baseline_arch': baseline_arch,
        'baseline_mean_f1': np.mean(baseline_f1s),
        'baseline_std_f1': np.std(baseline_f1s),
        'n_architectures': len(df_results),
        'alpha': alpha,
        'wilcoxon_results': df_wilcoxon,
        'friedman_stat': friedman_stat,
        'friedman_p_value': friedman_p,
        'friedman_significant': friedman_p < alpha if not np.isnan(friedman_p) else False
    }
    
    # Guardar resumen
    summary = {
        'baseline_arch': baseline_arch,
        'baseline_mean_f1': np.mean(baseline_f1s),
        'n_architectures_tested': len(df_results),
        'n_significant_vs_baseline': int(df_wilcoxon['significant'].sum()),
        'friedman_statistic': friedman_stat,
        'friedman_p_value': friedman_p,
        'friedman_significant': friedman_p < alpha if not np.isnan(friedman_p) else False
    }
    
    summary_path = output_dir / 'statistical_summary.json'
    with open(summary_path, 'w') as f:
        json.dump(summary, f, indent=2)
    
    print(f"\n{'='*80}")
    print("RESUMEN ESTADÍSTICO:")
    print(f"{'='*80}")
    print(f"  • Total arquitecturas: {len(df_results)}")
    print(f"  • Significativas vs baseline (Wilcoxon): {int(df_wilcoxon['significant'].sum())}")
    print(f"  • Test de Friedman: {'SIGNIFICATIVO' if summary['friedman_significant'] else 'No significativo'}")
    print(f"\nResumen guardado en: {summary_path}")
    print(f"{'='*80}\n")
    
    return statistical_summary


def run_winner_architecture(device, arch_id, output_dir=None):
    """
    Ejecuta la arquitectura ganadora en los 5 folds completos con todos los artefactos.
    
    Args:
        device: torch.device - Dispositivo para entrenamiento
        arch_id: str - ID de la arquitectura ganadora (ej: 'nb2_h6')
        output_dir: Path - Directorio para guardar resultados
    
    Returns:
        dict - Resultados finales con métricas
    """
    # Extraer n_blocks y n_heads del arch_id
    import re
    match = re.match(r'nb(\d+)_h(\d+)', arch_id)
    if not match:
        raise ValueError(f"arch_id '{arch_id}' no tiene formato válido (debe ser 'nbX_hY')")
    
    n_blocks = int(match.group(1))
    n_heads = int(match.group(2))
    
    if output_dir is None:
        output_dir = PROJ / 'models' / '04_hybrid' / 'winner_architecture'
    else:
        output_dir = Path(output_dir)
    
    ensure_dir(output_dir)
    
    print("\n" + "="*80)
    print(f"EJECUTANDO ARQUITECTURA GANADORA: {arch_id}")
    print("="*80)
    print(f"  • Depthwise Blocks: {n_blocks}")
    print(f"  • Attention Heads: {n_heads}")
    print(f"  • D_MODEL: {D_MODEL}")
    print(f"  • Directorio: {output_dir}")
    print("="*80 + "\n")
    
    # Usar la función existente run_full_5fold_for_config
    # Pero con un output_dir personalizado
    
    # Llamar directamente a la función con el output_dir correcto
    tag = arch_id
    base_dir = output_dir
    
    acc_folds, f1_folds = [], []
    results = []
    
    def factory():
        return EEGCNNTransformer(
            n_ch=8,
            n_cls=2,
            d_model=D_MODEL,
            n_heads=n_heads,
            n_layers=N_LAYERS,
            p_drop=P_DROP,
            p_drop_encoder=P_DROP_ENCODER,
            n_dw_blocks=n_blocks,
            capture_attn=True
        ).to(device)
    
    for fold in range(1, 6):
        print(f"\n{'='*80}")
        print(f"FOLD {fold}/5")
        print(f"{'='*80}")
        
        acc, f1m, ft_acc, cons, met = train_one_fold(
            fold=fold,
            device=device,
            model_factory=factory,
            run_tag=tag,
            out_dir=base_dir,
            save_artifacts=True,
            do_ft=True
        )
        
        acc_folds.append(acc)
        f1_folds.append(f1m)
        
        results.append({
            'fold': fold,
            'accuracy': acc,
            'f1_macro': f1m,
            'ft_accuracy': ft_acc,
            'n_params': cons.get('params_total'),
            'precision_macro': met.get('precision_macro'),
            'recall_macro': met.get('recall_macro'),
            'specificity_macro': met.get('specificity_macro')
        })
        
        print(f"✓ Fold {fold} - ACC: {acc:.4f}, F1: {f1m:.4f}")
    
    # Resumen final
    acc_mean = np.mean(acc_folds)
    f1_mean = np.mean(f1_folds)
    acc_std = np.std(acc_folds)
    f1_std = np.std(f1_folds)
    
    final_results = {
        'arch_id': arch_id,
        'n_dw_blocks': n_blocks,
        'n_heads': n_heads,
        'fold_results': results,
        'mean_accuracy': acc_mean,
        'std_accuracy': acc_std,
        'mean_f1': f1_mean,
        'std_f1': f1_std,
        'fold_accuracies': acc_folds,
        'fold_f1s': f1_folds
    }
    
    # Guardar resultados
    import pandas as pd
    df_results = pd.DataFrame(results)
    results_path = output_dir / f'winner_{arch_id}_results.csv'
    df_results.to_csv(results_path, index=False)
    
    # Guardar JSON con resumen
    summary_path = output_dir / f'winner_{arch_id}_summary.json'
    with open(summary_path, 'w') as f:
        json.dump({
            'arch_id': arch_id,
            'n_dw_blocks': n_blocks,
            'n_heads': n_heads,
            'mean_accuracy': acc_mean,
            'std_accuracy': acc_std,
            'mean_f1': f1_mean,
            'std_f1': f1_std
        }, f, indent=2)
    
    print("\n" + "="*80)
    print("RESULTADOS FINALES - ARQUITECTURA GANADORA")
    print("="*80)
    print(f"\nArquitectura: {arch_id}")
    print(f"  • Blocks: {n_blocks}, Heads: {n_heads}")
    print(f"\nAccuracy por fold: {[f'{a:.4f}' for a in acc_folds]}")
    print(f"  Media ± Std: {acc_mean:.4f} ± {acc_std:.4f}")
    print(f"\nF1-Macro por fold: {[f'{f:.4f}' for f in f1_folds]}")
    print(f"  Media ± Std: {f1_mean:.4f} ± {f1_std:.4f}")
    print(f"\nResultados guardados en:")
    print(f"  • CSV: {results_path}")
    print(f"  • JSON: {summary_path}")
    print("="*80 + "\n")
    
    return final_results

print("✓ Funciones de análisis arquitectónico estadístico definidas")

✓ Funciones de análisis arquitectónico estadístico definidas


### Visualización de Resultados del Sweep Arquitectónico

Funciones para generar gráficos comparativos de las diferentes arquitecturas.

In [24]:
# =========================
# VISUALIZACIÓN DEL SWEEP ARQUITECTÓNICO
# =========================

def plot_arch_sweep_results(df_results, df_wilcoxon=None, output_dir=None):
    """
    Genera visualizaciones comprehensivas de los resultados del sweep arquitectónico.
    
    Args:
        df_results: DataFrame con resultados del sweep
        df_wilcoxon: DataFrame con resultados de tests de Wilcoxon (opcional)
        output_dir: Path - Directorio para guardar gráficos
    """
    if output_dir is None:
        output_dir = PROJ / 'models' / '04_hybrid' / 'arch_sweep_5fold'
    else:
        output_dir = Path(output_dir)
    
    ensure_dir(output_dir)
    
    # ========================================
    # GRÁFICO 1: Barras con errorbar + p-values
    # ========================================
    fig, ax = plt.subplots(figsize=(14, 7))
    
    x = np.arange(len(df_results))
    bars = ax.bar(x, df_results['mean_f1'], yerr=df_results['std_f1'],
                  capsize=5, alpha=0.8, edgecolor='navy')
    
    # Colorear barras según significancia (si hay datos de Wilcoxon)
    if df_wilcoxon is not None:
        for i, (idx, row) in enumerate(df_results.iterrows()):
            wilcox_row = df_wilcoxon[df_wilcoxon['arch_id'] == row['arch_id']]
            if len(wilcox_row) > 0 and wilcox_row.iloc[0]['significant']:
                bars[i].set_color('salmon')
                bars[i].set_edgecolor('darkred')
            else:
                bars[i].set_color('steelblue')
    
    ax.set_xlabel('Arquitectura', fontsize=12, fontweight='bold')
    ax.set_ylabel('F1-Score (Macro)', fontsize=12, fontweight='bold')
    ax.set_title('Comparación de Arquitecturas - Sweep con 5-Fold CV', fontsize=14, fontweight='bold')
    ax.set_xticks(x)
    ax.set_xticklabels(df_results['arch_id'], rotation=45, ha='right', fontsize=10)
    ax.axhline(y=df_results['mean_f1'].iloc[0], color='red', linestyle='--', 
               alpha=0.5, linewidth=2, label='Mejor')
    ax.grid(axis='y', alpha=0.3)
    ax.set_ylim([0, 1.0])
    
    # Leyenda
    from matplotlib.patches import Patch
    legend_elements = [
        Patch(facecolor='steelblue', edgecolor='navy', label='No significativo'),
        Patch(facecolor='salmon', edgecolor='darkred', label='Significativo (p<0.05)'),
        ax.lines[0]
    ]
    ax.legend(handles=legend_elements, loc='lower right')
    
    plt.tight_layout()
    fig_path = output_dir / 'arch_sweep_comparison.png'
    plt.savefig(fig_path, dpi=150, bbox_inches='tight')
    print(f"✓ Gráfico de comparación guardado: {fig_path}")
    plt.close()
    
    # ========================================
    # GRÁFICO 2: Heatmap de Blocks vs Heads
    # ========================================
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))
    
    # Heatmap de F1-Score
    pivot_f1 = df_results.pivot_table(
        values='mean_f1',
        index='n_dw_blocks',
        columns='n_heads',
        aggfunc='mean'
    )
    
    im1 = ax1.imshow(pivot_f1, cmap='YlOrRd', aspect='auto', vmin=0, vmax=1)
    ax1.set_title('F1-Score por Configuración', fontsize=12, fontweight='bold')
    ax1.set_xlabel('Número de Attention Heads', fontsize=11)
    ax1.set_ylabel('Número de Bloques Depthwise', fontsize=11)
    ax1.set_xticks(range(len(pivot_f1.columns)))
    ax1.set_xticklabels([int(v) for v in pivot_f1.columns])
    ax1.set_yticks(range(len(pivot_f1.index)))
    ax1.set_yticklabels([int(v) for v in pivot_f1.index])
    
    # Anotar valores
    for i in range(len(pivot_f1.index)):
        for j in range(len(pivot_f1.columns)):
            if not np.isnan(pivot_f1.iloc[i, j]):
                ax1.text(j, i, f'{pivot_f1.iloc[i, j]:.3f}',
                        ha='center', va='center', color='black', fontsize=10, fontweight='bold')
    
    plt.colorbar(im1, ax=ax1, label='F1-Score')
    
    # Heatmap de número de parámetros
    pivot_params = df_results.pivot_table(
        values='n_params',
        index='n_dw_blocks',
        columns='n_heads',
        aggfunc='mean'
    )
    
    im2 = ax2.imshow(pivot_params / 1e6, cmap='Blues', aspect='auto')
    ax2.set_title('Número de Parámetros (M)', fontsize=12, fontweight='bold')
    ax2.set_xlabel('Número de Attention Heads', fontsize=11)
    ax2.set_ylabel('Número de Bloques Depthwise', fontsize=11)
    ax2.set_xticks(range(len(pivot_params.columns)))
    ax2.set_xticklabels([int(v) for v in pivot_params.columns])
    ax2.set_yticks(range(len(pivot_params.index)))
    ax2.set_yticklabels([int(v) for v in pivot_params.index])
    
    # Anotar valores
    for i in range(len(pivot_params.index)):
        for j in range(len(pivot_params.columns)):
            if not np.isnan(pivot_params.iloc[i, j]):
                ax2.text(j, i, f'{pivot_params.iloc[i, j]/1e6:.2f}M',
                        ha='center', va='center', color='white' if pivot_params.iloc[i, j]/1e6 > 0.5 else 'black',
                        fontsize=9, fontweight='bold')
    
    plt.colorbar(im2, ax=ax2, label='Parámetros (Millones)')
    
    plt.tight_layout()
    heatmap_path = output_dir / 'arch_sweep_heatmaps.png'
    plt.savefig(heatmap_path, dpi=150, bbox_inches='tight')
    print(f"✓ Heatmaps guardados: {heatmap_path}")
    plt.close()
    
    # ========================================
    # GRÁFICO 3: P-values del test de Wilcoxon
    # ========================================
    if df_wilcoxon is not None and len(df_wilcoxon) > 0:
        fig, ax = plt.subplots(figsize=(14, 6))
        
        # Ordenar por p-value
        df_wilcoxon_sorted = df_wilcoxon.sort_values('p_value')
        
        x = np.arange(len(df_wilcoxon_sorted))
        colors = ['red' if sig else 'gray' for sig in df_wilcoxon_sorted['significant']]
        
        ax.bar(x, -np.log10(df_wilcoxon_sorted['p_value']), color=colors, alpha=0.7, edgecolor='black')
        ax.axhline(y=-np.log10(0.05), color='blue', linestyle='--', linewidth=2, label='α=0.05')
        ax.axhline(y=-np.log10(0.01), color='green', linestyle='--', linewidth=2, label='α=0.01')
        
        ax.set_xlabel('Arquitectura', fontsize=12, fontweight='bold')
        ax.set_ylabel('-log₁₀(p-value)', fontsize=12, fontweight='bold')
        ax.set_title('Significancia Estadística vs Baseline (Test de Wilcoxon)', fontsize=14, fontweight='bold')
        ax.set_xticks(x)
        ax.set_xticklabels(df_wilcoxon_sorted['arch_id'], rotation=45, ha='right', fontsize=10)
        ax.legend()
        ax.grid(axis='y', alpha=0.3)
        
        plt.tight_layout()
        pvalue_path = output_dir / 'arch_sweep_pvalues.png'
        plt.savefig(pvalue_path, dpi=150, bbox_inches='tight')
        print(f"✓ Gráfico de p-values guardado: {pvalue_path}")
        plt.close()
    
    # ========================================
    # TABLA RESUMEN
    # ========================================
    fig, ax = plt.subplots(figsize=(16, max(6, len(df_results) * 0.4)))
    ax.axis('tight')
    ax.axis('off')
    
    table_data = []
    for idx, row in df_results.iterrows():
        sig_marker = ''
        if df_wilcoxon is not None:
            wilcox_row = df_wilcoxon[df_wilcoxon['arch_id'] == row['arch_id']]
            if len(wilcox_row) > 0:
                if wilcox_row.iloc[0]['significant']:
                    sig_marker = ' ***'
                p_val = wilcox_row.iloc[0]['p_value']
                sig_marker += f" (p={p_val:.4f})"
        
        table_data.append([
            row['arch_id'] + sig_marker,
            f"{int(row['n_dw_blocks'])}",
            f"{int(row['n_heads'])}",
            f"{row['mean_f1']:.4f} ± {row['std_f1']:.4f}",
            f"{row['mean_acc']:.4f} ± {row['std_acc']:.4f}",
            f"{int(row['n_params']):,}"
        ])
    
    table = ax.table(cellText=table_data,
                     colLabels=['Arquitectura', 'Blocks', 'Heads', 'F1-Score', 'Accuracy', 'Parámetros'],
                     cellLoc='center',
                     loc='center',
                     colWidths=[0.25, 0.1, 0.1, 0.2, 0.2, 0.15])
    table.auto_set_font_size(False)
    table.set_fontsize(9)
    table.scale(1, 2)
    
    # Colorear la mejor fila
    for i in range(6):
        table[(1, i)].set_facecolor('#90EE90')
    
    plt.title('Tabla Resumen - Sweep Arquitectónico (*** = significativo vs baseline)', 
              fontsize=12, fontweight='bold', pad=20)
    
    table_path = output_dir / 'arch_sweep_table.png'
    plt.savefig(table_path, dpi=150, bbox_inches='tight')
    print(f"✓ Tabla guardada: {table_path}")
    plt.close()
    
    return df_results

print("✓ Funciones de visualización definidas")

✓ Funciones de visualización definidas


### Ejecución del Sweep Arquitectónico

Ejecuta el sweep completo con análisis estadístico y visualizaciones.

In [25]:
# =========================
# EJECUTAR SWEEP ARQUITECTÓNICO COMPLETO
# =========================

if __name__ == "__main__":
    print("\n" + "="*80)
    print("SWEEP ARQUITECTÓNICO CON ANÁLISIS ESTADÍSTICO")
    print("="*80 + "\n")
    
    # ========================================
    # PASO 1: Ejecutar el sweep con 5 folds
    # ========================================
    print("PASO 1: Ejecutando sweep arquitectónico con 5-fold CV")
    print("-"*80 + "\n")
    
    # Configuración del sweep
    base_blocks = 2
    deltas = (-2, 0, +2)  # Probará: 0, 2, 4 bloques
    head_set = (2, 4, 6)   # Probará: 2, 4, 6 attention heads
    
    print("Configuración:")
    print(f"  • Base blocks: {base_blocks}")
    print(f"  • Deltas: {deltas}")
    print(f"  • Head set: {head_set}")
    print(f"  • Total de arquitecturas: {len(deltas) * len(head_set)} (máximo)")
    print(f"\nADVERTENCIA: Este proceso tomará varias horas.")
    print(f"  Tiempo estimado: ~6-10 horas (depende del hardware)\n")
    
    # Ejecutar sweep
    df_arch_results = run_arch_sweep_5folds(
        device=DEVICE,
        base_blocks=base_blocks,
        deltas=deltas,
        head_set=head_set,
        output_dir=PROJ / 'models' / '04_hybrid' / 'arch_sweep_5fold'
    )
    
    # ========================================
    # PASO 2: Análisis estadístico
    # ========================================
    print("\n" + "="*80)
    print("PASO 2: Análisis estadístico de resultados")
    print("-"*80 + "\n")
    
    # Identificar baseline (típicamente nb2_h6 o la configuración central)
    baseline_arch = 'nb2_h6'  # Ajustar según tu modelo ganador
    
    # Si el baseline no existe, usar el mejor
    if baseline_arch not in df_arch_results['arch_id'].values:
        baseline_arch = df_arch_results.iloc[0]['arch_id']
        print(f"[INFO] Baseline ajustado a: {baseline_arch}")
    
    statistical_results = analyze_arch_sweep_statistical(
        df_results=df_arch_results,
        baseline_arch=baseline_arch,
        output_dir=PROJ / 'models' / '04_hybrid' / 'arch_sweep_5fold',
        alpha=0.05
    )
    
    # ========================================
    # PASO 3: Visualizaciones
    # ========================================
    print("\n" + "="*80)
    print("PASO 3: Generando visualizaciones")
    print("-"*80 + "\n")
    
    plot_arch_sweep_results(
        df_results=df_arch_results,
        df_wilcoxon=statistical_results['wilcoxon_results'],
        output_dir=PROJ / 'models' / '04_hybrid' / 'arch_sweep_5fold'
    )
    
    # ========================================
    # PASO 4: Identificar y ejecutar ganador
    # ========================================
    print("\n" + "="*80)
    print("PASO 4: Ejecutar arquitectura ganadora")
    print("-"*80 + "\n")
    
    winner_arch = df_arch_results.iloc[0]['arch_id']
    
    print(f"Arquitectura ganadora identificada: {winner_arch}")
    print(f"  • F1-Score: {df_arch_results.iloc[0]['mean_f1']:.4f} ± {df_arch_results.iloc[0]['std_f1']:.4f}")
    print(f"  • Accuracy: {df_arch_results.iloc[0]['mean_acc']:.4f} ± {df_arch_results.iloc[0]['std_acc']:.4f}")
    print(f"\n¿Deseas ejecutar el ganador con todos los artefactos y fine-tuning?")
    print(f"  (Los resultados ya están disponibles del sweep)")
    print(f"  Si deseas re-ejecutar, descomenta la siguiente línea:\n")
    
    # Descomenta para ejecutar el ganador con todos los artefactos
    # winner_results = run_winner_architecture(
    #     device=DEVICE,
    #     arch_id=winner_arch,
    #     output_dir=PROJ / 'models' / '04_hybrid' / 'winner_architecture'
    # )
    
    # ========================================
    # RESUMEN FINAL
    # ========================================
    print("\n" + "="*80)
    print("SWEEP ARQUITECTÓNICO COMPLETADO")
    print("="*80 + "\n")
    
    print("Resumen de Resultados:")
    print(f"  • Total arquitecturas evaluadas: {len(df_arch_results)}")
    print(f"  • Mejor arquitectura: {winner_arch}")
    print(f"  • F1-Score: {df_arch_results.iloc[0]['mean_f1']:.4f} ± {df_arch_results.iloc[0]['std_f1']:.4f}")
    
    # Contar arquitecturas significativas
    if 'wilcoxon_results' in statistical_results:
        n_significant = int(statistical_results['wilcoxon_results']['significant'].sum())
        print(f"  • Arquitecturas significativas vs baseline: {n_significant}")
    
    print(f"\nTest de Friedman:")
    print(f"  • χ² = {statistical_results['friedman_stat']:.4f}")
    print(f"  • p-value = {statistical_results['friedman_p_value']:.6f}")
    print(f"  • {'SIGNIFICATIVO' if statistical_results['friedman_significant'] else 'No significativo'}")
    
    print(f"\nArchivos generados en: {PROJ / 'models' / '04_hybrid' / 'arch_sweep_5fold'}")
    print("  • arch_sweep_5fold_results.csv - Resultados completos")
    print("  • wilcoxon_results.csv - Tests estadísticos")
    print("  • statistical_summary.json - Resumen estadístico")
    print("  • arch_sweep_comparison.png - Gráfico comparativo")
    print("  • arch_sweep_heatmaps.png - Heatmaps de configuraciones")
    print("  • arch_sweep_pvalues.png - Gráfico de p-values")
    print("  • arch_sweep_table.png - Tabla resumen")
    
    print("\n" + "="*80)
    print("ANÁLISIS COMPLETADO")
    print("="*80 + "\n")
    
    # Mostrar top 3 arquitecturas
    print("Top 3 Arquitecturas:")
    print("-"*80)
    for i in range(min(3, len(df_arch_results))):
        row = df_arch_results.iloc[i]
        print(f"{i+1}. {row['arch_id']}")
        print(f"   F1: {row['mean_f1']:.4f} ± {row['std_f1']:.4f}")
        print(f"   ACC: {row['mean_acc']:.4f} ± {row['std_acc']:.4f}")
        print(f"   Params: {int(row['n_params']):,}")
        
        # Mostrar si es significativo vs baseline
        if 'wilcoxon_results' in statistical_results:
            wilcox_row = statistical_results['wilcoxon_results'][
                statistical_results['wilcoxon_results']['arch_id'] == row['arch_id']
            ]
            if len(wilcox_row) > 0:
                is_sig = wilcox_row.iloc[0]['significant']
                p_val = wilcox_row.iloc[0]['p_value']
                print(f"   vs baseline: {'SIGNIFICATIVO' if is_sig else 'No significativo'} (p={p_val:.4f})")
        print()


SWEEP ARQUITECTÓNICO CON ANÁLISIS ESTADÍSTICO

PASO 1: Ejecutando sweep arquitectónico con 5-fold CV
--------------------------------------------------------------------------------

Configuración:
  • Base blocks: 2
  • Deltas: (-2, 0, 2)
  • Head set: (2, 4, 6)
  • Total de arquitecturas: 9 (máximo)

ADVERTENCIA: Este proceso tomará varias horas.
  Tiempo estimado: ~6-10 horas (depende del hardware)

SWEEP ARQUITECTÓNICO CON 5-FOLD CROSS-VALIDATION

Parámetros del sweep:
  • Base blocks: 2
  • Deltas: (-2, 0, 2)
  • Head set: (2, 4, 6)
  • D_MODEL: 144

Arquitecturas a evaluar: 9
Total de entrenamientos: 45 (9 archs × 5 folds)
Directorio de salida: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold


ARQUITECTURA 1/9: nb0_h2
  • Depthwise Blocks: 0
  • Attention Heads: 2
--------------------------------------------------------------------------------

--- Fold 1/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h

Cargando train fold1: 100%|██████████| 67/67 [00:03<00:00, 17.25it/s]
Cargando val fold1: 100%|██████████| 15/15 [00:00<00:00, 17.33it/s]
Cargando test fold1: 100%|██████████| 21/21 [00:01<00:00, 17.28it/s]


[Fold 1/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1330 | train_acc=0.5256 | val_loss=0.1214 | val_acc=0.5778 | val_f1m=0.5565 | LR=0.000125
  Época   2 | train_loss=0.1240 | train_acc=0.5569 | val_loss=0.1197 | val_acc=0.5889 | val_f1m=0.5703 | LR=0.000250
  Época   3 | train_loss=0.1222 | train_acc=0.5885 | val_loss=0.1155 | val_acc=0.6206 | val_f1m=0.6051 | LR=0.000375
  Época   4 | train_loss=0.0954 | train_acc=0.7434 | val_loss=0.0993 | val_acc=0.7143 | val_f1m=0.7137 | LR=0.000500
  Época   5 | train_loss=0.0779 | train_acc=0.8038 | val_loss=0.1010 | val_acc=0.7556 | val_f1m=0.7555 | LR=0.000500
  Época   6 | train_loss=0.0784 | train_acc=0.8049 | val_loss=0.0936 | val_acc=0.7381 | val_f1m=0.7380 | LR=0.000500
  Época   7 | train_loss=0.0695 | train_acc=0.8234 | val_loss=0.0989 | val_acc=0.7460 | val_f1m=0.7460 | LR=0.000499
  Época   8 | train_loss=0.0713 | train_acc=0.8205 | val_loss=0.0982 | val_acc=0.7857 | val_f1m=0.7841



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h2/nb0_h2/fold1/tsne_cls_fold1_nb0_h2.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=205,890 | params_trainable=205,890 | FLOPs~146.4M | latency/batch=0.90 ms (B=8)
✓ Fold 1 completado: ACC=0.8050, F1=0.8049

--- Fold 2/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h2/nb0_h2/fold2


Cargando train fold2: 100%|██████████| 67/67 [00:03<00:00, 17.21it/s]
Cargando val fold2: 100%|██████████| 15/15 [00:00<00:00, 16.89it/s]
Cargando test fold2: 100%|██████████| 21/21 [00:01<00:00, 17.43it/s]


[Fold 2/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1318 | train_acc=0.5156 | val_loss=0.1210 | val_acc=0.5587 | val_f1m=0.5275 | LR=0.000125
  Época   2 | train_loss=0.1243 | train_acc=0.5547 | val_loss=0.1185 | val_acc=0.5857 | val_f1m=0.5807 | LR=0.000250
  Época   3 | train_loss=0.1128 | train_acc=0.6365 | val_loss=0.1146 | val_acc=0.6698 | val_f1m=0.6527 | LR=0.000375
  Época   4 | train_loss=0.0966 | train_acc=0.7367 | val_loss=0.1004 | val_acc=0.7349 | val_f1m=0.7305 | LR=0.000500
  Época   5 | train_loss=0.0901 | train_acc=0.7640 | val_loss=0.0908 | val_acc=0.7508 | val_f1m=0.7508 | LR=0.000500
  Época   6 | train_loss=0.0823 | train_acc=0.7900 | val_loss=0.0868 | val_acc=0.7683 | val_f1m=0.7668 | LR=0.000500
  Época   7 | train_loss=0.0831 | train_acc=0.7864 | val_loss=0.0913 | val_acc=0.7540 | val_f1m=0.7540 | LR=0.000499
  Época   8 | train_loss=0.0830 | train_acc=0.7793 | val_loss=0.0932 | val_acc=0.7413 | val_f1m=0.7408



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h2/nb0_h2/fold2/tsne_cls_fold2_nb0_h2.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=205,890 | params_trainable=205,890 | FLOPs~146.4M | latency/batch=0.91 ms (B=8)
✓ Fold 2 completado: ACC=0.8005, F1=0.8004

--- Fold 3/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h2/nb0_h2/fold3


Cargando train fold3: 100%|██████████| 67/67 [00:03<00:00, 17.13it/s]
Cargando val fold3: 100%|██████████| 15/15 [00:00<00:00, 17.45it/s]
Cargando test fold3: 100%|██████████| 21/21 [00:01<00:00, 17.38it/s]


[Fold 3/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1330 | train_acc=0.5163 | val_loss=0.1206 | val_acc=0.5825 | val_f1m=0.5804 | LR=0.000125
  Época   2 | train_loss=0.1262 | train_acc=0.5235 | val_loss=0.1183 | val_acc=0.5952 | val_f1m=0.5924 | LR=0.000250
  Época   3 | train_loss=0.1216 | train_acc=0.5682 | val_loss=0.1109 | val_acc=0.6810 | val_f1m=0.6679 | LR=0.000375
  Época   4 | train_loss=0.0986 | train_acc=0.7217 | val_loss=0.0884 | val_acc=0.7794 | val_f1m=0.7788 | LR=0.000500
  Época   5 | train_loss=0.0865 | train_acc=0.7662 | val_loss=0.0860 | val_acc=0.7905 | val_f1m=0.7903 | LR=0.000500
  Época   6 | train_loss=0.0820 | train_acc=0.7825 | val_loss=0.0825 | val_acc=0.7810 | val_f1m=0.7802 | LR=0.000500
  Época   7 | train_loss=0.0781 | train_acc=0.8006 | val_loss=0.0809 | val_acc=0.7714 | val_f1m=0.7682 | LR=0.000499
  Época   8 | train_loss=0.0781 | train_acc=0.8028 | val_loss=0.0791 | val_acc=0.7921 | val_f1m=0.7920



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h2/nb0_h2/fold3/tsne_cls_fold3_nb0_h2.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=205,890 | params_trainable=205,890 | FLOPs~146.4M | latency/batch=0.90 ms (B=8)
✓ Fold 3 completado: ACC=0.7914, F1=0.7913

--- Fold 4/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h2/nb0_h2/fold4


Cargando train fold4: 100%|██████████| 68/68 [00:03<00:00, 17.25it/s]
Cargando val fold4: 100%|██████████| 15/15 [00:00<00:00, 17.59it/s]
Cargando test fold4: 100%|██████████| 20/20 [00:01<00:00, 17.53it/s]


[Fold 4/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1328 | train_acc=0.5207 | val_loss=0.1272 | val_acc=0.4984 | val_f1m=0.3382 | LR=0.000125
  Época   2 | train_loss=0.1245 | train_acc=0.5483 | val_loss=0.1209 | val_acc=0.5905 | val_f1m=0.5905 | LR=0.000250
  Época   3 | train_loss=0.1183 | train_acc=0.6019 | val_loss=0.1096 | val_acc=0.6905 | val_f1m=0.6896 | LR=0.000375
  Época   4 | train_loss=0.0997 | train_acc=0.7192 | val_loss=0.1018 | val_acc=0.6937 | val_f1m=0.6856 | LR=0.000500
  Época   5 | train_loss=0.0895 | train_acc=0.7511 | val_loss=0.0984 | val_acc=0.7175 | val_f1m=0.7131 | LR=0.000500
  Época   6 | train_loss=0.0850 | train_acc=0.7752 | val_loss=0.1267 | val_acc=0.6762 | val_f1m=0.6495 | LR=0.000500
  Época   7 | train_loss=0.0800 | train_acc=0.7889 | val_loss=0.0932 | val_acc=0.7333 | val_f1m=0.7319 | LR=0.000499
  Época   8 | train_loss=0.0755 | train_acc=0.8102 | val_loss=0.0872 | val_acc=0.7730 | val_f1m=0.7723



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h2/nb0_h2/fold4/tsne_cls_fold4_nb0_h2.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=205,890 | params_trainable=205,890 | FLOPs~146.4M | latency/batch=0.89 ms (B=8)
✓ Fold 4 completado: ACC=0.8060, F1=0.8039

--- Fold 5/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h2/nb0_h2/fold5


Cargando train fold5: 100%|██████████| 68/68 [00:03<00:00, 17.30it/s]
Cargando val fold5: 100%|██████████| 15/15 [00:00<00:00, 16.66it/s]
Cargando test fold5: 100%|██████████| 20/20 [00:01<00:00, 17.43it/s]


[Fold 5/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1286 | train_acc=0.5445 | val_loss=0.1238 | val_acc=0.5460 | val_f1m=0.5283 | LR=0.000125
  Época   2 | train_loss=0.1262 | train_acc=0.5417 | val_loss=0.1176 | val_acc=0.6159 | val_f1m=0.6159 | LR=0.000250
  Época   3 | train_loss=0.1219 | train_acc=0.5970 | val_loss=0.1111 | val_acc=0.6635 | val_f1m=0.6633 | LR=0.000375
  Época   4 | train_loss=0.1014 | train_acc=0.7199 | val_loss=0.1010 | val_acc=0.6984 | val_f1m=0.6982 | LR=0.000500
  Época   5 | train_loss=0.0928 | train_acc=0.7479 | val_loss=0.1120 | val_acc=0.6714 | val_f1m=0.6446 | LR=0.000500
  Época   6 | train_loss=0.0899 | train_acc=0.7574 | val_loss=0.0962 | val_acc=0.7365 | val_f1m=0.7328 | LR=0.000500
  Época   7 | train_loss=0.0823 | train_acc=0.7910 | val_loss=0.0944 | val_acc=0.7397 | val_f1m=0.7345 | LR=0.000499
  Época   8 | train_loss=0.0817 | train_acc=0.7917 | val_loss=0.0954 | val_acc=0.7444 | val_f1m=0.7442



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h2/nb0_h2/fold5/tsne_cls_fold5_nb0_h2.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=205,890 | params_trainable=205,890 | FLOPs~146.4M | latency/batch=0.88 ms (B=8)
✓ Fold 5 completado: ACC=0.8417, F1=0.8417

────────────────────────────────────────────────────────────────────────────────
RESUMEN nb0_h2:
  F1 por fold: ['0.8049', '0.8004', '0.7913', '0.8039', '0.8417']
  F1 Media ± Std: 0.8084 ± 0.0173
  ACC Media ± Std: 0.8089 ± 0.0172
  Parámetros: 205,890
────────────────────────────────────────────────────────────────────────────────

ARQUITECTURA 2/9: nb0_h4
  • Depthwise Blocks: 0
  • Attention Heads: 4
--------------------------------------------------------------------------------

--- Fold 1/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h4/nb0_h4/fold1


Cargando train fold1: 100%|██████████| 67/67 [00:03<00:00, 17.27it/s]
Cargando val fold1: 100%|██████████| 15/15 [00:00<00:00, 17.31it/s]
Cargando test fold1: 100%|██████████| 21/21 [00:01<00:00, 17.48it/s]


[Fold 1/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1315 | train_acc=0.5345 | val_loss=0.1205 | val_acc=0.5857 | val_f1m=0.5840 | LR=0.000125
  Época   2 | train_loss=0.1246 | train_acc=0.5608 | val_loss=0.1193 | val_acc=0.6016 | val_f1m=0.5931 | LR=0.000250
  Época   3 | train_loss=0.1186 | train_acc=0.6095 | val_loss=0.1063 | val_acc=0.6984 | val_f1m=0.6984 | LR=0.000375
  Época   4 | train_loss=0.0898 | train_acc=0.7544 | val_loss=0.1006 | val_acc=0.7317 | val_f1m=0.7302 | LR=0.000500
  Época   5 | train_loss=0.0735 | train_acc=0.8188 | val_loss=0.0968 | val_acc=0.7651 | val_f1m=0.7651 | LR=0.000500
  Época   6 | train_loss=0.0741 | train_acc=0.8127 | val_loss=0.0915 | val_acc=0.7413 | val_f1m=0.7406 | LR=0.000500
  Época   7 | train_loss=0.0671 | train_acc=0.8319 | val_loss=0.0980 | val_acc=0.7349 | val_f1m=0.7342 | LR=0.000499
  Época   8 | train_loss=0.0657 | train_acc=0.8404 | val_loss=0.0971 | val_acc=0.7825 | val_f1m=0.7822



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h4/nb0_h4/fold1/tsne_cls_fold1_nb0_h4.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=205,890 | params_trainable=205,890 | FLOPs~146.4M | latency/batch=0.90 ms (B=8)
✓ Fold 1 completado: ACC=0.7982, F1=0.7980

--- Fold 2/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h4/nb0_h4/fold2


Cargando train fold2: 100%|██████████| 67/67 [00:03<00:00, 16.78it/s]
Cargando val fold2: 100%|██████████| 15/15 [00:00<00:00, 16.78it/s]
Cargando test fold2: 100%|██████████| 21/21 [00:01<00:00, 16.93it/s]


[Fold 2/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1299 | train_acc=0.5284 | val_loss=0.1241 | val_acc=0.5492 | val_f1m=0.4904 | LR=0.000125
  Época   2 | train_loss=0.1257 | train_acc=0.5583 | val_loss=0.1183 | val_acc=0.5952 | val_f1m=0.5803 | LR=0.000250
  Época   3 | train_loss=0.1086 | train_acc=0.6677 | val_loss=0.1082 | val_acc=0.6937 | val_f1m=0.6904 | LR=0.000375
  Época   4 | train_loss=0.0907 | train_acc=0.7552 | val_loss=0.0896 | val_acc=0.7667 | val_f1m=0.7663 | LR=0.000500
  Época   5 | train_loss=0.0823 | train_acc=0.7839 | val_loss=0.0868 | val_acc=0.7667 | val_f1m=0.7666 | LR=0.000500
  Época   6 | train_loss=0.0767 | train_acc=0.8042 | val_loss=0.0846 | val_acc=0.7667 | val_f1m=0.7658 | LR=0.000500
  Época   7 | train_loss=0.0797 | train_acc=0.7893 | val_loss=0.0853 | val_acc=0.7825 | val_f1m=0.7812 | LR=0.000499
  Época   8 | train_loss=0.0760 | train_acc=0.8063 | val_loss=0.0829 | val_acc=0.7778 | val_f1m=0.7770



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h4/nb0_h4/fold2/tsne_cls_fold2_nb0_h4.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=205,890 | params_trainable=205,890 | FLOPs~146.4M | latency/batch=0.90 ms (B=8)
✓ Fold 2 completado: ACC=0.8311, F1=0.8310

--- Fold 3/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h4/nb0_h4/fold3


Cargando train fold3: 100%|██████████| 67/67 [00:03<00:00, 17.27it/s]
Cargando val fold3: 100%|██████████| 15/15 [00:00<00:00, 17.53it/s]
Cargando test fold3: 100%|██████████| 21/21 [00:01<00:00, 17.39it/s]


[Fold 3/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1321 | train_acc=0.5174 | val_loss=0.1204 | val_acc=0.5905 | val_f1m=0.5897 | LR=0.000125
  Época   2 | train_loss=0.1258 | train_acc=0.5320 | val_loss=0.1183 | val_acc=0.5952 | val_f1m=0.5941 | LR=0.000250
  Época   3 | train_loss=0.1179 | train_acc=0.5906 | val_loss=0.1052 | val_acc=0.6730 | val_f1m=0.6522 | LR=0.000375
  Época   4 | train_loss=0.0945 | train_acc=0.7484 | val_loss=0.0893 | val_acc=0.7714 | val_f1m=0.7677 | LR=0.000500
  Época   5 | train_loss=0.0796 | train_acc=0.7942 | val_loss=0.0839 | val_acc=0.7603 | val_f1m=0.7576 | LR=0.000500
  Época   6 | train_loss=0.0697 | train_acc=0.8323 | val_loss=0.0768 | val_acc=0.7889 | val_f1m=0.7889 | LR=0.000500
  Época   7 | train_loss=0.0748 | train_acc=0.8141 | val_loss=0.0789 | val_acc=0.7730 | val_f1m=0.7730 | LR=0.000499
  Época   8 | train_loss=0.0740 | train_acc=0.8149 | val_loss=0.0794 | val_acc=0.7857 | val_f1m=0.7855



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h4/nb0_h4/fold3/tsne_cls_fold3_nb0_h4.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=205,890 | params_trainable=205,890 | FLOPs~146.4M | latency/batch=0.89 ms (B=8)
✓ Fold 3 completado: ACC=0.8027, F1=0.8027

--- Fold 4/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h4/nb0_h4/fold4


Cargando train fold4: 100%|██████████| 68/68 [00:03<00:00, 17.29it/s]
Cargando val fold4: 100%|██████████| 15/15 [00:00<00:00, 16.89it/s]
Cargando test fold4: 100%|██████████| 20/20 [00:01<00:00, 16.87it/s]


[Fold 4/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1312 | train_acc=0.5494 | val_loss=0.1272 | val_acc=0.5000 | val_f1m=0.3546 | LR=0.000125
  Época   2 | train_loss=0.1239 | train_acc=0.5529 | val_loss=0.1200 | val_acc=0.6111 | val_f1m=0.6111 | LR=0.000250
  Época   3 | train_loss=0.1147 | train_acc=0.6194 | val_loss=0.1029 | val_acc=0.6921 | val_f1m=0.6900 | LR=0.000375
  Época   4 | train_loss=0.0959 | train_acc=0.7402 | val_loss=0.0992 | val_acc=0.7317 | val_f1m=0.7260 | LR=0.000500
  Época   5 | train_loss=0.0814 | train_acc=0.7962 | val_loss=0.0916 | val_acc=0.7365 | val_f1m=0.7328 | LR=0.000500
  Época   6 | train_loss=0.0776 | train_acc=0.8022 | val_loss=0.0886 | val_acc=0.7444 | val_f1m=0.7444 | LR=0.000500
  Época   7 | train_loss=0.0725 | train_acc=0.8200 | val_loss=0.0902 | val_acc=0.7492 | val_f1m=0.7471 | LR=0.000499
  Época   8 | train_loss=0.0735 | train_acc=0.8004 | val_loss=0.0881 | val_acc=0.7651 | val_f1m=0.7642



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h4/nb0_h4/fold4/tsne_cls_fold4_nb0_h4.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=205,890 | params_trainable=205,890 | FLOPs~146.4M | latency/batch=0.91 ms (B=8)
✓ Fold 4 completado: ACC=0.8345, F1=0.8345

--- Fold 5/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h4/nb0_h4/fold5


Cargando train fold5: 100%|██████████| 68/68 [00:03<00:00, 17.04it/s]
Cargando val fold5: 100%|██████████| 15/15 [00:00<00:00, 16.64it/s]
Cargando test fold5: 100%|██████████| 20/20 [00:01<00:00, 17.11it/s]


[Fold 5/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1299 | train_acc=0.5357 | val_loss=0.1200 | val_acc=0.5730 | val_f1m=0.5675 | LR=0.000125
  Época   2 | train_loss=0.1254 | train_acc=0.5459 | val_loss=0.1175 | val_acc=0.6175 | val_f1m=0.6174 | LR=0.000250
  Época   3 | train_loss=0.1214 | train_acc=0.6078 | val_loss=0.1107 | val_acc=0.6714 | val_f1m=0.6712 | LR=0.000375
  Época   4 | train_loss=0.0990 | train_acc=0.7328 | val_loss=0.1066 | val_acc=0.6619 | val_f1m=0.6609 | LR=0.000500
  Época   5 | train_loss=0.0849 | train_acc=0.7798 | val_loss=0.0980 | val_acc=0.7349 | val_f1m=0.7322 | LR=0.000500
  Época   6 | train_loss=0.0804 | train_acc=0.7945 | val_loss=0.0978 | val_acc=0.7413 | val_f1m=0.7381 | LR=0.000500
  Época   7 | train_loss=0.0707 | train_acc=0.8218 | val_loss=0.0921 | val_acc=0.7429 | val_f1m=0.7418 | LR=0.000499
  Época   8 | train_loss=0.0715 | train_acc=0.8186 | val_loss=0.0881 | val_acc=0.7794 | val_f1m=0.7790



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h4/nb0_h4/fold5/tsne_cls_fold5_nb0_h4.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=205,890 | params_trainable=205,890 | FLOPs~146.4M | latency/batch=0.90 ms (B=8)
✓ Fold 5 completado: ACC=0.8452, F1=0.8452

────────────────────────────────────────────────────────────────────────────────
RESUMEN nb0_h4:
  F1 por fold: ['0.7980', '0.8310', '0.8027', '0.8345', '0.8452']
  F1 Media ± Std: 0.8223 ± 0.0186
  ACC Media ± Std: 0.8223 ± 0.0185
  Parámetros: 205,890
────────────────────────────────────────────────────────────────────────────────

ARQUITECTURA 3/9: nb0_h6
  • Depthwise Blocks: 0
  • Attention Heads: 6
--------------------------------------------------------------------------------

--- Fold 1/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h6/nb0_h6/fold1


Cargando train fold1: 100%|██████████| 67/67 [00:03<00:00, 17.16it/s]
Cargando val fold1: 100%|██████████| 15/15 [00:00<00:00, 17.25it/s]
Cargando test fold1: 100%|██████████| 21/21 [00:01<00:00, 17.36it/s]


[Fold 1/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1332 | train_acc=0.5348 | val_loss=0.1234 | val_acc=0.5254 | val_f1m=0.4472 | LR=0.000125
  Época   2 | train_loss=0.1251 | train_acc=0.5473 | val_loss=0.1196 | val_acc=0.6063 | val_f1m=0.5920 | LR=0.000250
  Época   3 | train_loss=0.1253 | train_acc=0.5629 | val_loss=0.1212 | val_acc=0.5429 | val_f1m=0.4905 | LR=0.000375
  Época   4 | train_loss=0.1010 | train_acc=0.7029 | val_loss=0.1000 | val_acc=0.7270 | val_f1m=0.7265 | LR=0.000500
  Época   5 | train_loss=0.0817 | train_acc=0.7882 | val_loss=0.1104 | val_acc=0.7095 | val_f1m=0.6984 | LR=0.000500
  Época   6 | train_loss=0.0717 | train_acc=0.8237 | val_loss=0.0983 | val_acc=0.7540 | val_f1m=0.7532 | LR=0.000500
  Época   7 | train_loss=0.0707 | train_acc=0.8234 | val_loss=0.0903 | val_acc=0.7698 | val_f1m=0.7698 | LR=0.000499
  Época   8 | train_loss=0.0661 | train_acc=0.8383 | val_loss=0.1004 | val_acc=0.7714 | val_f1m=0.7705



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h6/nb0_h6/fold1/tsne_cls_fold1_nb0_h6.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=205,890 | params_trainable=205,890 | FLOPs~146.4M | latency/batch=0.89 ms (B=8)
✓ Fold 1 completado: ACC=0.7914, F1=0.7910

--- Fold 2/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h6/nb0_h6/fold2


Cargando train fold2: 100%|██████████| 67/67 [00:03<00:00, 17.00it/s]
Cargando val fold2: 100%|██████████| 15/15 [00:00<00:00, 17.15it/s]
Cargando test fold2: 100%|██████████| 21/21 [00:01<00:00, 17.23it/s]


[Fold 2/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1307 | train_acc=0.5409 | val_loss=0.1207 | val_acc=0.5683 | val_f1m=0.5509 | LR=0.000125
  Época   2 | train_loss=0.1234 | train_acc=0.5615 | val_loss=0.1186 | val_acc=0.5905 | val_f1m=0.5698 | LR=0.000250
  Época   3 | train_loss=0.1137 | train_acc=0.6365 | val_loss=0.1129 | val_acc=0.6730 | val_f1m=0.6580 | LR=0.000375
  Época   4 | train_loss=0.0958 | train_acc=0.7477 | val_loss=0.0986 | val_acc=0.7476 | val_f1m=0.7438 | LR=0.000500
  Época   5 | train_loss=0.0820 | train_acc=0.7939 | val_loss=0.0853 | val_acc=0.7746 | val_f1m=0.7742 | LR=0.000500
  Época   6 | train_loss=0.0742 | train_acc=0.8010 | val_loss=0.0817 | val_acc=0.7905 | val_f1m=0.7904 | LR=0.000500
  Época   7 | train_loss=0.0736 | train_acc=0.8056 | val_loss=0.0846 | val_acc=0.7794 | val_f1m=0.7792 | LR=0.000499
  Época   8 | train_loss=0.0737 | train_acc=0.8113 | val_loss=0.0814 | val_acc=0.7984 | val_f1m=0.7984



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h6/nb0_h6/fold2/tsne_cls_fold2_nb0_h6.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=205,890 | params_trainable=205,890 | FLOPs~146.4M | latency/batch=0.90 ms (B=8)
✓ Fold 2 completado: ACC=0.8231, F1=0.8231

--- Fold 3/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h6/nb0_h6/fold3


Cargando train fold3: 100%|██████████| 67/67 [00:03<00:00, 16.90it/s]
Cargando val fold3: 100%|██████████| 15/15 [00:00<00:00, 17.29it/s]
Cargando test fold3: 100%|██████████| 21/21 [00:01<00:00, 17.33it/s]


[Fold 3/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1329 | train_acc=0.5195 | val_loss=0.1203 | val_acc=0.5857 | val_f1m=0.5855 | LR=0.000125
  Época   2 | train_loss=0.1264 | train_acc=0.5206 | val_loss=0.1187 | val_acc=0.5873 | val_f1m=0.5872 | LR=0.000250
  Época   3 | train_loss=0.1218 | train_acc=0.5604 | val_loss=0.1083 | val_acc=0.7032 | val_f1m=0.7024 | LR=0.000375
  Época   4 | train_loss=0.0965 | train_acc=0.7299 | val_loss=0.0846 | val_acc=0.7873 | val_f1m=0.7870 | LR=0.000500
  Época   5 | train_loss=0.0803 | train_acc=0.7957 | val_loss=0.0828 | val_acc=0.7778 | val_f1m=0.7761 | LR=0.000500
  Época   6 | train_loss=0.0727 | train_acc=0.8227 | val_loss=0.0779 | val_acc=0.7968 | val_f1m=0.7965 | LR=0.000500
  Época   7 | train_loss=0.0762 | train_acc=0.8117 | val_loss=0.0776 | val_acc=0.7873 | val_f1m=0.7872 | LR=0.000499
  Época   8 | train_loss=0.0713 | train_acc=0.8145 | val_loss=0.0836 | val_acc=0.7921 | val_f1m=0.7921



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h6/nb0_h6/fold3/tsne_cls_fold3_nb0_h6.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=205,890 | params_trainable=205,890 | FLOPs~146.4M | latency/batch=0.90 ms (B=8)
✓ Fold 3 completado: ACC=0.7857, F1=0.7856

--- Fold 4/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h6/nb0_h6/fold4


Cargando train fold4: 100%|██████████| 68/68 [00:03<00:00, 17.14it/s]
Cargando val fold4: 100%|██████████| 15/15 [00:00<00:00, 17.43it/s]
Cargando test fold4: 100%|██████████| 20/20 [00:01<00:00, 17.27it/s]


[Fold 4/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1322 | train_acc=0.5410 | val_loss=0.1270 | val_acc=0.5000 | val_f1m=0.3469 | LR=0.000125
  Época   2 | train_loss=0.1233 | train_acc=0.5557 | val_loss=0.1226 | val_acc=0.5968 | val_f1m=0.5964 | LR=0.000250
  Época   3 | train_loss=0.1168 | train_acc=0.6082 | val_loss=0.1098 | val_acc=0.6635 | val_f1m=0.6600 | LR=0.000375
  Época   4 | train_loss=0.0981 | train_acc=0.7265 | val_loss=0.0952 | val_acc=0.7063 | val_f1m=0.7039 | LR=0.000500
  Época   5 | train_loss=0.0843 | train_acc=0.7885 | val_loss=0.0935 | val_acc=0.7381 | val_f1m=0.7300 | LR=0.000500
  Época   6 | train_loss=0.0794 | train_acc=0.7920 | val_loss=0.0872 | val_acc=0.7571 | val_f1m=0.7561 | LR=0.000500
  Época   7 | train_loss=0.0730 | train_acc=0.8162 | val_loss=0.0920 | val_acc=0.7571 | val_f1m=0.7538 | LR=0.000499
  Época   8 | train_loss=0.0713 | train_acc=0.8235 | val_loss=0.0983 | val_acc=0.7460 | val_f1m=0.7392



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h6/nb0_h6/fold4/tsne_cls_fold4_nb0_h6.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=205,890 | params_trainable=205,890 | FLOPs~146.4M | latency/batch=0.89 ms (B=8)
✓ Fold 4 completado: ACC=0.8298, F1=0.8295

--- Fold 5/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h6/nb0_h6/fold5


Cargando train fold5: 100%|██████████| 68/68 [00:03<00:00, 17.58it/s]
Cargando val fold5: 100%|██████████| 15/15 [00:00<00:00, 17.69it/s]
Cargando test fold5: 100%|██████████| 20/20 [00:01<00:00, 17.70it/s]


[Fold 5/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1279 | train_acc=0.5231 | val_loss=0.1210 | val_acc=0.5667 | val_f1m=0.5594 | LR=0.000125
  Época   2 | train_loss=0.1267 | train_acc=0.5385 | val_loss=0.1177 | val_acc=0.6286 | val_f1m=0.6284 | LR=0.000250
  Época   3 | train_loss=0.1236 | train_acc=0.5816 | val_loss=0.1129 | val_acc=0.6397 | val_f1m=0.6396 | LR=0.000375
  Época   4 | train_loss=0.1004 | train_acc=0.7181 | val_loss=0.1023 | val_acc=0.7143 | val_f1m=0.7130 | LR=0.000500
  Época   5 | train_loss=0.0839 | train_acc=0.7854 | val_loss=0.0987 | val_acc=0.7270 | val_f1m=0.7199 | LR=0.000500
  Época   6 | train_loss=0.0779 | train_acc=0.7945 | val_loss=0.0899 | val_acc=0.7508 | val_f1m=0.7506 | LR=0.000500
  Época   7 | train_loss=0.0715 | train_acc=0.8148 | val_loss=0.0902 | val_acc=0.7540 | val_f1m=0.7538 | LR=0.000499
  Época   8 | train_loss=0.0708 | train_acc=0.8109 | val_loss=0.0916 | val_acc=0.7587 | val_f1m=0.7587



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb0_h6/nb0_h6/fold5/tsne_cls_fold5_nb0_h6.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=205,890 | params_trainable=205,890 | FLOPs~146.4M | latency/batch=0.88 ms (B=8)
✓ Fold 5 completado: ACC=0.8345, F1=0.8344

────────────────────────────────────────────────────────────────────────────────
RESUMEN nb0_h6:
  F1 por fold: ['0.7910', '0.8231', '0.7856', '0.8295', '0.8344']
  F1 Media ± Std: 0.8127 ± 0.0203
  ACC Media ± Std: 0.8129 ± 0.0203
  Parámetros: 205,890
────────────────────────────────────────────────────────────────────────────────

ARQUITECTURA 4/9: nb2_h2
  • Depthwise Blocks: 2
  • Attention Heads: 2
--------------------------------------------------------------------------------

--- Fold 1/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h2/nb2_h2/fold1


Cargando train fold1: 100%|██████████| 67/67 [00:03<00:00, 17.26it/s]
Cargando val fold1: 100%|██████████| 15/15 [00:00<00:00, 17.65it/s]
Cargando test fold1: 100%|██████████| 21/21 [00:01<00:00, 17.46it/s]


[Fold 1/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1307 | train_acc=0.5370 | val_loss=0.1257 | val_acc=0.5000 | val_f1m=0.3333 | LR=0.000125
  Época   2 | train_loss=0.1226 | train_acc=0.5601 | val_loss=0.1171 | val_acc=0.6349 | val_f1m=0.6349 | LR=0.000250
  Época   3 | train_loss=0.1040 | train_acc=0.6997 | val_loss=0.1068 | val_acc=0.6857 | val_f1m=0.6847 | LR=0.000375
  Época   4 | train_loss=0.0964 | train_acc=0.7491 | val_loss=0.1090 | val_acc=0.6968 | val_f1m=0.6931 | LR=0.000500
  Época   5 | train_loss=0.0837 | train_acc=0.7914 | val_loss=0.0992 | val_acc=0.7286 | val_f1m=0.7285 | LR=0.000500
  Época   6 | train_loss=0.0770 | train_acc=0.8053 | val_loss=0.0983 | val_acc=0.7460 | val_f1m=0.7449 | LR=0.000500
  Época   7 | train_loss=0.0762 | train_acc=0.8092 | val_loss=0.0957 | val_acc=0.7381 | val_f1m=0.7364 | LR=0.000499
  Época   8 | train_loss=0.0694 | train_acc=0.8337 | val_loss=0.1078 | val_acc=0.7524 | val_f1m=0.7514



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h2/nb2_h2/fold1/tsne_cls_fold1_nb2_h2.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.09 ms (B=8)
✓ Fold 1 completado: ACC=0.7732, F1=0.7731

--- Fold 2/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h2/nb2_h2/fold2


Cargando train fold2: 100%|██████████| 67/67 [00:03<00:00, 17.01it/s]
Cargando val fold2: 100%|██████████| 15/15 [00:00<00:00, 17.01it/s]
Cargando test fold2: 100%|██████████| 21/21 [00:01<00:00, 17.34it/s]


[Fold 2/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1295 | train_acc=0.5320 | val_loss=0.1203 | val_acc=0.6222 | val_f1m=0.6175 | LR=0.000125
  Época   2 | train_loss=0.1221 | train_acc=0.5810 | val_loss=0.1143 | val_acc=0.6571 | val_f1m=0.6551 | LR=0.000250
  Época   3 | train_loss=0.1100 | train_acc=0.6802 | val_loss=0.1101 | val_acc=0.6794 | val_f1m=0.6751 | LR=0.000375
  Época   4 | train_loss=0.0944 | train_acc=0.7427 | val_loss=0.0939 | val_acc=0.7317 | val_f1m=0.7317 | LR=0.000500
  Época   5 | train_loss=0.0972 | train_acc=0.7296 | val_loss=0.0977 | val_acc=0.7349 | val_f1m=0.7319 | LR=0.000500
  Época   6 | train_loss=0.0828 | train_acc=0.7896 | val_loss=0.0969 | val_acc=0.7492 | val_f1m=0.7479 | LR=0.000500
  Época   7 | train_loss=0.0800 | train_acc=0.7957 | val_loss=0.0916 | val_acc=0.7508 | val_f1m=0.7474 | LR=0.000499
  Época   8 | train_loss=0.0813 | train_acc=0.7811 | val_loss=0.0969 | val_acc=0.7619 | val_f1m=0.7614



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h2/nb2_h2/fold2/tsne_cls_fold2_nb2_h2.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.09 ms (B=8)
✓ Fold 2 completado: ACC=0.8118, F1=0.8118

--- Fold 3/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h2/nb2_h2/fold3


Cargando train fold3: 100%|██████████| 67/67 [00:03<00:00, 16.98it/s]
Cargando val fold3: 100%|██████████| 15/15 [00:00<00:00, 17.18it/s]
Cargando test fold3: 100%|██████████| 21/21 [00:01<00:00, 17.66it/s]


[Fold 3/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1322 | train_acc=0.4940 | val_loss=0.1251 | val_acc=0.5190 | val_f1m=0.3975 | LR=0.000125
  Época   2 | train_loss=0.1237 | train_acc=0.5519 | val_loss=0.1294 | val_acc=0.5127 | val_f1m=0.3686 | LR=0.000250
  Época   3 | train_loss=0.1208 | train_acc=0.5885 | val_loss=0.1047 | val_acc=0.7032 | val_f1m=0.6971 | LR=0.000375
  Época   4 | train_loss=0.0976 | train_acc=0.7310 | val_loss=0.0928 | val_acc=0.7429 | val_f1m=0.7386 | LR=0.000500
  Época   5 | train_loss=0.0936 | train_acc=0.7445 | val_loss=0.1052 | val_acc=0.7190 | val_f1m=0.7064 | LR=0.000500
  Época   6 | train_loss=0.0871 | train_acc=0.7850 | val_loss=0.0869 | val_acc=0.7667 | val_f1m=0.7635 | LR=0.000500
  Época   7 | train_loss=0.0793 | train_acc=0.7964 | val_loss=0.0861 | val_acc=0.7635 | val_f1m=0.7591 | LR=0.000499
  Época   8 | train_loss=0.0803 | train_acc=0.7868 | val_loss=0.0788 | val_acc=0.8032 | val_f1m=0.8032



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h2/nb2_h2/fold3/tsne_cls_fold3_nb2_h2.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.08 ms (B=8)
✓ Fold 3 completado: ACC=0.7914, F1=0.7914

--- Fold 4/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h2/nb2_h2/fold4


Cargando train fold4: 100%|██████████| 68/68 [00:04<00:00, 16.98it/s]
Cargando val fold4: 100%|██████████| 15/15 [00:00<00:00, 17.01it/s]
Cargando test fold4: 100%|██████████| 20/20 [00:01<00:00, 17.16it/s]


[Fold 4/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1320 | train_acc=0.5161 | val_loss=0.1202 | val_acc=0.6143 | val_f1m=0.6128 | LR=0.000125
  Época   2 | train_loss=0.1274 | train_acc=0.5399 | val_loss=0.1186 | val_acc=0.6159 | val_f1m=0.5977 | LR=0.000250
  Época   3 | train_loss=0.1166 | train_acc=0.6085 | val_loss=0.1084 | val_acc=0.6952 | val_f1m=0.6952 | LR=0.000375
  Época   4 | train_loss=0.0964 | train_acc=0.7360 | val_loss=0.1023 | val_acc=0.7063 | val_f1m=0.7054 | LR=0.000500
  Época   5 | train_loss=0.0903 | train_acc=0.7640 | val_loss=0.1002 | val_acc=0.7238 | val_f1m=0.7207 | LR=0.000500
  Época   6 | train_loss=0.0908 | train_acc=0.7612 | val_loss=0.0976 | val_acc=0.7492 | val_f1m=0.7482 | LR=0.000500
  Época   7 | train_loss=0.0793 | train_acc=0.8060 | val_loss=0.1024 | val_acc=0.7190 | val_f1m=0.7169 | LR=0.000499
  Época   8 | train_loss=0.0834 | train_acc=0.7892 | val_loss=0.1050 | val_acc=0.7333 | val_f1m=0.7240



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h2/nb2_h2/fold4/tsne_cls_fold4_nb2_h2.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.09 ms (B=8)
✓ Fold 4 completado: ACC=0.8321, F1=0.8321

--- Fold 5/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h2/nb2_h2/fold5


Cargando train fold5: 100%|██████████| 68/68 [00:04<00:00, 16.99it/s]
Cargando val fold5: 100%|██████████| 15/15 [00:00<00:00, 17.61it/s]
Cargando test fold5: 100%|██████████| 20/20 [00:01<00:00, 17.55it/s]


[Fold 5/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1299 | train_acc=0.5245 | val_loss=0.1210 | val_acc=0.5333 | val_f1m=0.4672 | LR=0.000125
  Época   2 | train_loss=0.1255 | train_acc=0.5413 | val_loss=0.1189 | val_acc=0.5762 | val_f1m=0.5226 | LR=0.000250
  Época   3 | train_loss=0.1207 | train_acc=0.5837 | val_loss=0.1096 | val_acc=0.6667 | val_f1m=0.6654 | LR=0.000375
  Época   4 | train_loss=0.0964 | train_acc=0.7433 | val_loss=0.1062 | val_acc=0.6984 | val_f1m=0.6984 | LR=0.000500
  Época   5 | train_loss=0.0965 | train_acc=0.7279 | val_loss=0.0998 | val_acc=0.7302 | val_f1m=0.7273 | LR=0.000500
  Época   6 | train_loss=0.0834 | train_acc=0.7829 | val_loss=0.1057 | val_acc=0.7143 | val_f1m=0.7098 | LR=0.000500
  Época   7 | train_loss=0.0882 | train_acc=0.7728 | val_loss=0.0925 | val_acc=0.7460 | val_f1m=0.7458 | LR=0.000499
  Época   8 | train_loss=0.0818 | train_acc=0.7805 | val_loss=0.0914 | val_acc=0.7508 | val_f1m=0.7508



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h2/nb2_h2/fold5/tsne_cls_fold5_nb2_h2.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.08 ms (B=8)
✓ Fold 5 completado: ACC=0.8310, F1=0.8309

────────────────────────────────────────────────────────────────────────────────
RESUMEN nb2_h2:
  F1 por fold: ['0.7731', '0.8118', '0.7914', '0.8321', '0.8309']
  F1 Media ± Std: 0.8078 ± 0.0229
  ACC Media ± Std: 0.8079 ± 0.0228
  Parámetros: 232,290
────────────────────────────────────────────────────────────────────────────────

ARQUITECTURA 5/9: nb2_h4
  • Depthwise Blocks: 2
  • Attention Heads: 4
--------------------------------------------------------------------------------

--- Fold 1/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h4/nb2_h4/fold1


Cargando train fold1: 100%|██████████| 67/67 [00:03<00:00, 16.97it/s]
Cargando val fold1: 100%|██████████| 15/15 [00:00<00:00, 16.85it/s]
Cargando test fold1: 100%|██████████| 21/21 [00:01<00:00, 17.47it/s]


[Fold 1/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1315 | train_acc=0.5131 | val_loss=0.1242 | val_acc=0.4984 | val_f1m=0.3382 | LR=0.000125
  Época   2 | train_loss=0.1229 | train_acc=0.5579 | val_loss=0.1184 | val_acc=0.6127 | val_f1m=0.6113 | LR=0.000250
  Época   3 | train_loss=0.1105 | train_acc=0.6578 | val_loss=0.1202 | val_acc=0.6429 | val_f1m=0.6192 | LR=0.000375
  Época   4 | train_loss=0.0973 | train_acc=0.7406 | val_loss=0.1010 | val_acc=0.7190 | val_f1m=0.7190 | LR=0.000500
  Época   5 | train_loss=0.0843 | train_acc=0.7818 | val_loss=0.0964 | val_acc=0.7317 | val_f1m=0.7310 | LR=0.000500
  Época   6 | train_loss=0.0754 | train_acc=0.8099 | val_loss=0.1018 | val_acc=0.7413 | val_f1m=0.7377 | LR=0.000500
  Época   7 | train_loss=0.0747 | train_acc=0.8145 | val_loss=0.0987 | val_acc=0.7444 | val_f1m=0.7415 | LR=0.000499
  Época   8 | train_loss=0.0756 | train_acc=0.8070 | val_loss=0.1114 | val_acc=0.7270 | val_f1m=0.7202



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h4/nb2_h4/fold1/tsne_cls_fold1_nb2_h4.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.09 ms (B=8)
✓ Fold 1 completado: ACC=0.7789, F1=0.7788

--- Fold 2/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h4/nb2_h4/fold2


Cargando train fold2: 100%|██████████| 67/67 [00:03<00:00, 17.07it/s]
Cargando val fold2: 100%|██████████| 15/15 [00:00<00:00, 17.44it/s]
Cargando test fold2: 100%|██████████| 21/21 [00:01<00:00, 17.40it/s]


[Fold 2/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1301 | train_acc=0.5192 | val_loss=0.1195 | val_acc=0.5857 | val_f1m=0.5856 | LR=0.000125
  Época   2 | train_loss=0.1216 | train_acc=0.5810 | val_loss=0.1160 | val_acc=0.6333 | val_f1m=0.6322 | LR=0.000250
  Época   3 | train_loss=0.1133 | train_acc=0.6443 | val_loss=0.1080 | val_acc=0.6921 | val_f1m=0.6918 | LR=0.000375
  Época   4 | train_loss=0.0965 | train_acc=0.7299 | val_loss=0.0964 | val_acc=0.7302 | val_f1m=0.7298 | LR=0.000500
  Época   5 | train_loss=0.0911 | train_acc=0.7541 | val_loss=0.0947 | val_acc=0.7476 | val_f1m=0.7407 | LR=0.000500
  Época   6 | train_loss=0.0854 | train_acc=0.7743 | val_loss=0.0930 | val_acc=0.7333 | val_f1m=0.7319 | LR=0.000500
  Época   7 | train_loss=0.0810 | train_acc=0.7868 | val_loss=0.0913 | val_acc=0.7587 | val_f1m=0.7557 | LR=0.000499
  Época   8 | train_loss=0.0800 | train_acc=0.7953 | val_loss=0.0957 | val_acc=0.7524 | val_f1m=0.7509



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h4/nb2_h4/fold2/tsne_cls_fold2_nb2_h4.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.09 ms (B=8)
✓ Fold 2 completado: ACC=0.8186, F1=0.8186

--- Fold 3/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h4/nb2_h4/fold3


Cargando train fold3: 100%|██████████| 67/67 [00:03<00:00, 17.22it/s]
Cargando val fold3: 100%|██████████| 15/15 [00:00<00:00, 17.20it/s]
Cargando test fold3: 100%|██████████| 21/21 [00:01<00:00, 17.37it/s]


[Fold 3/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1326 | train_acc=0.4964 | val_loss=0.1236 | val_acc=0.5079 | val_f1m=0.3802 | LR=0.000125
  Época   2 | train_loss=0.1236 | train_acc=0.5601 | val_loss=0.1317 | val_acc=0.5095 | val_f1m=0.3568 | LR=0.000250
  Época   3 | train_loss=0.1267 | train_acc=0.5601 | val_loss=0.1094 | val_acc=0.6778 | val_f1m=0.6718 | LR=0.000375
  Época   4 | train_loss=0.1074 | train_acc=0.6791 | val_loss=0.0976 | val_acc=0.7190 | val_f1m=0.7107 | LR=0.000500
  Época   5 | train_loss=0.0946 | train_acc=0.7392 | val_loss=0.1040 | val_acc=0.7095 | val_f1m=0.6964 | LR=0.000500
  Época   6 | train_loss=0.0866 | train_acc=0.7747 | val_loss=0.0949 | val_acc=0.7365 | val_f1m=0.7279 | LR=0.000500
  Época   7 | train_loss=0.0781 | train_acc=0.7989 | val_loss=0.0873 | val_acc=0.7778 | val_f1m=0.7775 | LR=0.000499
  Época   8 | train_loss=0.0726 | train_acc=0.8159 | val_loss=0.0922 | val_acc=0.7651 | val_f1m=0.7635



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h4/nb2_h4/fold3/tsne_cls_fold3_nb2_h4.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.09 ms (B=8)
✓ Fold 3 completado: ACC=0.7857, F1=0.7856

--- Fold 4/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h4/nb2_h4/fold4


Cargando train fold4: 100%|██████████| 68/68 [00:03<00:00, 17.22it/s]
Cargando val fold4: 100%|██████████| 15/15 [00:00<00:00, 17.44it/s]
Cargando test fold4: 100%|██████████| 20/20 [00:01<00:00, 17.25it/s]


[Fold 4/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1317 | train_acc=0.5140 | val_loss=0.1205 | val_acc=0.5889 | val_f1m=0.5842 | LR=0.000125
  Época   2 | train_loss=0.1263 | train_acc=0.5536 | val_loss=0.1202 | val_acc=0.5635 | val_f1m=0.5022 | LR=0.000250
  Época   3 | train_loss=0.1184 | train_acc=0.5921 | val_loss=0.1118 | val_acc=0.6810 | val_f1m=0.6807 | LR=0.000375
  Época   4 | train_loss=0.0968 | train_acc=0.7339 | val_loss=0.0994 | val_acc=0.7095 | val_f1m=0.7095 | LR=0.000500
  Época   5 | train_loss=0.0867 | train_acc=0.7689 | val_loss=0.0987 | val_acc=0.7286 | val_f1m=0.7242 | LR=0.000500
  Época   6 | train_loss=0.0861 | train_acc=0.7700 | val_loss=0.0881 | val_acc=0.7556 | val_f1m=0.7556 | LR=0.000500
  Época   7 | train_loss=0.0764 | train_acc=0.8099 | val_loss=0.0904 | val_acc=0.7810 | val_f1m=0.7808 | LR=0.000499
  Época   8 | train_loss=0.0781 | train_acc=0.8001 | val_loss=0.0843 | val_acc=0.7778 | val_f1m=0.7775



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h4/nb2_h4/fold4/tsne_cls_fold4_nb2_h4.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.09 ms (B=8)
✓ Fold 4 completado: ACC=0.8143, F1=0.8141

--- Fold 5/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h4/nb2_h4/fold5


Cargando train fold5: 100%|██████████| 68/68 [00:03<00:00, 17.37it/s]
Cargando val fold5: 100%|██████████| 15/15 [00:00<00:00, 17.50it/s]
Cargando test fold5: 100%|██████████| 20/20 [00:01<00:00, 17.52it/s]


[Fold 5/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1318 | train_acc=0.5140 | val_loss=0.1230 | val_acc=0.5143 | val_f1m=0.4089 | LR=0.000125
  Época   2 | train_loss=0.1256 | train_acc=0.5431 | val_loss=0.1191 | val_acc=0.5794 | val_f1m=0.5215 | LR=0.000250
  Época   3 | train_loss=0.1221 | train_acc=0.5812 | val_loss=0.1127 | val_acc=0.6635 | val_f1m=0.6513 | LR=0.000375
  Época   4 | train_loss=0.0983 | train_acc=0.7346 | val_loss=0.1069 | val_acc=0.6873 | val_f1m=0.6872 | LR=0.000500
  Época   5 | train_loss=0.0921 | train_acc=0.7560 | val_loss=0.0951 | val_acc=0.7333 | val_f1m=0.7333 | LR=0.000500
  Época   6 | train_loss=0.0811 | train_acc=0.7913 | val_loss=0.0974 | val_acc=0.7222 | val_f1m=0.7212 | LR=0.000500
  Época   7 | train_loss=0.0818 | train_acc=0.7910 | val_loss=0.0950 | val_acc=0.7222 | val_f1m=0.7222 | LR=0.000499
  Época   8 | train_loss=0.0751 | train_acc=0.8165 | val_loss=0.0963 | val_acc=0.7095 | val_f1m=0.7068



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h4/nb2_h4/fold5/tsne_cls_fold5_nb2_h4.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.08 ms (B=8)
✓ Fold 5 completado: ACC=0.8405, F1=0.8405

────────────────────────────────────────────────────────────────────────────────
RESUMEN nb2_h4:
  F1 por fold: ['0.7788', '0.8186', '0.7856', '0.8141', '0.8405']
  F1 Media ± Std: 0.8075 ± 0.0226
  ACC Media ± Std: 0.8076 ± 0.0226
  Parámetros: 232,290
────────────────────────────────────────────────────────────────────────────────

ARQUITECTURA 6/9: nb2_h6
  • Depthwise Blocks: 2
  • Attention Heads: 6
--------------------------------------------------------------------------------

--- Fold 1/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h6/nb2_h6/fold1


Cargando train fold1: 100%|██████████| 67/67 [00:03<00:00, 17.34it/s]
Cargando val fold1: 100%|██████████| 15/15 [00:00<00:00, 16.92it/s]
Cargando test fold1: 100%|██████████| 21/21 [00:01<00:00, 17.25it/s]


[Fold 1/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1318 | train_acc=0.5213 | val_loss=0.1249 | val_acc=0.5000 | val_f1m=0.3443 | LR=0.000125
  Época   2 | train_loss=0.1224 | train_acc=0.5729 | val_loss=0.1187 | val_acc=0.6190 | val_f1m=0.6166 | LR=0.000250
  Época   3 | train_loss=0.1102 | train_acc=0.6546 | val_loss=0.1111 | val_acc=0.6508 | val_f1m=0.6402 | LR=0.000375
  Época   4 | train_loss=0.0962 | train_acc=0.7416 | val_loss=0.1044 | val_acc=0.7206 | val_f1m=0.7198 | LR=0.000500
  Época   5 | train_loss=0.0841 | train_acc=0.7839 | val_loss=0.0987 | val_acc=0.7444 | val_f1m=0.7439 | LR=0.000500
  Época   6 | train_loss=0.0764 | train_acc=0.8003 | val_loss=0.0960 | val_acc=0.7556 | val_f1m=0.7539 | LR=0.000500
  Época   7 | train_loss=0.0721 | train_acc=0.8124 | val_loss=0.0919 | val_acc=0.7635 | val_f1m=0.7627 | LR=0.000499
  Época   8 | train_loss=0.0662 | train_acc=0.8348 | val_loss=0.1250 | val_acc=0.7190 | val_f1m=0.7119



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h6/nb2_h6/fold1/tsne_cls_fold1_nb2_h6.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.11 ms (B=8)
✓ Fold 1 completado: ACC=0.8107, F1=0.8107

--- Fold 2/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h6/nb2_h6/fold2


Cargando train fold2: 100%|██████████| 67/67 [00:03<00:00, 17.51it/s]
Cargando val fold2: 100%|██████████| 15/15 [00:00<00:00, 17.58it/s]
Cargando test fold2: 100%|██████████| 21/21 [00:01<00:00, 17.37it/s]


[Fold 2/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1290 | train_acc=0.5267 | val_loss=0.1202 | val_acc=0.6032 | val_f1m=0.6029 | LR=0.000125
  Época   2 | train_loss=0.1214 | train_acc=0.5839 | val_loss=0.1159 | val_acc=0.6222 | val_f1m=0.6183 | LR=0.000250
  Época   3 | train_loss=0.1168 | train_acc=0.6276 | val_loss=0.1052 | val_acc=0.7095 | val_f1m=0.7095 | LR=0.000375
  Época   4 | train_loss=0.0987 | train_acc=0.7278 | val_loss=0.0958 | val_acc=0.7206 | val_f1m=0.7202 | LR=0.000500
  Época   5 | train_loss=0.0910 | train_acc=0.7594 | val_loss=0.0973 | val_acc=0.7175 | val_f1m=0.7028 | LR=0.000500
  Época   6 | train_loss=0.0808 | train_acc=0.7921 | val_loss=0.0942 | val_acc=0.7492 | val_f1m=0.7492 | LR=0.000500
  Época   7 | train_loss=0.0776 | train_acc=0.8042 | val_loss=0.0910 | val_acc=0.7476 | val_f1m=0.7417 | LR=0.000499
  Época   8 | train_loss=0.0771 | train_acc=0.8056 | val_loss=0.0823 | val_acc=0.7873 | val_f1m=0.7870



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h6/nb2_h6/fold2/tsne_cls_fold2_nb2_h6.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.09 ms (B=8)
✓ Fold 2 completado: ACC=0.8356, F1=0.8353

--- Fold 3/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h6/nb2_h6/fold3


Cargando train fold3: 100%|██████████| 67/67 [00:03<00:00, 17.38it/s]
Cargando val fold3: 100%|██████████| 15/15 [00:00<00:00, 17.57it/s]
Cargando test fold3: 100%|██████████| 21/21 [00:01<00:00, 17.38it/s]


[Fold 3/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1315 | train_acc=0.5174 | val_loss=0.1267 | val_acc=0.5079 | val_f1m=0.3685 | LR=0.000125
  Época   2 | train_loss=0.1251 | train_acc=0.5402 | val_loss=0.1324 | val_acc=0.5190 | val_f1m=0.3840 | LR=0.000250
  Época   3 | train_loss=0.1242 | train_acc=0.5547 | val_loss=0.1109 | val_acc=0.6810 | val_f1m=0.6809 | LR=0.000375
  Época   4 | train_loss=0.1067 | train_acc=0.6844 | val_loss=0.0947 | val_acc=0.7222 | val_f1m=0.7221 | LR=0.000500
  Época   5 | train_loss=0.0889 | train_acc=0.7633 | val_loss=0.0884 | val_acc=0.7651 | val_f1m=0.7621 | LR=0.000500
  Época   6 | train_loss=0.0765 | train_acc=0.7974 | val_loss=0.0827 | val_acc=0.8016 | val_f1m=0.8015 | LR=0.000500
  Época   7 | train_loss=0.0721 | train_acc=0.8259 | val_loss=0.0834 | val_acc=0.7937 | val_f1m=0.7936 | LR=0.000499
  Época   8 | train_loss=0.0738 | train_acc=0.8223 | val_loss=0.0810 | val_acc=0.8063 | val_f1m=0.8063



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h6/nb2_h6/fold3/tsne_cls_fold3_nb2_h6.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.10 ms (B=8)
✓ Fold 3 completado: ACC=0.8084, F1=0.8084

--- Fold 4/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h6/nb2_h6/fold4


Cargando train fold4: 100%|██████████| 68/68 [00:04<00:00, 16.91it/s]
Cargando val fold4: 100%|██████████| 15/15 [00:00<00:00, 17.49it/s]
Cargando test fold4: 100%|██████████| 20/20 [00:01<00:00, 17.38it/s]


[Fold 4/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1308 | train_acc=0.4972 | val_loss=0.1204 | val_acc=0.5651 | val_f1m=0.5476 | LR=0.000125
  Época   2 | train_loss=0.1254 | train_acc=0.5501 | val_loss=0.1177 | val_acc=0.6079 | val_f1m=0.6056 | LR=0.000250
  Época   3 | train_loss=0.1174 | train_acc=0.6008 | val_loss=0.1263 | val_acc=0.6508 | val_f1m=0.6418 | LR=0.000375
  Época   4 | train_loss=0.1039 | train_acc=0.6992 | val_loss=0.1061 | val_acc=0.6889 | val_f1m=0.6888 | LR=0.000500
  Época   5 | train_loss=0.0898 | train_acc=0.7626 | val_loss=0.1008 | val_acc=0.7349 | val_f1m=0.7327 | LR=0.000500
  Época   6 | train_loss=0.0802 | train_acc=0.7805 | val_loss=0.0876 | val_acc=0.7635 | val_f1m=0.7622 | LR=0.000500
  Época   7 | train_loss=0.0810 | train_acc=0.7920 | val_loss=0.0990 | val_acc=0.7286 | val_f1m=0.7182 | LR=0.000499
  Época   8 | train_loss=0.0737 | train_acc=0.8099 | val_loss=0.0873 | val_acc=0.7762 | val_f1m=0.7749



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h6/nb2_h6/fold4/tsne_cls_fold4_nb2_h6.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.09 ms (B=8)
✓ Fold 4 completado: ACC=0.8119, F1=0.8098

--- Fold 5/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h6/nb2_h6/fold5


Cargando train fold5: 100%|██████████| 68/68 [00:03<00:00, 17.55it/s]
Cargando val fold5: 100%|██████████| 15/15 [00:00<00:00, 17.62it/s]
Cargando test fold5: 100%|██████████| 20/20 [00:01<00:00, 17.62it/s]


[Fold 5/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1320 | train_acc=0.5035 | val_loss=0.1220 | val_acc=0.5095 | val_f1m=0.4079 | LR=0.000125
  Época   2 | train_loss=0.1241 | train_acc=0.5445 | val_loss=0.1177 | val_acc=0.5889 | val_f1m=0.5691 | LR=0.000250
  Época   3 | train_loss=0.1225 | train_acc=0.5795 | val_loss=0.1122 | val_acc=0.6619 | val_f1m=0.6540 | LR=0.000375
  Época   4 | train_loss=0.0968 | train_acc=0.7416 | val_loss=0.1055 | val_acc=0.7000 | val_f1m=0.6988 | LR=0.000500
  Época   5 | train_loss=0.0907 | train_acc=0.7560 | val_loss=0.1002 | val_acc=0.7127 | val_f1m=0.7107 | LR=0.000500
  Época   6 | train_loss=0.0747 | train_acc=0.8123 | val_loss=0.0961 | val_acc=0.7190 | val_f1m=0.7162 | LR=0.000500
  Época   7 | train_loss=0.0737 | train_acc=0.8200 | val_loss=0.1029 | val_acc=0.7286 | val_f1m=0.7273 | LR=0.000499
  Época   8 | train_loss=0.0708 | train_acc=0.8232 | val_loss=0.1032 | val_acc=0.7333 | val_f1m=0.7278



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb2_h6/nb2_h6/fold5/tsne_cls_fold5_nb2_h6.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=232,290 | params_trainable=232,290 | FLOPs~24.3M | latency/batch=1.09 ms (B=8)
✓ Fold 5 completado: ACC=0.8464, F1=0.8464

────────────────────────────────────────────────────────────────────────────────
RESUMEN nb2_h6:
  F1 por fold: ['0.8107', '0.8353', '0.8084', '0.8098', '0.8464']
  F1 Media ± Std: 0.8221 ± 0.0157
  ACC Media ± Std: 0.8226 ± 0.0155
  Parámetros: 232,290
────────────────────────────────────────────────────────────────────────────────

ARQUITECTURA 7/9: nb4_h2
  • Depthwise Blocks: 4
  • Attention Heads: 2
--------------------------------------------------------------------------------

--- Fold 1/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h2/nb4_h2/fold1


Cargando train fold1: 100%|██████████| 67/67 [00:03<00:00, 17.54it/s]
Cargando val fold1: 100%|██████████| 15/15 [00:00<00:00, 17.38it/s]
Cargando test fold1: 100%|██████████| 21/21 [00:01<00:00, 17.47it/s]


[Fold 1/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1301 | train_acc=0.5238 | val_loss=0.1230 | val_acc=0.5365 | val_f1m=0.4667 | LR=0.000125
  Época   2 | train_loss=0.1240 | train_acc=0.5533 | val_loss=0.1191 | val_acc=0.5635 | val_f1m=0.5183 | LR=0.000250
  Época   3 | train_loss=0.1123 | train_acc=0.6525 | val_loss=0.1264 | val_acc=0.6540 | val_f1m=0.6376 | LR=0.000375
  Época   4 | train_loss=0.0929 | train_acc=0.7555 | val_loss=0.1088 | val_acc=0.6889 | val_f1m=0.6864 | LR=0.000500
  Época   5 | train_loss=0.0843 | train_acc=0.7907 | val_loss=0.1054 | val_acc=0.7111 | val_f1m=0.7110 | LR=0.000500
  Época   6 | train_loss=0.0825 | train_acc=0.7857 | val_loss=0.1028 | val_acc=0.7111 | val_f1m=0.7096 | LR=0.000500
  Época   7 | train_loss=0.0787 | train_acc=0.7985 | val_loss=0.1051 | val_acc=0.7111 | val_f1m=0.7077 | LR=0.000499
  Época   8 | train_loss=0.0803 | train_acc=0.7921 | val_loss=0.1051 | val_acc=0.7190 | val_f1m=0.7184



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h2/nb4_h2/fold1/tsne_cls_fold1_nb4_h2.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=352,738 | params_trainable=352,738 | FLOPs~5.4M | latency/batch=1.27 ms (B=8)
✓ Fold 1 completado: ACC=0.7732, F1=0.7730

--- Fold 2/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h2/nb4_h2/fold2


Cargando train fold2: 100%|██████████| 67/67 [00:03<00:00, 17.13it/s]
Cargando val fold2: 100%|██████████| 15/15 [00:00<00:00, 17.30it/s]
Cargando test fold2: 100%|██████████| 21/21 [00:01<00:00, 17.18it/s]


[Fold 2/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1309 | train_acc=0.5131 | val_loss=0.1226 | val_acc=0.5698 | val_f1m=0.5017 | LR=0.000125
  Época   2 | train_loss=0.1230 | train_acc=0.5551 | val_loss=0.1170 | val_acc=0.6063 | val_f1m=0.6057 | LR=0.000250
  Época   3 | train_loss=0.1117 | train_acc=0.6429 | val_loss=0.1003 | val_acc=0.7032 | val_f1m=0.7031 | LR=0.000375
  Época   4 | train_loss=0.0965 | train_acc=0.7370 | val_loss=0.1146 | val_acc=0.7127 | val_f1m=0.7111 | LR=0.000500
  Época   5 | train_loss=0.0941 | train_acc=0.7463 | val_loss=0.1054 | val_acc=0.7460 | val_f1m=0.7455 | LR=0.000500
  Época   6 | train_loss=0.0851 | train_acc=0.7811 | val_loss=0.1066 | val_acc=0.7095 | val_f1m=0.6999 | LR=0.000500
  Época   7 | train_loss=0.0868 | train_acc=0.7768 | val_loss=0.0928 | val_acc=0.7476 | val_f1m=0.7476 | LR=0.000499
  Época   8 | train_loss=0.0802 | train_acc=0.7910 | val_loss=0.0907 | val_acc=0.7556 | val_f1m=0.7555



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h2/nb4_h2/fold2/tsne_cls_fold2_nb4_h2.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=352,738 | params_trainable=352,738 | FLOPs~5.4M | latency/batch=1.27 ms (B=8)
✓ Fold 2 completado: ACC=0.8186, F1=0.8186

--- Fold 3/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h2/nb4_h2/fold3


Cargando train fold3: 100%|██████████| 67/67 [00:03<00:00, 16.98it/s]
Cargando val fold3: 100%|██████████| 15/15 [00:00<00:00, 16.88it/s]
Cargando test fold3: 100%|██████████| 21/21 [00:01<00:00, 16.71it/s]


[Fold 3/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1318 | train_acc=0.5117 | val_loss=0.1207 | val_acc=0.5413 | val_f1m=0.4742 | LR=0.000125
  Época   2 | train_loss=0.1241 | train_acc=0.5547 | val_loss=0.1170 | val_acc=0.6175 | val_f1m=0.6169 | LR=0.000250
  Época   3 | train_loss=0.1249 | train_acc=0.5458 | val_loss=0.1141 | val_acc=0.6429 | val_f1m=0.6305 | LR=0.000375
  Época   4 | train_loss=0.1021 | train_acc=0.7225 | val_loss=0.0923 | val_acc=0.7460 | val_f1m=0.7452 | LR=0.000500
  Época   5 | train_loss=0.0902 | train_acc=0.7576 | val_loss=0.0867 | val_acc=0.7730 | val_f1m=0.7713 | LR=0.000500
  Época   6 | train_loss=0.0811 | train_acc=0.7878 | val_loss=0.0897 | val_acc=0.7540 | val_f1m=0.7480 | LR=0.000500
  Época   7 | train_loss=0.0789 | train_acc=0.7978 | val_loss=0.0807 | val_acc=0.7873 | val_f1m=0.7873 | LR=0.000499
  Época   8 | train_loss=0.0752 | train_acc=0.8006 | val_loss=0.0774 | val_acc=0.7984 | val_f1m=0.7978



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h2/nb4_h2/fold3/tsne_cls_fold3_nb4_h2.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=352,738 | params_trainable=352,738 | FLOPs~5.4M | latency/batch=1.27 ms (B=8)
✓ Fold 3 completado: ACC=0.8005, F1=0.8004

--- Fold 4/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h2/nb4_h2/fold4


Cargando train fold4: 100%|██████████| 68/68 [00:03<00:00, 17.06it/s]
Cargando val fold4: 100%|██████████| 15/15 [00:00<00:00, 17.54it/s]
Cargando test fold4: 100%|██████████| 20/20 [00:01<00:00, 17.11it/s]


[Fold 4/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1309 | train_acc=0.4961 | val_loss=0.1235 | val_acc=0.5222 | val_f1m=0.4304 | LR=0.000125
  Época   2 | train_loss=0.1247 | train_acc=0.5469 | val_loss=0.1202 | val_acc=0.6000 | val_f1m=0.5934 | LR=0.000250
  Época   3 | train_loss=0.1156 | train_acc=0.6254 | val_loss=0.1119 | val_acc=0.6794 | val_f1m=0.6760 | LR=0.000375
  Época   4 | train_loss=0.1005 | train_acc=0.7185 | val_loss=0.0992 | val_acc=0.7222 | val_f1m=0.7220 | LR=0.000500
  Época   5 | train_loss=0.0970 | train_acc=0.7251 | val_loss=0.1041 | val_acc=0.6952 | val_f1m=0.6950 | LR=0.000500
  Época   6 | train_loss=0.0979 | train_acc=0.7290 | val_loss=0.1153 | val_acc=0.6905 | val_f1m=0.6836 | LR=0.000500
  Época   7 | train_loss=0.0949 | train_acc=0.7482 | val_loss=0.1034 | val_acc=0.7159 | val_f1m=0.7157 | LR=0.000499
  Época   8 | train_loss=0.0845 | train_acc=0.7780 | val_loss=0.1098 | val_acc=0.7159 | val_f1m=0.7143



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h2/nb4_h2/fold4/tsne_cls_fold4_nb4_h2.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=352,738 | params_trainable=352,738 | FLOPs~5.4M | latency/batch=1.27 ms (B=8)
✓ Fold 4 completado: ACC=0.8274, F1=0.8274

--- Fold 5/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h2/nb4_h2/fold5


Cargando train fold5: 100%|██████████| 68/68 [00:04<00:00, 16.79it/s]
Cargando val fold5: 100%|██████████| 15/15 [00:00<00:00, 17.00it/s]
Cargando test fold5: 100%|██████████| 20/20 [00:01<00:00, 16.39it/s]


[Fold 5/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1320 | train_acc=0.5126 | val_loss=0.1220 | val_acc=0.5413 | val_f1m=0.4821 | LR=0.000125
  Época   2 | train_loss=0.1255 | train_acc=0.5315 | val_loss=0.1184 | val_acc=0.5984 | val_f1m=0.5739 | LR=0.000250
  Época   3 | train_loss=0.1186 | train_acc=0.6096 | val_loss=0.1076 | val_acc=0.6857 | val_f1m=0.6773 | LR=0.000375
  Época   4 | train_loss=0.1003 | train_acc=0.7167 | val_loss=0.1069 | val_acc=0.6952 | val_f1m=0.6900 | LR=0.000500
  Época   5 | train_loss=0.0875 | train_acc=0.7689 | val_loss=0.0949 | val_acc=0.7238 | val_f1m=0.7235 | LR=0.000500
  Época   6 | train_loss=0.0793 | train_acc=0.7959 | val_loss=0.1087 | val_acc=0.7270 | val_f1m=0.7218 | LR=0.000500
  Época   7 | train_loss=0.0787 | train_acc=0.8088 | val_loss=0.1016 | val_acc=0.7508 | val_f1m=0.7487 | LR=0.000499
  Época   8 | train_loss=0.0778 | train_acc=0.7973 | val_loss=0.0967 | val_acc=0.7381 | val_f1m=0.7381



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h2/nb4_h2/fold5/tsne_cls_fold5_nb4_h2.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=352,738 | params_trainable=352,738 | FLOPs~5.4M | latency/batch=1.27 ms (B=8)
✓ Fold 5 completado: ACC=0.8214, F1=0.8214

────────────────────────────────────────────────────────────────────────────────
RESUMEN nb4_h2:
  F1 por fold: ['0.7730', '0.8186', '0.8004', '0.8274', '0.8214']
  F1 Media ± Std: 0.8082 ± 0.0197
  ACC Media ± Std: 0.8082 ± 0.0197
  Parámetros: 352,738
────────────────────────────────────────────────────────────────────────────────

ARQUITECTURA 8/9: nb4_h4
  • Depthwise Blocks: 4
  • Attention Heads: 4
--------------------------------------------------------------------------------

--- Fold 1/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h4/nb4_h4/fold1


Cargando train fold1: 100%|██████████| 67/67 [00:03<00:00, 17.35it/s]
Cargando val fold1: 100%|██████████| 15/15 [00:00<00:00, 17.34it/s]
Cargando test fold1: 100%|██████████| 21/21 [00:01<00:00, 17.24it/s]


[Fold 1/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1299 | train_acc=0.5210 | val_loss=0.1231 | val_acc=0.5349 | val_f1m=0.4597 | LR=0.000125
  Época   2 | train_loss=0.1239 | train_acc=0.5522 | val_loss=0.1197 | val_acc=0.5540 | val_f1m=0.4988 | LR=0.000250
  Época   3 | train_loss=0.1148 | train_acc=0.6294 | val_loss=0.1302 | val_acc=0.6317 | val_f1m=0.6077 | LR=0.000375
  Época   4 | train_loss=0.0940 | train_acc=0.7498 | val_loss=0.1042 | val_acc=0.6952 | val_f1m=0.6951 | LR=0.000500
  Época   5 | train_loss=0.0824 | train_acc=0.7928 | val_loss=0.1038 | val_acc=0.7127 | val_f1m=0.7126 | LR=0.000500
  Época   6 | train_loss=0.0845 | train_acc=0.7886 | val_loss=0.1017 | val_acc=0.7063 | val_f1m=0.7053 | LR=0.000500
  Época   7 | train_loss=0.0775 | train_acc=0.8035 | val_loss=0.1191 | val_acc=0.7063 | val_f1m=0.6980 | LR=0.000499
  Época   8 | train_loss=0.0750 | train_acc=0.8156 | val_loss=0.1026 | val_acc=0.7286 | val_f1m=0.7268



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h4/nb4_h4/fold1/tsne_cls_fold1_nb4_h4.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=352,738 | params_trainable=352,738 | FLOPs~5.4M | latency/batch=1.43 ms (B=8)
✓ Fold 1 completado: ACC=0.7687, F1=0.7686

--- Fold 2/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h4/nb4_h4/fold2


Cargando train fold2: 100%|██████████| 67/67 [00:03<00:00, 17.04it/s]
Cargando val fold2: 100%|██████████| 15/15 [00:00<00:00, 16.87it/s]
Cargando test fold2: 100%|██████████| 21/21 [00:01<00:00, 16.92it/s]


[Fold 2/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1305 | train_acc=0.5103 | val_loss=0.1227 | val_acc=0.5603 | val_f1m=0.4805 | LR=0.000125
  Época   2 | train_loss=0.1231 | train_acc=0.5615 | val_loss=0.1181 | val_acc=0.6143 | val_f1m=0.6139 | LR=0.000250
  Época   3 | train_loss=0.1140 | train_acc=0.6297 | val_loss=0.1017 | val_acc=0.7079 | val_f1m=0.7079 | LR=0.000375
  Época   4 | train_loss=0.0963 | train_acc=0.7402 | val_loss=0.1192 | val_acc=0.6952 | val_f1m=0.6924 | LR=0.000500
  Época   5 | train_loss=0.0900 | train_acc=0.7591 | val_loss=0.1034 | val_acc=0.7159 | val_f1m=0.7156 | LR=0.000500
  Época   6 | train_loss=0.0857 | train_acc=0.7708 | val_loss=0.0919 | val_acc=0.7444 | val_f1m=0.7442 | LR=0.000500
  Época   7 | train_loss=0.0793 | train_acc=0.7974 | val_loss=0.0911 | val_acc=0.7540 | val_f1m=0.7538 | LR=0.000499
  Época   8 | train_loss=0.0750 | train_acc=0.8031 | val_loss=0.1023 | val_acc=0.7635 | val_f1m=0.7635



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h4/nb4_h4/fold2/tsne_cls_fold2_nb4_h4.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=352,738 | params_trainable=352,738 | FLOPs~5.4M | latency/batch=1.28 ms (B=8)
✓ Fold 2 completado: ACC=0.8277, F1=0.8276

--- Fold 3/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h4/nb4_h4/fold3


Cargando train fold3: 100%|██████████| 67/67 [00:03<00:00, 17.10it/s]
Cargando val fold3: 100%|██████████| 15/15 [00:00<00:00, 17.36it/s]
Cargando test fold3: 100%|██████████| 21/21 [00:01<00:00, 17.32it/s]


[Fold 3/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1320 | train_acc=0.5064 | val_loss=0.1205 | val_acc=0.5492 | val_f1m=0.4840 | LR=0.000125
  Época   2 | train_loss=0.1242 | train_acc=0.5533 | val_loss=0.1183 | val_acc=0.5857 | val_f1m=0.5856 | LR=0.000250
  Época   3 | train_loss=0.1253 | train_acc=0.5426 | val_loss=0.1135 | val_acc=0.6175 | val_f1m=0.6028 | LR=0.000375
  Época   4 | train_loss=0.1017 | train_acc=0.7018 | val_loss=0.0938 | val_acc=0.7540 | val_f1m=0.7522 | LR=0.000500
  Época   5 | train_loss=0.0890 | train_acc=0.7715 | val_loss=0.0879 | val_acc=0.7635 | val_f1m=0.7633 | LR=0.000500
  Época   6 | train_loss=0.0821 | train_acc=0.7928 | val_loss=0.0847 | val_acc=0.7841 | val_f1m=0.7839 | LR=0.000500
  Época   7 | train_loss=0.0757 | train_acc=0.8141 | val_loss=0.0844 | val_acc=0.7746 | val_f1m=0.7746 | LR=0.000499
  Época   8 | train_loss=0.0752 | train_acc=0.8120 | val_loss=0.0873 | val_acc=0.7698 | val_f1m=0.7690



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h4/nb4_h4/fold3/tsne_cls_fold3_nb4_h4.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=352,738 | params_trainable=352,738 | FLOPs~5.4M | latency/batch=1.27 ms (B=8)
✓ Fold 3 completado: ACC=0.7993, F1=0.7991

--- Fold 4/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h4/nb4_h4/fold4


Cargando train fold4: 100%|██████████| 68/68 [00:03<00:00, 17.11it/s]
Cargando val fold4: 100%|██████████| 15/15 [00:00<00:00, 17.29it/s]
Cargando test fold4: 100%|██████████| 20/20 [00:01<00:00, 17.23it/s]


[Fold 4/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1310 | train_acc=0.4951 | val_loss=0.1232 | val_acc=0.5222 | val_f1m=0.4268 | LR=0.000125
  Época   2 | train_loss=0.1249 | train_acc=0.5483 | val_loss=0.1204 | val_acc=0.6079 | val_f1m=0.5976 | LR=0.000250
  Época   3 | train_loss=0.1190 | train_acc=0.5970 | val_loss=0.1185 | val_acc=0.6222 | val_f1m=0.6103 | LR=0.000375
  Época   4 | train_loss=0.1036 | train_acc=0.6964 | val_loss=0.0993 | val_acc=0.7095 | val_f1m=0.7091 | LR=0.000500
  Época   5 | train_loss=0.0935 | train_acc=0.7395 | val_loss=0.1181 | val_acc=0.6921 | val_f1m=0.6784 | LR=0.000500
  Época   6 | train_loss=0.0836 | train_acc=0.7791 | val_loss=0.0922 | val_acc=0.7397 | val_f1m=0.7397 | LR=0.000500
  Época   7 | train_loss=0.0785 | train_acc=0.8022 | val_loss=0.0985 | val_acc=0.7667 | val_f1m=0.7667 | LR=0.000499
  Época   8 | train_loss=0.0760 | train_acc=0.8081 | val_loss=0.0881 | val_acc=0.7635 | val_f1m=0.7635



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h4/nb4_h4/fold4/tsne_cls_fold4_nb4_h4.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=352,738 | params_trainable=352,738 | FLOPs~5.4M | latency/batch=1.28 ms (B=8)
✓ Fold 4 completado: ACC=0.8238, F1=0.8236

--- Fold 5/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h4/nb4_h4/fold5


Cargando train fold5: 100%|██████████| 68/68 [00:03<00:00, 17.32it/s]
Cargando val fold5: 100%|██████████| 15/15 [00:00<00:00, 17.34it/s]
Cargando test fold5: 100%|██████████| 20/20 [00:01<00:00, 17.30it/s]


[Fold 5/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1316 | train_acc=0.5123 | val_loss=0.1213 | val_acc=0.5460 | val_f1m=0.4985 | LR=0.000125
  Época   2 | train_loss=0.1253 | train_acc=0.5235 | val_loss=0.1198 | val_acc=0.5952 | val_f1m=0.5677 | LR=0.000250
  Época   3 | train_loss=0.1221 | train_acc=0.5711 | val_loss=0.1111 | val_acc=0.6746 | val_f1m=0.6664 | LR=0.000375
  Época   4 | train_loss=0.1011 | train_acc=0.7220 | val_loss=0.1036 | val_acc=0.6905 | val_f1m=0.6903 | LR=0.000500
  Época   5 | train_loss=0.0883 | train_acc=0.7731 | val_loss=0.0984 | val_acc=0.7302 | val_f1m=0.7302 | LR=0.000500
  Época   6 | train_loss=0.0794 | train_acc=0.8008 | val_loss=0.1046 | val_acc=0.7238 | val_f1m=0.7183 | LR=0.000500
  Época   7 | train_loss=0.0795 | train_acc=0.8036 | val_loss=0.1033 | val_acc=0.7190 | val_f1m=0.7173 | LR=0.000499
  Época   8 | train_loss=0.0788 | train_acc=0.7987 | val_loss=0.0989 | val_acc=0.7587 | val_f1m=0.7586



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h4/nb4_h4/fold5/tsne_cls_fold5_nb4_h4.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=352,738 | params_trainable=352,738 | FLOPs~5.4M | latency/batch=1.27 ms (B=8)
✓ Fold 5 completado: ACC=0.8202, F1=0.8202

────────────────────────────────────────────────────────────────────────────────
RESUMEN nb4_h4:
  F1 por fold: ['0.7686', '0.8276', '0.7991', '0.8236', '0.8202']
  F1 Media ± Std: 0.8078 ± 0.0219
  ACC Media ± Std: 0.8079 ± 0.0219
  Parámetros: 352,738
────────────────────────────────────────────────────────────────────────────────

ARQUITECTURA 9/9: nb4_h6
  • Depthwise Blocks: 4
  • Attention Heads: 6
--------------------------------------------------------------------------------

--- Fold 1/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h6/nb4_h6/fold1


Cargando train fold1: 100%|██████████| 67/67 [00:03<00:00, 17.00it/s]
Cargando val fold1: 100%|██████████| 15/15 [00:00<00:00, 17.32it/s]
Cargando test fold1: 100%|██████████| 21/21 [00:01<00:00, 17.43it/s]


[Fold 1/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1303 | train_acc=0.5114 | val_loss=0.1230 | val_acc=0.5175 | val_f1m=0.4273 | LR=0.000125
  Época   2 | train_loss=0.1240 | train_acc=0.5430 | val_loss=0.1201 | val_acc=0.5540 | val_f1m=0.5056 | LR=0.000250
  Época   3 | train_loss=0.1176 | train_acc=0.6091 | val_loss=0.1275 | val_acc=0.6016 | val_f1m=0.5691 | LR=0.000375
  Época   4 | train_loss=0.0972 | train_acc=0.7335 | val_loss=0.1091 | val_acc=0.7175 | val_f1m=0.7141 | LR=0.000500
  Época   5 | train_loss=0.0833 | train_acc=0.7854 | val_loss=0.1022 | val_acc=0.7048 | val_f1m=0.7043 | LR=0.000500
  Época   6 | train_loss=0.0873 | train_acc=0.7672 | val_loss=0.0997 | val_acc=0.6937 | val_f1m=0.6823 | LR=0.000500
  Época   7 | train_loss=0.0725 | train_acc=0.8213 | val_loss=0.1084 | val_acc=0.7571 | val_f1m=0.7547 | LR=0.000499
  Época   8 | train_loss=0.0660 | train_acc=0.8397 | val_loss=0.0946 | val_acc=0.7778 | val_f1m=0.7776



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h6/nb4_h6/fold1/tsne_cls_fold1_nb4_h6.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=352,738 | params_trainable=352,738 | FLOPs~5.4M | latency/batch=1.28 ms (B=8)
✓ Fold 1 completado: ACC=0.7789, F1=0.7784

--- Fold 2/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h6/nb4_h6/fold2


Cargando train fold2: 100%|██████████| 67/67 [00:03<00:00, 16.89it/s]
Cargando val fold2: 100%|██████████| 15/15 [00:00<00:00, 16.58it/s]
Cargando test fold2: 100%|██████████| 21/21 [00:01<00:00, 16.81it/s]


[Fold 2/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1309 | train_acc=0.5085 | val_loss=0.1226 | val_acc=0.5667 | val_f1m=0.4952 | LR=0.000125
  Época   2 | train_loss=0.1232 | train_acc=0.5537 | val_loss=0.1184 | val_acc=0.6048 | val_f1m=0.6042 | LR=0.000250
  Época   3 | train_loss=0.1157 | train_acc=0.6230 | val_loss=0.1037 | val_acc=0.7222 | val_f1m=0.7207 | LR=0.000375
  Época   4 | train_loss=0.0977 | train_acc=0.7260 | val_loss=0.1142 | val_acc=0.7270 | val_f1m=0.7262 | LR=0.000500
  Época   5 | train_loss=0.0898 | train_acc=0.7679 | val_loss=0.1095 | val_acc=0.7508 | val_f1m=0.7505 | LR=0.000500
  Época   6 | train_loss=0.0781 | train_acc=0.7946 | val_loss=0.0967 | val_acc=0.7476 | val_f1m=0.7457 | LR=0.000500
  Época   7 | train_loss=0.0802 | train_acc=0.7875 | val_loss=0.0924 | val_acc=0.7492 | val_f1m=0.7488 | LR=0.000499
  Época   8 | train_loss=0.0707 | train_acc=0.8266 | val_loss=0.0939 | val_acc=0.7651 | val_f1m=0.7649



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h6/nb4_h6/fold2/tsne_cls_fold2_nb4_h6.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=352,738 | params_trainable=352,738 | FLOPs~5.4M | latency/batch=1.28 ms (B=8)
✓ Fold 2 completado: ACC=0.8175, F1=0.8175

--- Fold 3/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h6/nb4_h6/fold3


Cargando train fold3: 100%|██████████| 67/67 [00:03<00:00, 16.88it/s]
Cargando val fold3: 100%|██████████| 15/15 [00:00<00:00, 17.46it/s]
Cargando test fold3: 100%|██████████| 21/21 [00:01<00:00, 17.41it/s]


[Fold 3/5] Entrenando modelo global... (n_train=2814 | n_val=630 | n_test=882)
  Época   1 | train_loss=0.1314 | train_acc=0.5071 | val_loss=0.1205 | val_acc=0.5524 | val_f1m=0.4863 | LR=0.000125
  Época   2 | train_loss=0.1238 | train_acc=0.5522 | val_loss=0.1166 | val_acc=0.6190 | val_f1m=0.6190 | LR=0.000250
  Época   3 | train_loss=0.1255 | train_acc=0.5398 | val_loss=0.1152 | val_acc=0.6063 | val_f1m=0.5859 | LR=0.000375
  Época   4 | train_loss=0.1105 | train_acc=0.6585 | val_loss=0.0941 | val_acc=0.7460 | val_f1m=0.7429 | LR=0.000500
  Época   5 | train_loss=0.0884 | train_acc=0.7704 | val_loss=0.0828 | val_acc=0.7778 | val_f1m=0.7778 | LR=0.000500
  Época   6 | train_loss=0.0813 | train_acc=0.7967 | val_loss=0.1069 | val_acc=0.7317 | val_f1m=0.7204 | LR=0.000500
  Época   7 | train_loss=0.0737 | train_acc=0.8149 | val_loss=0.0928 | val_acc=0.7508 | val_f1m=0.7425 | LR=0.000499
  Época   8 | train_loss=0.0763 | train_acc=0.8109 | val_loss=0.0824 | val_acc=0.7937 | val_f1m=0.7936



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h6/nb4_h6/fold3/tsne_cls_fold3_nb4_h6.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=352,738 | params_trainable=352,738 | FLOPs~5.4M | latency/batch=1.27 ms (B=8)
✓ Fold 3 completado: ACC=0.7846, F1=0.7845

--- Fold 4/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h6/nb4_h6/fold4


Cargando train fold4: 100%|██████████| 68/68 [00:04<00:00, 16.98it/s]
Cargando val fold4: 100%|██████████| 15/15 [00:00<00:00, 16.90it/s]
Cargando test fold4: 100%|██████████| 20/20 [00:01<00:00, 17.13it/s]


[Fold 4/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1312 | train_acc=0.4909 | val_loss=0.1225 | val_acc=0.5317 | val_f1m=0.4590 | LR=0.000125
  Época   2 | train_loss=0.1246 | train_acc=0.5455 | val_loss=0.1201 | val_acc=0.6016 | val_f1m=0.5915 | LR=0.000250
  Época   3 | train_loss=0.1181 | train_acc=0.6078 | val_loss=0.1147 | val_acc=0.6603 | val_f1m=0.6508 | LR=0.000375
  Época   4 | train_loss=0.1015 | train_acc=0.7101 | val_loss=0.1010 | val_acc=0.6968 | val_f1m=0.6968 | LR=0.000500
  Época   5 | train_loss=0.0906 | train_acc=0.7532 | val_loss=0.1073 | val_acc=0.7063 | val_f1m=0.6986 | LR=0.000500
  Época   6 | train_loss=0.0860 | train_acc=0.7717 | val_loss=0.0953 | val_acc=0.7365 | val_f1m=0.7364 | LR=0.000500
  Época   7 | train_loss=0.0754 | train_acc=0.8130 | val_loss=0.1069 | val_acc=0.7381 | val_f1m=0.7364 | LR=0.000499
  Época   8 | train_loss=0.0741 | train_acc=0.8099 | val_loss=0.0888 | val_acc=0.7556 | val_f1m=0.7551



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h6/nb4_h6/fold4/tsne_cls_fold4_nb4_h6.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=352,738 | params_trainable=352,738 | FLOPs~5.4M | latency/batch=1.29 ms (B=8)
✓ Fold 4 completado: ACC=0.8274, F1=0.8274

--- Fold 5/5 ---
[IO] Guardando artefactos en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h6/nb4_h6/fold5


Cargando train fold5: 100%|██████████| 68/68 [00:04<00:00, 16.82it/s]
Cargando val fold5: 100%|██████████| 15/15 [00:00<00:00, 17.25it/s]
Cargando test fold5: 100%|██████████| 20/20 [00:01<00:00, 16.98it/s]


[Fold 5/5] Entrenando modelo global... (n_train=2856 | n_val=630 | n_test=840)
  Época   1 | train_loss=0.1319 | train_acc=0.5154 | val_loss=0.1214 | val_acc=0.5476 | val_f1m=0.4917 | LR=0.000125
  Época   2 | train_loss=0.1253 | train_acc=0.5368 | val_loss=0.1188 | val_acc=0.6127 | val_f1m=0.5920 | LR=0.000250
  Época   3 | train_loss=0.1217 | train_acc=0.5749 | val_loss=0.1109 | val_acc=0.6651 | val_f1m=0.6559 | LR=0.000375
  Época   4 | train_loss=0.1013 | train_acc=0.7171 | val_loss=0.1091 | val_acc=0.6778 | val_f1m=0.6775 | LR=0.000500
  Época   5 | train_loss=0.0893 | train_acc=0.7630 | val_loss=0.0960 | val_acc=0.7159 | val_f1m=0.7130 | LR=0.000500
  Época   6 | train_loss=0.0797 | train_acc=0.7959 | val_loss=0.1083 | val_acc=0.7317 | val_f1m=0.7277 | LR=0.000500
  Época   7 | train_loss=0.0826 | train_acc=0.7868 | val_loss=0.0938 | val_acc=0.7476 | val_f1m=0.7466 | LR=0.000499
  Época   8 | train_loss=0.0706 | train_acc=0.8267 | val_loss=0.1090 | val_acc=0.7413 | val_f1m=0.7398



↳ t-SNE (CLS embeddings) guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/nb4_h6/nb4_h6/fold5/tsne_cls_fold5_nb4_h6.png
  [LITE] FT desactivado (do_ft=False).
[Consumo] params_total=352,738 | params_trainable=352,738 | FLOPs~5.4M | latency/batch=1.29 ms (B=8)
✓ Fold 5 completado: ACC=0.8369, F1=0.8369

────────────────────────────────────────────────────────────────────────────────
RESUMEN nb4_h6:
  F1 por fold: ['0.7784', '0.8175', '0.7845', '0.8274', '0.8369']
  F1 Media ± Std: 0.8089 ± 0.0233
  ACC Media ± Std: 0.8090 ± 0.0232
  Parámetros: 352,738
────────────────────────────────────────────────────────────────────────────────

SWEEP ARQUITECTÓNICO COMPLETADO

Resultados guardados en: /root/Proyecto/EEG_Clasificador/models/04_hybrid/arch_sweep_5fold/arch_sweep_5fold_results.csv

Mejor arquitectura:
  • Arch ID: nb0_h4
  • Blocks: 0, Heads: 4
  • F1 Score: 0.8223 ± 0.0186
  • Accuracy: 0.8223 ± 0.0185
  • Parámetros: 205,890


PASO 2: Análisis estadístico 

TypeError: Object of type bool is not JSON serializable

---
## Documentación del Sweep Arquitectónico

### 📋 Características del Análisis:

#### Variaciones Arquitectónicas:
- **Bloques Depthwise**: Probará 0, 2, 4 bloques (base=2, deltas=(-2,0,+2))
- **Attention Heads**: Probará 2, 4, 6 cabezas
- **Total de configuraciones**: Hasta 9 arquitecturas diferentes

#### Metodología:
1. **5-Fold Cross-Validation** por cada arquitectura
2. **Test de Wilcoxon Pareado**: Compara cada arquitectura vs baseline
3. **Test de Friedman**: Comparación múltiple de todas las arquitecturas
4. **Visualizaciones**: Gráficos, heatmaps y tablas comprehensivas

### ⏱️ Tiempo Estimado:
- **Por fold**: ~5-10 minutos
- **Por arquitectura**: ~25-50 minutos (5 folds)
- **Total**: ~4-8 horas (9 arquitecturas × 5 folds)

### 📊 Resultados Generados:

#### 1. **Datos (CSV/JSON)**:
- `arch_sweep_5fold_results.csv`: Resultados completos con F1 y Accuracy por fold
- `wilcoxon_results.csv`: Tests estadísticos de cada arquitectura vs baseline
- `statistical_summary.json`: Resumen con test de Friedman y estadísticas globales

#### 2. **Visualizaciones (PNG)**:
- `arch_sweep_comparison.png`: Gráfico de barras con errorbars y significancia
- `arch_sweep_heatmaps.png`: 2 heatmaps (F1-score y parámetros por configuración)
- `arch_sweep_pvalues.png`: Gráfico de -log₁₀(p-value) para significancia visual
- `arch_sweep_table.png`: Tabla formateada con todos los resultados

#### 3. **Modelos Guardados**:
- Cada fold de cada arquitectura se guarda en: `arch_sweep_5fold/<arch_id>/<arch_id>/fold{X}/`

### 🧪 Tests Estadísticos:

#### Test de Wilcoxon (pareado):
- **Hipótesis nula**: No hay diferencia entre la arquitectura y el baseline
- **Alternativa**: Hay diferencia significativa (two-sided)
- **Nivel de significancia**: α = 0.05
- **Ventaja**: No asume normalidad, apropiado para n=5 folds

#### Test de Friedman:
- **Hipótesis nula**: No hay diferencias entre ninguna de las arquitecturas
- **Alternativa**: Al menos una arquitectura es diferente
- **Ventaja**: Alternativa no paramétrica a ANOVA para medidas repetidas

### 🔬 Para tu Tesis:

#### Preguntas de investigación que puedes responder:
1. **¿El número de bloques depthwise afecta el rendimiento?**
   - Analiza el heatmap de F1 vs bloques
   
2. **¿Más attention heads siempre es mejor?**
   - Compara columnas del heatmap (heads fijos, variar bloques)
   
3. **¿Existe un trade-off entre complejidad y rendimiento?**
   - Compara heatmap de F1 vs heatmap de parámetros
   
4. **¿Qué arquitectura es estadísticamente superior?**
   - Usa los resultados del test de Wilcoxon
   
5. **¿Las diferencias son significativas en general?**
   - Usa el resultado del test de Friedman

#### Cómo reportar en la tesis:
```
Se evaluaron 9 configuraciones arquitectónicas diferentes utilizando 5-fold 
subject-independent cross-validation. El test de Friedman reveló diferencias 
significativas entre las arquitecturas (χ² = X.XX, p < 0.05). 

El análisis post-hoc mediante test de Wilcoxon pareado identificó que la 
arquitectura nb2_h6 (2 bloques depthwise, 6 attention heads) obtuvo el mejor 
rendimiento con F1-score = 0.XXXX ± 0.XXXX, siendo significativamente superior 
al baseline (p = 0.XXX).
```

### 📝 Uso Básico:

```python
# 1. Ejecutar sweep completo
df_results = run_arch_sweep_5folds(
    device=DEVICE,
    base_blocks=2,
    deltas=(-2, 0, +2),
    head_set=(2, 4, 6)
)

# 2. Análisis estadístico
stats = analyze_arch_sweep_statistical(
    df_results=df_results,
    baseline_arch='nb2_h6',
    alpha=0.05
)

# 3. Visualizar resultados
plot_arch_sweep_results(
    df_results=df_results,
    df_wilcoxon=stats['wilcoxon_results']
)

# 4. Ejecutar ganador con todos los artefactos
winner = df_results.iloc[0]['arch_id']
winner_results = run_winner_architecture(
    device=DEVICE,
    arch_id=winner
)
```

### ⚠️ Notas Importantes:

1. **Reproducibilidad**: La semilla se fija por fold (RANDOM_STATE + fold_idx)
2. **Comparabilidad**: Todas las arquitecturas usan los mismos 5 folds
3. **Artefactos**: Se guardan modelos, métricas, y visualizaciones por fold
4. **Memoria**: El proceso limpia GPU después de cada fold
5. **Errores**: Maneja excepciones y continúa con siguiente arquitectura

### 🎯 Arquitectura Baseline Recomendada:

Por defecto se usa **nb2_h6** como baseline (2 bloques, 6 heads), pero puedes cambiarlo:
- Si ya tienes un modelo entrenado, usa ese como baseline
- Si no, el sweep identificará automáticamente el mejor

### 📈 Interpretación de Resultados:

- **p < 0.01**: Evidencia muy fuerte de diferencia
- **p < 0.05**: Evidencia moderada de diferencia
- **p ≥ 0.05**: No hay evidencia de diferencia significativa

Los gráficos están codificados por colores:
- 🔴 **Rojo/Salmon**: Diferencias significativas
- 🔵 **Azul**: No significativas
- 🟢 **Verde**: Mejor arquitectura

---
## BÚSQUEDA ESTADÍSTICA DE HIPERPARÁMETROS (Grid Search)

Esta sección implementa una búsqueda exhaustiva robusta sobre el modelo ganador **nb2_h6** utilizando 5-fold Cross-Validation para cada combinación de hiperparámetros.

### Objetivo:
Evaluar sistemáticamente variaciones estadísticas del modelo base para encontrar la configuración óptima y preparar datos para análisis estadístico (test de Wilcoxon).

In [29]:
# =========================
# GRID SEARCH ESTADÍSTICO
# =========================

def run_statistical_grid_search(device, folds_json=None, output_dir=None, metric='f1'):
    """
    Realiza búsqueda exhaustiva (Grid Search) robusta sobre el modelo ganador nb2_h6.
    """
    import pandas as pd
    from itertools import product
    
    # Configuración
    if folds_json is None:
        folds_json = FOLDS_JSON
    
    if output_dir is None:
        output_dir = PROJ / 'models' / '04_hybrid' / 'grid_search'
    else:
        output_dir = Path(output_dir)
    
    ensure_dir(output_dir)
    
    # ========================================
    # PARÁMETROS FIJOS (BASELINE nb2_h6)
    # ========================================
    FIXED_PARAMS = {
        'model_name': 'nb2_h6',
        'd_model': 144,
        'n_blocks': 2,
        'n_heads': 6,
        'warmup_epochs': 4,
        'patience': 12,
        'batch_size': BATCH_SIZE,
        'epochs': EPOCHS,
    }
    
    # ========================================
    # ESPACIO DE BÚSQUEDA (GRID)
    # ========================================
    GRID_PARAMS = {
        'learning_rates': [5e-4, 1e-3],
        'dropouts': [0.1, 0.2],
        'encoder_layers': [1, 2]
    }
    
    # Generar todas las combinaciones
    param_combinations = list(product(
        GRID_PARAMS['learning_rates'],
        GRID_PARAMS['dropouts'],
        GRID_PARAMS['encoder_layers']
    ))
    
    total_configs = len(param_combinations)
    total_runs = total_configs * 5  # 5 folds por combinación
    
    print("="*80)
    print("GRID SEARCH ESTADÍSTICO - Modelo Ganador nb2_h6")
    print("="*80)
    print(f"\nParámetros Fijos:")
    for k, v in FIXED_PARAMS.items():
        print(f"  • {k}: {v}")
    
    print(f"\nEspacio de Búsqueda:")
    for k, v in GRID_PARAMS.items():
        print(f"  • {k}: {v}")
    
    print(f"\nTotal de configuraciones: {total_configs}")
    print(f"Total de entrenamientos: {total_runs} ({total_configs} configs × 5 folds)")
    print(f"Métrica objetivo: {metric.upper()}")
    print(f"Directorio de salida: {output_dir}")
    print("="*80 + "\n")
    
    # Almacenar resultados
    results = []
    
    # ========================================
    # GRID SEARCH LOOP
    # ========================================
    for config_idx, (lr, dropout, n_layers) in enumerate(param_combinations, 1):
        print(f"\n{'='*80}")
        print(f"CONFIGURACIÓN {config_idx}/{total_configs}")
        print(f"{'='*80}")
        print(f"Learning Rate: {lr}")
        print(f"Dropout: {dropout} (CNN + Encoder)")
        print(f"Encoder Layers: {n_layers}")
        print("-"*80)
        
        # Crear configuración completa
        config = {
            'config_id': f'config_{config_idx:02d}',
            'learning_rate': lr,
            'dropout': dropout,
            'encoder_layers': n_layers,
            **FIXED_PARAMS
        }
        
        # Resultados de los 5 folds
        fold_scores = []
        n_params = None
        
        # ========================================
        # 5-FOLD CROSS-VALIDATION
        # ========================================
        for fold_idx in range(1, 6):
            print(f"\n--- Fold {fold_idx}/5 ---")
            seed_everything(RANDOM_STATE + fold_idx)
            
            try:
                # Cargar datos del fold
                train_subs, test_subs = load_fold_subjects(folds_json, fold_idx)
                print(f"Train: {len(train_subs)} sujetos | Test: {len(test_subs)} sujetos")
                
                # Cargar y preparar datos
                X_train_list, y_train_list = [], []
                for sid in tqdm(train_subs, desc="Cargando train", leave=False):
                    X, y, _ = load_subject_epochs(
                        sid, RESAMPLE_HZ, DO_NOTCH, DO_BANDPASS,
                        DO_CAR, BP_LO, BP_HI
                    )
                    if X.shape[0] > 0:
                        X_train_list.append(X)
                        y_train_list.append(y)
                
                X_test_list, y_test_list = [], []
                for sid in tqdm(test_subs, desc="Cargando test", leave=False):
                    X, y, _ = load_subject_epochs(
                        sid, RESAMPLE_HZ, DO_NOTCH, DO_BANDPASS,
                        DO_CAR, BP_LO, BP_HI
                    )
                    if X.shape[0] > 0:
                        X_test_list.append(X)
                        y_test_list.append(y)
                
                X_train = np.concatenate(X_train_list, axis=0)
                y_train = np.concatenate(y_train_list, axis=0)
                X_test = np.concatenate(X_test_list, axis=0)
                y_test = np.concatenate(y_test_list, axis=0)
                
                # Estandarizar
                X_train, X_test = standardize_per_channel(X_train, X_test)
                
                print(f"Train: {X_train.shape} | Test: {X_test.shape}")
                
                # Crear modelo con la configuración actual
                model = EEGCNNTransformer(
                    n_ch=8,
                    n_cls=2,
                    d_model=config['d_model'],
                    n_heads=config['n_heads'],
                    n_layers=n_layers,
                    p_drop=dropout,
                    p_drop_encoder=dropout,
                    n_dw_blocks=config['n_blocks'],
                    capture_attn=False
                ).to(device)
                
                if n_params is None:
                    n_params = count_params(model)
                    print(f"Parámetros del modelo: {n_params:,}")
                
                # Preparar DataLoader
                train_dataset = TensorDataset(
                    torch.tensor(X_train, dtype=torch.float32),
                    torch.tensor(y_train, dtype=torch.long)
                )
                
                if USE_WEIGHTED_SAMPLER:
                    class_counts = np.bincount(y_train)
                    class_weights = 1.0 / (class_counts + 1e-6)
                    sample_weights = class_weights[y_train]
                    sampler = WeightedRandomSampler(
                        weights=sample_weights,
                        num_samples=len(sample_weights),
                        replacement=True
                    )
                    train_loader = DataLoader(
                        train_dataset, batch_size=config['batch_size'],
                        sampler=sampler, worker_init_fn=seed_worker,
                        generator=torch.Generator().manual_seed(RANDOM_STATE)
                    )
                else:
                    train_loader = DataLoader(
                        train_dataset, batch_size=config['batch_size'], shuffle=True,
                        worker_init_fn=seed_worker,
                        generator=torch.Generator().manual_seed(RANDOM_STATE)
                    )
                
                # Pérdida y optimizador
                alpha = torch.tensor([1.0, 1.0], device=device)
                criterion = FocalLoss(alpha=alpha, gamma=1.5)
                optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-4)
                scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
                    optimizer, T_max=config['epochs'], eta_min=1e-6
                )
                
                # EMA
                ema = ModelEMA(model, decay=EMA_DECAY, device=device) if USE_EMA else None
                
                # ========================================
                # TRAINING LOOP
                # ========================================
                best_val_score = 0.0
                patience_counter = 0
                
                for epoch in range(config['epochs']):
                    model.train()
                    train_loss = 0.0
                    
                    for xb, yb in train_loader:
                        xb, yb = xb.to(device), yb.to(device)
                        
                        # Augmentación después de warmup
                        if epoch >= config['warmup_epochs']:
                            xb = augment_batch(xb)
                        
                        optimizer.zero_grad()
                        logits = model(xb)
                        loss = criterion(logits, yb)
                        loss.backward()
                        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
                        optimizer.step()
                        
                        if ema is not None:
                            ema.update(model)
                        
                        train_loss += loss.item()
                    
                    scheduler.step()
                    
                    # Validación
                    model.eval()
                    with torch.no_grad():
                        X_test_t = torch.tensor(X_test, dtype=torch.float32, device=device)
                        logits = model(X_test_t)
                        y_pred = logits.argmax(dim=1).cpu().numpy()
                        
                        if metric == 'f1':
                            val_score = f1_score(y_test, y_pred, average='macro')
                        else:
                            val_score = accuracy_score(y_test, y_pred)
                    
                    # Logging cada 10 epochs
                    if (epoch + 1) % 10 == 0:
                        print(f"Epoch {epoch+1}/{config['epochs']} | "
                              f"Loss: {train_loss/len(train_loader):.4f} | "
                              f"Val {metric}: {val_score:.4f}")
                    
                    # Early stopping
                    if val_score > best_val_score:
                        best_val_score = val_score
                        patience_counter = 0
                        # Guardar mejor modelo
                        model_path = output_dir / f"{config['config_id']}_fold{fold_idx}.pt"
                        torch.save(model.state_dict(), model_path)
                    else:
                        patience_counter += 1
                        if patience_counter >= config['patience']:
                            print(f"Early stopping en epoch {epoch+1}")
                            break
                
                # Cargar mejor modelo y evaluar
                model.load_state_dict(torch.load(output_dir / f"{config['config_id']}_fold{fold_idx}.pt"))
                model.eval()
                
                with torch.no_grad():
                    X_test_t = torch.tensor(X_test, dtype=torch.float32, device=device)
                    logits = model(X_test_t)
                    y_pred = logits.argmax(dim=1).cpu().numpy()
                
                # Calcular métricas finales
                if metric == 'f1':
                    final_score = f1_score(y_test, y_pred, average='macro')
                else:
                    final_score = accuracy_score(y_test, y_pred)
                
                fold_scores.append(final_score)
                
                print(f"✓ Fold {fold_idx} completado: {metric} = {final_score:.4f}")
                
                # Limpiar memoria
                del model, optimizer, scheduler, train_loader, X_train, y_train
                torch.cuda.empty_cache()
                
            except Exception as e:
                print(f"✗ ERROR en Fold {fold_idx}: {e}")
                import traceback
                traceback.print_exc()
                fold_scores.append(0.0)
        
        # ========================================
        # COMPILAR RESULTADOS DE ESTA CONFIG
        # ========================================
        result_row = {
            'config_id': config['config_id'],
            'learning_rate': lr,
            'dropout': dropout,
            'encoder_layers': n_layers,
            'fold_1': fold_scores[0] if len(fold_scores) > 0 else 0.0,
            'fold_2': fold_scores[1] if len(fold_scores) > 1 else 0.0,
            'fold_3': fold_scores[2] if len(fold_scores) > 2 else 0.0,
            'fold_4': fold_scores[3] if len(fold_scores) > 3 else 0.0,
            'fold_5': fold_scores[4] if len(fold_scores) > 4 else 0.0,
            'mean_score': np.mean(fold_scores) if fold_scores else 0.0,
            'std_score': np.std(fold_scores) if fold_scores else 0.0,
            'n_params': n_params if n_params else 0
        }
        
        results.append(result_row)
        
        print(f"\n{'─'*80}")
        print(f"RESUMEN CONFIG {config_idx}:")
        print(f"  Scores por fold: {[f'{s:.4f}' for s in fold_scores]}")
        print(f"  Media ± Std: {result_row['mean_score']:.4f} ± {result_row['std_score']:.4f}")
        print(f"{'─'*80}")
    
    # ========================================
    # COMPILAR RESULTADOS FINALES
    # ========================================
    df_results = pd.DataFrame(results)
    
    # Ordenar por mean_score descendente
    df_results = df_results.sort_values('mean_score', ascending=False).reset_index(drop=True)
    
    # Guardar resultados
    csv_path = output_dir / 'grid_search_results.csv'
    df_results.to_csv(csv_path, index=False)
    
    print("\n" + "="*80)
    print("GRID SEARCH COMPLETADO")
    print("="*80)
    print(f"\nResultados guardados en: {csv_path}")
    print(f"\nMejor configuración:")
    best = df_results.iloc[0]
    print(f"  • Config ID: {best['config_id']}")
    print(f"  • Learning Rate: {best['learning_rate']}")
    print(f"  • Dropout: {best['dropout']}")
    print(f"  • Encoder Layers: {int(best['encoder_layers'])}")
    print(f"  • {metric.capitalize()} Score: {best['mean_score']:.4f} ± {best['std_score']:.4f}")
    print("="*80 + "\n")
    
    return df_results


def plot_grid_search_results(df_results, output_dir=None, metric='f1'):
    """
    Genera visualizaciones de los resultados del grid search.
    """
    if output_dir is None:
        output_dir = PROJ / 'models' / '04_hybrid' / 'grid_search'
    else:
        output_dir = Path(output_dir)
    
    ensure_dir(output_dir)
    
    # Crear etiquetas descriptivas
    df_results['config_label'] = df_results.apply(
        lambda row: f"LR={row['learning_rate']:.0e}\nDO={row['dropout']}\nL={int(row['encoder_layers'])}",
        axis=1
    )
    
    # ========================================
    # GRÁFICO 1: Barras con errorbar
    # ========================================
    fig, ax = plt.subplots(figsize=(14, 6))
    
    x = np.arange(len(df_results))
    ax.bar(x, df_results['mean_score'], yerr=df_results['std_score'],
           capsize=5, color='steelblue', alpha=0.8, edgecolor='navy')
    
    ax.set_xlabel('Configuración', fontsize=12)
    ax.set_ylabel(f'{metric.upper()} Score', fontsize=12)
    ax.set_title(f'Comparación de Configuraciones - Grid Search (nb2_h6)', fontsize=14, fontweight='bold')
    ax.set_xticks(x)
    ax.set_xticklabels(df_results['config_label'], rotation=45, ha='right', fontsize=9)
    ax.axhline(y=df_results['mean_score'].iloc[0], color='red', linestyle='--', alpha=0.5, label='Mejor')
    ax.legend()
    ax.grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    fig_path = output_dir / 'grid_search_comparison.png'
    plt.savefig(fig_path, dpi=150, bbox_inches='tight')
    print(f"✓ Gráfico guardado: {fig_path}")
    plt.close()
    
    # ========================================
    # GRÁFICO 2: Heatmap de interacciones
    # ========================================
    fig, axes = plt.subplots(1, 3, figsize=(16, 5))
    
    # LR vs Dropout (promediando layers)
    pivot1 = df_results.pivot_table(
        values='mean_score',
        index='dropout',
        columns='learning_rate',
        aggfunc='mean'
    )
    im1 = axes[0].imshow(pivot1, cmap='YlOrRd', aspect='auto')
    axes[0].set_title('LR vs Dropout')
    axes[0].set_xlabel('Learning Rate')
    axes[0].set_ylabel('Dropout')
    axes[0].set_xticks(range(len(pivot1.columns)))
    axes[0].set_xticklabels([f'{v:.0e}' for v in pivot1.columns], rotation=45)
    axes[0].set_yticks(range(len(pivot1.index)))
    axes[0].set_yticklabels(pivot1.index)
    for i in range(len(pivot1.index)):
        for j in range(len(pivot1.columns)):
            axes[0].text(j, i, f'{pivot1.iloc[i, j]:.3f}',
                         ha='center', va='center', color='black', fontsize=10)
    plt.colorbar(im1, ax=axes[0])
    
    # LR vs Layers (promediando dropout)
    pivot2 = df_results.pivot_table(
        values='mean_score',
        index='encoder_layers',
        columns='learning_rate',
        aggfunc='mean'
    )
    im2 = axes[1].imshow(pivot2, cmap='YlOrRd', aspect='auto')
    axes[1].set_title('LR vs Encoder Layers')
    axes[1].set_xlabel('Learning Rate')
    axes[1].set_ylabel('Encoder Layers')
    axes[1].set_xticks(range(len(pivot2.columns)))
    axes[1].set_xticklabels([f'{v:.0e}' for v in pivot2.columns], rotation=45)
    axes[1].set_yticks(range(len(pivot2.index)))
    axes[1].set_yticklabels([int(v) for v in pivot2.index])
    for i in range(len(pivot2.index)):
        for j in range(len(pivot2.columns)):
            axes[1].text(j, i, f'{pivot2.iloc[i, j]:.3f}',
                         ha='center', va='center', color='black', fontsize=10)
    plt.colorbar(im2, ax=axes[1])
    
    # Dropout vs Layers (promediando LR)
    pivot3 = df_results.pivot_table(
        values='mean_score',
        index='encoder_layers',
        columns='dropout',
        aggfunc='mean'
    )
    im3 = axes[2].imshow(pivot3, cmap='YlOrRd', aspect='auto')
    axes[2].set_title('Dropout vs Encoder Layers')
    axes[2].set_xlabel('Dropout')
    axes[2].set_ylabel('Encoder Layers')
    axes[2].set_xticks(range(len(pivot3.columns)))
    axes[2].set_xticklabels(pivot3.columns)
    axes[2].set_yticks(range(len(pivot3.index)))
    axes[2].set_yticklabels([int(v) for v in pivot3.index])
    for i in range(len(pivot3.index)):
        for j in range(len(pivot3.columns)):
            axes[2].text(j, i, f'{pivot3.iloc[i, j]:.3f}',
                         ha='center', va='center', color='black', fontsize=10)
    plt.colorbar(im3, ax=axes[2])
    
    plt.suptitle('Heatmaps de Interacciones entre Hiperparámetros', fontsize=14, fontweight='bold')
    plt.tight_layout()
    heatmap_path = output_dir / 'grid_search_heatmaps.png'
    plt.savefig(heatmap_path, dpi=150, bbox_inches='tight')
    print(f"✓ Heatmaps guardados: {heatmap_path}")
    plt.close()
    
    # ========================================
    # TABLA RESUMEN
    # ========================================
    fig, ax = plt.subplots(figsize=(14, 4))
    ax.axis('tight')
    ax.axis('off')
    
    table_data = []
    for idx, row in df_results.iterrows():
        table_data.append([
            row['config_id'],
            f"{row['learning_rate']:.0e}",
            f"{row['dropout']:.1f}",
            f"{int(row['encoder_layers'])}",
            f"{row['mean_score']:.4f} ± {row['std_score']:.4f}",
            f"{row['n_params']:,}"
        ])
    
    table = ax.table(cellText=table_data,
                     colLabels=['Config ID', 'LR', 'Dropout', 'Layers', f'{metric.upper()} Score', 'Params'],
                     cellLoc='center',
                     loc='center')
    table.auto_set_font_size(False)
    table.set_fontsize(9)
    table.scale(1, 2)
    
    # Colorear la mejor fila
    for i in range(6):
        table[(1, i)].set_facecolor('#90EE90')
    
    plt.title('Tabla Resumen - Grid Search nb2_h6', fontsize=12, fontweight='bold', pad=20)
    
    table_path = output_dir / 'grid_search_table.png'
    plt.savefig(table_path, dpi=150, bbox_inches='tight')
    print(f"✓ Tabla guardada: {table_path}")
    plt.close()
    
    return df_results

print("✓ Funciones de Grid Search estadístico definidas")


# =========================
# ENTRENAR MODELO GANADOR CON HIPERPARÁMETROS PERSONALIZADOS
# =========================

def train_winner_model(
    device,
    learning_rate=1e-3,
    dropout=0.2,
    encoder_layers=1,
    warmup_epochs=4,
    patience=12,
    d_model=144,
    n_heads=6,
    n_blocks=2,
    batch_size=None,
    epochs=None,
    folds_json=None,
    output_dir=None,
    model_tag='winner'
):
    """
    Entrena el modelo ganador con hiperparámetros personalizados en los 5 folds.
    """
    import pandas as pd
    from sklearn.metrics import confusion_matrix, classification_report
    import shutil # Import necesario para copiar archivos
    
    # Configuración
    if batch_size is None:
        batch_size = BATCH_SIZE
    if epochs is None:
        epochs = EPOCHS
    if folds_json is None:
        folds_json = FOLDS_JSON
    if output_dir is None:
        output_dir = PROJ / 'models' / '04_hybrid' / 'winner_model'
    else:
        output_dir = Path(output_dir)
    
    ensure_dir(output_dir)
    tag = f"{model_tag}_lr{learning_rate:.0e}_do{dropout}_el{encoder_layers}"
    base_dir = ensure_dir(output_dir / tag)
    
    print("="*80)
    print("ENTRENAR MODELO GANADOR - Configuración Personalizada")
    print("="*80)
    print(f"\nHiperparámetros del Modelo:")
    print(f"  • Learning Rate: {learning_rate}")
    print(f"  • Dropout (CNN + Encoder): {dropout}")
    print(f"  • Encoder Layers: {encoder_layers}")
    print(f"  • Warmup Epochs: {warmup_epochs}")
    print(f"  • Patience: {patience}")
    print(f"  • d_model: {d_model}")
    print(f"  • Attention Heads: {n_heads}")
    print(f"  • Depthwise Blocks: {n_blocks}")
    print(f"  • Batch Size: {batch_size}")
    print(f"  • Max Epochs: {epochs}")
    print(f"\nDirectorio de salida: {base_dir}")
    print(f"Tag del modelo: {tag}")
    print("="*80 + "\n")
    
    # Resultados por fold
    acc_folds, f1_folds = [], []
    fold_results = []
    
    # Acumuladores para visualizaciones globales
    global_cm = np.zeros((2, 2), dtype=int)
    saliency_list = []
    pvals_log_list = []
    
    # Factory del modelo con los hiperparámetros personalizados
    def model_factory():
        return EEGCNNTransformer(
            n_ch=8,
            n_cls=2,
            d_model=d_model,
            n_heads=n_heads,
            n_layers=encoder_layers,
            p_drop=dropout,
            p_drop_encoder=dropout,
            n_dw_blocks=n_blocks,
            capture_attn=True
        ).to(device)
    
    # ========================================
    # ENTRENAR LOS 5 FOLDS
    # ========================================
    for fold_idx in range(1, 6):
        print(f"\n{'='*80}")
        print(f"FOLD {fold_idx}/5")
        print("="*80)
        
        seed_everything(RANDOM_STATE + fold_idx)
        fold_dir = ensure_dir(base_dir / f"fold{fold_idx}")
        
        try:
            # Cargar datos del fold
            train_subs, test_subs = load_fold_subjects(folds_json, fold_idx)
            print(f"Train: {len(train_subs)} sujetos | Test: {len(test_subs)} sujetos")
            
            # Cargar datos
            X_train_list, y_train_list = [], []
            for sid in tqdm(train_subs, desc="Cargando train", leave=False):
                X, y, _ = load_subject_epochs(
                    sid, RESAMPLE_HZ, DO_NOTCH, DO_BANDPASS,
                    DO_CAR, BP_LO, BP_HI
                )
                if X.shape[0] > 0:
                    X_train_list.append(X)
                    y_train_list.append(y)
            
            X_test_list, y_test_list = [], []
            for sid in tqdm(test_subs, desc="Cargando test", leave=False):
                X, y, _ = load_subject_epochs(
                    sid, RESAMPLE_HZ, DO_NOTCH, DO_BANDPASS,
                    DO_CAR, BP_LO, BP_HI
                )
                if X.shape[0] > 0:
                    X_test_list.append(X)
                    y_test_list.append(y)
            
            X_train = np.concatenate(X_train_list, axis=0)
            y_train = np.concatenate(y_train_list, axis=0)
            X_test = np.concatenate(X_test_list, axis=0)
            y_test = np.concatenate(y_test_list, axis=0)
            
            # Estandarizar
            X_train, X_test = standardize_per_channel(X_train, X_test)
            
            print(f"Train: {X_train.shape} | Test: {X_test.shape}")
            
            # Guardar datos de test para visualizaciones posteriores
            np.save(fold_dir / "y_test.npy", y_test)
            
            # Crear modelo
            model = model_factory()
            n_params = count_params(model)
            print(f"Parámetros del modelo: {n_params:,}")
            
            # DataLoader
            train_dataset = TensorDataset(
                torch.tensor(X_train, dtype=torch.float32),
                torch.tensor(y_train, dtype=torch.long)
            )
            
            if USE_WEIGHTED_SAMPLER:
                class_counts = np.bincount(y_train)
                class_weights = 1.0 / (class_counts + 1e-6)
                sample_weights = class_weights[y_train]
                sampler = WeightedRandomSampler(
                    weights=sample_weights,
                    num_samples=len(sample_weights),
                    replacement=True
                )
                train_loader = DataLoader(
                    train_dataset, batch_size=batch_size,
                    sampler=sampler, worker_init_fn=seed_worker,
                    generator=torch.Generator().manual_seed(RANDOM_STATE)
                )
            else:
                train_loader = DataLoader(
                    train_dataset, batch_size=batch_size, shuffle=True,
                    worker_init_fn=seed_worker,
                    generator=torch.Generator().manual_seed(RANDOM_STATE)
                )
            
            # Optimizador y scheduler
            alpha = torch.tensor([1.0, 1.0], device=device)
            criterion = FocalLoss(alpha=alpha, gamma=1.5)
            optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=1e-4)
            
            # Scheduler con warmup
            def lr_lambda(current_epoch):
                if current_epoch < warmup_epochs:
                    return (current_epoch + 1) / warmup_epochs
                progress = (current_epoch - warmup_epochs) / max(1, (epochs - warmup_epochs))
                progress = min(1.0, max(0.0, progress))
                return 0.1 + 0.5 * (1.0 - 0.1) * (1.0 + np.cos(np.pi * progress))
            
            scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)
            
            # EMA
            ema = ModelEMA(model, decay=EMA_DECAY, device=device) if USE_EMA else None
            
            # ========================================
            # TRAINING LOOP
            # ========================================
            best_val_f1 = 0.0
            patience_counter = 0
            history = {"epoch": [], "train_loss": [], "val_loss": [], "val_acc": [], "val_f1": [], "lr": []}
            
            # Medir tiempo y memoria de entrenamiento
            import time
            if torch.cuda.is_available():
                torch.cuda.reset_peak_memory_stats(device)
            t_train_start = time.time()
            
            for epoch in range(epochs):
                model.train()
                train_loss = 0.0
                n_train_batches = 0
                
                for xb, yb in train_loader:
                    xb, yb = xb.to(device), yb.to(device)
                    
                    # Augmentación después de warmup
                    if epoch >= warmup_epochs:
                        xb = augment_batch(xb)
                    
                    optimizer.zero_grad()
                    logits = model(xb)
                    loss = criterion(logits, yb)
                    loss.backward()
                    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
                    optimizer.step()
                    
                    if ema is not None:
                        ema.update(model)
                    
                    train_loss += loss.item()
                    n_train_batches += 1
                
                scheduler.step()
                current_lr = scheduler.get_last_lr()[0]
                
                # Validación
                eval_model = ema.ema if (ema is not None) else model
                eval_model.eval()
                
                val_loss = 0.0
                n_val_batches = 0
                y_pred_list = []
                y_true_list = []
                
                with torch.no_grad():
                    # Evaluar por batches para calcular val_loss
                    for i in range(0, len(X_test), batch_size):
                        X_batch = X_test[i:i+batch_size]
                        y_batch = y_test[i:i+batch_size]
                        
                        X_batch_t = torch.tensor(X_batch, dtype=torch.float32, device=device)
                        y_batch_t = torch.tensor(y_batch, dtype=torch.long, device=device)
                        
                        logits = eval_model(X_batch_t)
                        loss = criterion(logits, y_batch_t)
                        val_loss += loss.item()
                        n_val_batches += 1
                        
                        y_pred_batch = logits.argmax(dim=1).cpu().numpy()
                        y_pred_list.append(y_pred_batch)
                        y_true_list.append(y_batch)
                    
                    y_pred = np.concatenate(y_pred_list)
                    y_true = np.concatenate(y_true_list)
                    
                    val_acc = accuracy_score(y_true, y_pred)
                    val_f1 = f1_score(y_true, y_pred, average='macro')
                
                # Guardar historia
                history["epoch"].append(epoch + 1)
                history["train_loss"].append(train_loss / n_train_batches)
                history["val_loss"].append(val_loss / n_val_batches)
                history["val_acc"].append(val_acc)
                history["val_f1"].append(val_f1)
                history["lr"].append(current_lr)
                
                # Logging cada 10 epochs
                if (epoch + 1) % 10 == 0:
                    print(f"Epoch {epoch+1}/{epochs} | "
                          f"TrLoss: {train_loss/n_train_batches:.4f} | ValLoss: {val_loss/n_val_batches:.4f} | "
                          f"ValAcc: {val_acc:.4f} | ValF1: {val_f1:.4f} | LR: {current_lr:.2e}")
                
                # Early stopping
                if val_f1 > best_val_f1:
                    best_val_f1 = val_f1
                    patience_counter = 0
                    # Guardar mejor modelo
                    model_path = fold_dir / f"best_model_fold{fold_idx}.pt"
                    torch.save(eval_model.state_dict(), model_path)
                else:
                    patience_counter += 1
                    if patience_counter >= patience:
                        print(f"Early stopping en epoch {epoch+1}")
                        break
            
            # Medir tiempo de entrenamiento
            t_train_end = time.time()
            train_time_s = t_train_end - t_train_start
            train_mem_mb = 0.0
            if torch.cuda.is_available():
                train_mem_mb = torch.cuda.max_memory_allocated(device) / (1024.0 ** 2)
            
            # Guardar historia
            history_path = fold_dir / f"history_fold{fold_idx}_{tag}.npz"
            np.savez(
                history_path,
                ep=np.array(history["epoch"]),
                tr_loss=np.array(history["train_loss"]),
                val_loss=np.array(history["val_loss"]),
                val_acc=np.array(history["val_acc"]),
                val_f1=np.array(history["val_f1"]),
                lr=np.array(history["lr"])
            )
            
            # ========================================
            # GENERAR CURVAS DE LOSS POR FOLD
            # ========================================
            try:
                fig, ax = plt.subplots(figsize=(8, 4.5))
                ax.plot(history["epoch"], history["train_loss"], label="Train Loss", linewidth=2)
                ax.plot(history["epoch"], history["val_loss"], label="Val Loss", linewidth=2)
                ax.set_xlabel("Epoch")
                ax.set_ylabel("Loss")
                ax.set_title(f"Loss Curve — Fold {fold_idx} [{tag}]")
                ax.legend()
                ax.grid(alpha=0.3)
                plt.tight_layout()
                fig_path = fold_dir / f"loss_curve_fold{fold_idx}_{tag}.png"
                plt.savefig(fig_path, dpi=150, bbox_inches="tight")
                plt.close()
            except Exception as e:
                print(f"⚠ Error al generar curva de loss fold {fold_idx}: {e}")
            
            # Medir tiempo y memoria de TEST
            if torch.cuda.is_available():
                torch.cuda.reset_peak_memory_stats(device)
            t_test_start = time.time()
            with torch.no_grad():
                X_test_t = torch.tensor(X_test, dtype=torch.float32, device=device)
                logits = model(X_test_t)
                y_pred = logits.argmax(dim=1).cpu().numpy()
            
            # Cerrar cronómetro de TEST
            t_test_end = time.time()
            test_time_s = t_test_end - t_test_start
            test_mem_mb = 0.0
            if torch.cuda.is_available():
                test_mem_mb = torch.cuda.max_memory_allocated(device) / (1024.0 ** 2)
            
            # Métricas finales
            final_acc = accuracy_score(y_test, y_pred)
            final_f1 = f1_score(y_test, y_pred, average='macro')
            
            acc_folds.append(final_acc)
            f1_folds.append(final_f1)
            
            # Matriz de confusión
            cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
            global_cm += cm
            
            # Guardar matriz de confusión del fold
            cm_png = fold_dir / f"confusion_fold{fold_idx}.png"
            plot_confusion_with_text(
                cm=cm,
                class_names=CLASS_NAMES,
                title=f"Confusion Matrix — Fold {fold_idx} [{tag}]",
                out_path=cm_png,
                cmap='Blues'
            )
            
            # Classification report
            report = classification_report(
                y_test, y_pred,
                target_names=[c.replace('_', ' ') for c in CLASS_NAMES],
                digits=4
            )
            print(f"\nClassification Report - Fold {fold_idx}:")
            print(report)
            
            # Guardar report
            with open(fold_dir / f"classification_report_fold{fold_idx}.txt", 'w') as f:
                f.write(report)
            
            # ========================================
            # INTERPRETABILIDAD
            # ========================================
            try:
                # Channel saliency
                xb_ex = X_test[:1]
                xb_ex_t = torch.tensor(xb_ex, dtype=torch.float32, device=device)
                pred_cls = int(model(xb_ex_t).argmax(1).item())
                
                saliency_ch = channel_saliency(model, xb_ex_t, target_class=pred_cls)
                np.save(fold_dir / f"saliency_channels_fold{fold_idx}.npy", saliency_ch)
                saliency_list.append(saliency_ch)
                
                # Topomap de saliency
                info_topo = make_mne_info_8ch(sfreq=160.0)
                topo_sal_png = fold_dir / f"topomap_saliency_fold{fold_idx}.png"
                plot_topomap_from_values(
                    saliency_ch,
                    info_topo,
                    title=f"Channel Saliency — Fold {fold_idx} [{tag}]",
                    out_path=topo_sal_png,
                    cmap="Reds",
                    cbar_label="Normalized saliency"
                )
                
                # -log10(p) left vs right
                pvals = compute_channel_pvalues_lr(X_test, y_test)
                pvals_log = -np.log10(pvals + 1e-12)
                np.save(fold_dir / f"pvals_log_channels_fold{fold_idx}.npy", pvals_log)
                pvals_log_list.append(pvals_log)
                
                # Topomap de -log10(p)
                topo_p_png = fold_dir / f"topomap_log10p_fold{fold_idx}.png"
                plot_topomap_from_values(
                    pvals_log,
                    info_topo,
                    title=f"-log10(p) Left vs Right — Fold {fold_idx} [{tag}]",
                    out_path=topo_p_png,
                    cmap="viridis",
                    cbar_label="-log10(p)"
                )
                
                print(f"✓ Interpretabilidad generada para fold {fold_idx}")
                
            except Exception as e:
                print(f"⚠ Error en interpretabilidad fold {fold_idx}: {e}")
            
            # ========================================
            # MÉTRICAS DE CONSUMO COMPUTACIONAL
            # ========================================
            # Estimar FLOPs y latencia ( solo la primera vez)
            flops_est = 0
            latency_s = 0.0
            
            if fold_idx == 1:  # Solo calcular una vez  (mismo modelo para todos los folds)
                try:
                    # Calcular dimensiones para FLOPs
                    with torch.no_grad():
                        x_dummy = torch.randn(1, 8, X_test.shape[-1], device=device)
                        z = model.conv_t(x_dummy)
                        z = model.proj(z).transpose(1, 2)
                        L, d = z.shape[1], z.shape[2]
                    
                    flops_est = estimate_transformer_flops(L=L, d=d, n_heads=n_heads, n_layers=encoder_layers)
                    
                    # Benchmark latencia
                    x_sample = X_test[:min(8, len(X_test))]
                    x_sample_t = torch.tensor(x_sample, dtype=torch.float32, device=device)
                    latency_s = benchmark_latency(model, x_sample_t, device, n_warm=10, n_runs=50)
                    
                    print(f"  ✓ FLOPs estimados: {flops_est/1e9:.2f} G")
                    print(f"  ✓ Latencia: {latency_s*1000:.2f} ms")
                except Exception as e:
                    print(f"  ⚠ Error al calcular FLOPs/latencia: {e}")
            
            # Resultado del fold con métricas de consumo
            fold_result = {
                'fold': fold_idx,
                'accuracy': final_acc,
                'f1_macro': final_f1,
                'n_params': n_params,
                'best_epoch': len(history["epoch"]),
                'train_time_s': train_time_s,
                'train_mem_mb': train_mem_mb,
                'test_time_s': test_time_s,
                'test_mem_mb': test_mem_mb,
                'flops_est': flops_est if fold_idx == 1 else 0,
                'latency_s': latency_s if fold_idx == 1 else 0.0
            }
            fold_results.append(fold_result)
            
            print(f"\n✓ Fold {fold_idx} completado:")
            print(f"  Accuracy: {final_acc:.4f}")
            print(f"  F1-score: {final_f1:.4f}")
            
            # Limpiar memoria
            del model, optimizer, scheduler, train_loader, X_train, y_train
            torch.cuda.empty_cache()
            
        except Exception as e:
            print(f"✗ ERROR en Fold {fold_idx}: {e}")
            import traceback
            traceback.print_exc()
            acc_folds.append(0.0)
            f1_folds.append(0.0)
    
    # ========================================
    # RESULTADOS GLOBALES
    # ========================================
    mean_acc = np.mean(acc_folds)
    std_acc = np.std(acc_folds)
    mean_f1 = np.mean(f1_folds)
    std_f1 = np.std(f1_folds)
    best_fold = np.argmax(f1_folds) + 1
    
    print("\n" + "="*80)
    print("RESULTADOS FINALES - MODELO GANADOR")
    print("="*80)
    print(f"\nAccuracy por fold: {[f'{a:.4f}' for a in acc_folds]}")
    print(f"Accuracy promedio: {mean_acc:.4f} ± {std_acc:.4f}")
    print(f"\nF1-score por fold: {[f'{f:.4f}' for f in f1_folds]}")
    print(f"F1-score promedio: {mean_f1:.4f} ± {std_f1:.4f}")
    print(f"\nMejor fold: {best_fold} (F1 = {f1_folds[best_fold-1]:.4f})")
    print("="*80)
    
    # Guardar resumen
    summary_df = pd.DataFrame(fold_results)
    summary_df.to_csv(base_dir / f"summary_{tag}.csv", index=False)
    print(f"\n✓ Resumen guardado: {base_dir / f'summary_{tag}.csv'}")
    
    # ========================================
    # VISUALIZACIONES GLOBALES
    # ========================================
    print("\nGenerando visualizaciones globales...")
    
    # 1. Matriz de confusión global
    try:
        cm_global_png = base_dir / f"confusion_global_{tag}.png"
        plot_confusion_with_text(
            cm=global_cm,
            class_names=CLASS_NAMES,
            title=f"Confusion Matrix — Global (5 folds) [{tag}]",
            out_path=cm_global_png,
            cmap='Blues'
        )
        print(f"  ✓ Matriz de confusión global: {cm_global_png.name}")
    except Exception as e:
        print(f"  ⚠ Error en matriz de confusión: {e}")
    
    # 2. Curvas de entrenamiento
    try:
        fig, axes = plt.subplots(1, 2, figsize=(14, 5))
        
        for fold_idx in range(1, 6):
            hist_path = base_dir / f"fold{fold_idx}" / f"history_fold{fold_idx}_{tag}.npz"
            if hist_path.exists():
                data = np.load(hist_path)
                axes[0].plot(data["ep"], data["tr_loss"], alpha=0.5, label=f"Fold {fold_idx}")
                axes[1].plot(data["ep"], data["val_f1"], alpha=0.5, label=f"Fold {fold_idx}")
        
        axes[0].set_xlabel("Epoch")
        axes[0].set_ylabel("Training Loss")
        axes[0].set_title("Training Loss Evolution")
        axes[0].legend()
        axes[0].grid(alpha=0.3)
        
        axes[1].set_xlabel("Epoch")
        axes[1].set_ylabel("Validation F1-score")
        axes[1].set_title("Validation F1-score Evolution")
        axes[1].legend()
        axes[1].grid(alpha=0.3)
        
        plt.tight_layout()
        training_curves_png = base_dir / f"training_curves_{tag}.png"
        plt.savefig(training_curves_png, dpi=150, bbox_inches='tight')
        plt.close()
        print(f"  ✓ Curvas de entrenamiento: {training_curves_png.name}")
    except Exception as e:
        print(f"  ⚠ Error en curvas de entrenamiento: {e}")
    
    # 3. Topomaps globales (promedio de los 5 folds)
    try:
        info_topo_global = make_mne_info_8ch(sfreq=160.0)
        
        if len(saliency_list) > 0:
            sal_mean = np.mean(np.stack(saliency_list, axis=0), axis=0)
            topo_sal_global_png = base_dir / f"topomap_saliency_GLOBAL_{tag}.png"
            plot_topomap_from_values(
                sal_mean,
                info_topo_global,
                title=f"Channel Saliency — GLOBAL (5 folds) [{tag}]",
                out_path=topo_sal_global_png,
                cmap="Reds",
                cbar_label="Normalized saliency"
            )
            print(f"  ✓ Topomap global de saliency: {topo_sal_global_png.name}")
        
        if len(pvals_log_list) > 0:
            pvals_log_mean = np.mean(np.stack(pvals_log_list, axis=0), axis=0)
            topo_p_global_png = base_dir / f"topomap_log10p_GLOBAL_{tag}.png"
            plot_topomap_from_values(
                pvals_log_mean,
                info_topo_global,
                title=f"-log10(p) Left vs Right — GLOBAL (5 folds) [{tag}]",
                out_path=topo_p_global_png,
                cmap="viridis",
                cbar_label="-log10(p)"
            )
            print(f"  ✓ Topomap global de -log10(p): {topo_p_global_png.name}")
    except Exception as e:
        print(f"  ⚠ Error en topomaps globales: {e}")
    
    # ========================================
    # GUARDAR MÉTRICAS COMPUTACIONALES Y RETORNAR
    # ========================================
    try:
        # Calcular promedios de consumo
        train_times = [fr['train_time_s'] for fr in fold_results if fr['train_time_s'] > 0]
        train_mems = [fr['train_mem_mb'] for fr in fold_results if fr['train_mem_mb'] > 0]
        test_times = [fr['test_time_s'] for fr in fold_results if fr['test_time_s'] > 0]
        test_mems = [fr['test_mem_mb'] for fr in fold_results if fr['test_mem_mb'] > 0]
        
        train_time_mean = np.mean(train_times) if train_times else 0.0
        train_time_std = np.std(train_times) if train_times else 0.0
        train_mem_mean = np.mean(train_mems) if train_mems else 0.0
        train_mem_std = np.std(train_mems) if train_mems else 0.0
        test_time_mean = np.mean(test_times) if test_times else 0.0
        test_time_std = np.std(test_times) if test_times else 0.0
        test_mem_mean = np.mean(test_mems) if test_mems else 0.0
        test_mem_std = np.std(test_mems) if test_mems else 0.0
        
        # FLOPs y latencia (del primer fold)
        flops_est = fold_results[0]['flops_est'] if fold_results else 0
        latency_s = fold_results[0]['latency_s'] if fold_results else 0.0
        n_params_total = fold_results[0]['n_params'] if fold_results else 0
        
        print("\n" + "="*80)
        print("MÉTRICAS DE CONSUMO COMPUTACIONAL")
        print("="*80)
        print(f"\nModelo: {tag}")
        print(f"Parámetros: {n_params_total:,} ({n_params_total/1e6:.2f}M)")
        print(f"FLOPs estimados: {flops_est/1e9:.2f} G")
        print(f"Latencia: {latency_s*1000:.2f} ms")
        print(f"\nTiempo de entrenamiento: {train_time_mean:.2f} ± {train_time_std:.2f} s")
        print(f"Memoria de entrenamiento: {train_mem_mean:.1f} ± {train_mem_std:.1f} MB")
        print(f"Tiempo de test: {test_time_mean:.2f} ± {test_time_std:.2f} s")
        print(f"Memoria de test: {test_mem_mean:.1f} ± {test_mem_std:.1f} MB")
        print("="*80)
        
        consumption_df = pd.DataFrame([{
            'model_tag': tag,
            'n_params': n_params_total,
            'params_M': n_params_total / 1e6,
            'flops_G': flops_est / 1e9,
            'latency_ms': latency_s * 1000,
            'train_time_mean_s': train_time_mean,
            'train_time_std_s': train_time_std,
            'train_mem_mean_mb': train_mem_mean,
            'train_mem_std_mb': train_mem_std,
            'test_time_mean_s': test_time_mean,
            'test_time_std_s': test_time_std,
            'test_mem_mean_mb': test_mem_mean,
            'test_mem_std_mb': test_mem_std
        }])
        
        consumption_path = base_dir / f'computational_metrics_{tag}.csv'
        consumption_df.to_csv(consumption_path, index=False)
        print(f"\n  ✓ Métricas de consumo guardadas: {consumption_path.name}")
        
    except Exception as e:
        print(f"  ⚠ Error al generar métricas de consumo: {e}")
        n_params_total = 0

    # Construir diccionario de resultados final
    results = {
        'fold_results': fold_results,
        'mean_acc': mean_acc,
        'std_acc': std_acc,
        'mean_f1': mean_f1,
        'std_f1': std_f1,
        'best_fold': best_fold,
        'output_dir': base_dir,
        'model_tag': tag,
        'n_params': n_params_total
    }
    
    # Copiar mejor modelo
    try:
        best_model_src = base_dir / f"fold{best_fold}" / f"best_model_fold{best_fold}.pt"
        best_model_dst = base_dir / f"final_model_{tag}.pt"
        
        if best_model_src.exists():
            shutil.copy(best_model_src, best_model_dst)
            print(f"\n✓ Modelo final guardado: {best_model_dst.name}")
            print(f"  (Copiado del mejor fold: {best_fold})")
        else:
            print(f"\n⚠ No se encontró el modelo del fold {best_fold}")
            
    except Exception as e:
        print(f"\n⚠ Error al copiar modelo final: {e}")

    print(f"\n{'='*80}")
    print(f"✓ ENTRENAMIENTO COMPLETADO")
    print(f"  Directorio: {base_dir}")
    print(f"  Modelo final: final_model_{tag}.pt")
    print(f"  Resultados: summary_{tag}.csv")
    print(f"{'='*80}\n")
    
    return results

print("✓ Funciones de Grid Search y Entrenamiento de Modelo Ganador definidas")

✓ Funciones de Grid Search estadístico definidas
✓ Funciones de Grid Search y Entrenamiento de Modelo Ganador definidas


### Ejecución del Grid Search

Ejecuta la búsqueda estadística de hiperparámetros sobre el modelo ganador.

In [30]:
# =========================
# EJECUTAR GRID SEARCH
# =========================

if __name__ == "__main__":
    print("\n" + "="*80)
    print("GRID SEARCH ESTADÍSTICO - Modelo Ganador nb2_h6")
    print("="*80 + "\n")
    
    # Configuración
    metric_to_optimize = 'f1'  # Opciones: 'f1' o 'accuracy'
    
    print("ADVERTENCIA: Este proceso puede tardar varias horas.")
    print(f"  • 8 configuraciones × 5 folds = 40 entrenamientos completos")
    print(f"  • Tiempo estimado: ~4-6 horas (depende de GPU/hardware)\n")
    
    # Ejecutar grid search
    df_grid_results = run_statistical_grid_search(
        device=DEVICE,
        folds_json=FOLDS_JSON,
        output_dir=PROJ / 'models' / '04_hybrid' / 'grid_search',
        metric=metric_to_optimize
    )
    
    # Mostrar resultados
    print("\n" + "="*80)
    print("RESULTADOS DEL GRID SEARCH")
    print("="*80)
    print(df_grid_results.to_string(index=False))
    
    # Generar visualizaciones
    print("\nGenerando visualizaciones...")
    plot_grid_search_results(
        df_grid_results,
        output_dir=PROJ / 'models' / '04_hybrid' / 'grid_search',
        metric=metric_to_optimize
    )
    
    # Análisis estadístico comparativo
    print("\n" + "="*80)
    print("ANÁLISIS ESTADÍSTICO COMPARATIVO")
    print("="*80 + "\n")
    
    # Comparar mejor vs baseline (primera config)
    best_config = df_grid_results.iloc[0]
    baseline_config = df_grid_results[df_grid_results['config_id'] == 'config_01'].iloc[0]
    
    print("Mejor configuración:")
    print(f"  • LR: {best_config['learning_rate']:.0e}")
    print(f"  • Dropout: {best_config['dropout']}")
    print(f"  • Layers: {int(best_config['encoder_layers'])}")
    print(f"  • Score: {best_config['mean_score']:.4f} ± {best_config['std_score']:.4f}")
    
    print(f"\nBaseline (config_01):")
    print(f"  • LR: {baseline_config['learning_rate']:.0e}")
    print(f"  • Dropout: {baseline_config['dropout']}")
    print(f"  • Layers: {int(baseline_config['encoder_layers'])}")
    print(f"  • Score: {baseline_config['mean_score']:.4f} ± {baseline_config['std_score']:.4f}")
    
    improvement = ((best_config['mean_score'] - baseline_config['mean_score']) / baseline_config['mean_score']) * 100
    print(f"\nMejora relativa: {improvement:+.2f}%")
    
    # Preparar datos para test de Wilcoxon
    print("\n" + "="*80)
    print("DATOS PARA TEST DE WILCOXON")
    print("="*80 + "\n")
    
    print("Para comparar la mejor configuración vs baseline usando test de Wilcoxon:")
    print("\nScores por fold - Mejor configuración:")
    best_scores = [best_config[f'fold_{i}'] for i in range(1, 6)]
    print(f"  {best_scores}")
    
    print("\nScores por fold - Baseline:")
    baseline_scores = [baseline_config[f'fold_{i}'] for i in range(1, 6)]
    print(f"  {baseline_scores}")
    
    # Test de Wilcoxon
    from scipy.stats import wilcoxon
    
    try:
        stat, p_value = wilcoxon(best_scores, baseline_scores)
        print(f"\nResultado del Test de Wilcoxon:")
        print(f"  • Estadístico: {stat:.4f}")
        print(f"  • p-valor: {p_value:.4f}")
        
        if p_value < 0.05:
            print(f"  • Conclusión: Diferencia SIGNIFICATIVA (p < 0.05) ✓")
        else:
            print(f"  • Conclusión: Diferencia NO significativa (p ≥ 0.05)")
    except Exception as e:
        print(f"\nNo se pudo realizar el test de Wilcoxon: {e}")
        print("(Puede ocurrir si las diferencias son todas cero)")
    
    print("\n" + "="*80)
    print("GRID SEARCH COMPLETADO")
    print("="*80)
    print(f"\nResultados guardados en: {PROJ / 'models' / '04_hybrid' / 'grid_search'}")
    print("  • grid_search_results.csv - Datos completos")
    print("  • grid_search_comparison.png - Gráfico de barras")
    print("  • grid_search_heatmaps.png - Heatmaps de interacciones")
    print("  • grid_search_table.png - Tabla resumen")
    print("="*80 + "\n")


GRID SEARCH ESTADÍSTICO - Modelo Ganador nb2_h6

ADVERTENCIA: Este proceso puede tardar varias horas.
  • 8 configuraciones × 5 folds = 40 entrenamientos completos
  • Tiempo estimado: ~4-6 horas (depende de GPU/hardware)

GRID SEARCH ESTADÍSTICO - Modelo Ganador nb2_h6

Parámetros Fijos:
  • model_name: nb2_h6
  • d_model: 144
  • n_blocks: 2
  • n_heads: 6
  • warmup_epochs: 4
  • patience: 12
  • batch_size: 64
  • epochs: 60

Espacio de Búsqueda:
  • learning_rates: [0.0005, 0.001]
  • dropouts: [0.1, 0.2]
  • encoder_layers: [1, 2]

Total de configuraciones: 8
Total de entrenamientos: 40 (8 configs × 5 folds)
Métrica objetivo: F1
Directorio de salida: /root/Proyecto/EEG_Clasificador/models/04_hybrid/grid_search


CONFIGURACIÓN 1/8
Learning Rate: 0.0005
Dropout: 0.1 (CNN + Encoder)
Encoder Layers: 1
--------------------------------------------------------------------------------

--- Fold 1/5 ---
Train: 82 sujetos | Test: 21 sujetos


                                                               

Train: (3444, 8, 961) | Test: (882, 8, 961)
Parámetros del modelo: 232,290


KeyboardInterrupt: 

---
## ENTRENAR MODELO GANADOR

Esta sección permite entrenar directamente el modelo ganador con hiperparámetros personalizados, ejecutar los 5 folds y generar todos los artefactos y visualizaciones.

### 🏆 Modelo Ganador (nb2_h6):

**Arquitectura Base**:
- **Embedding dimension** (`d_model`): 144
- **Attention Heads**: 6
- **Depthwise Blocks**: 2
- **Encoder Layers**: 1

**Hiperparámetros Optimizados**:
- **Learning Rate**: 1e-3 (0.001)
- **Dropout**: 0.2 (aplicado a CNN y Encoder)
- **Warmup Epochs**: 4
- **Patience**: 12

### 📊 Artefactos Generados:

1. **Modelos Guardados**:
   - `best_model_fold{1-5}.pt`: Mejor modelo de cada fold
   - `final_model_<tag>.pt`: Modelo final (copiado del mejor fold)

2. **Métricas y Resultados**:
   - `summary_<tag>.csv`: Resultados por fold en formato CSV
   - `classification_report_fold{1-5}.txt`: Reporte completo de clasificación
   - `history_fold{1-5}_<tag>.npz`: Historial de entrenamiento (loss, acc, f1, lr)

3. **Visualizaciones Globales**:
   - `confusion_global_<tag>.png`: Matriz de confusión agregada de los 5 folds
   - `training_curves_<tag>.png`: Curvas de loss y F1 por fold
   - `metrics_per_fold_<tag>.png`: Barras comparativas de accuracy y F1 por fold
   - `topomap_saliency_GLOBAL_<tag>.png`: Topomap promedio de saliency
   - `topomap_log10p_GLOBAL_<tag>.png`: Topomap promedio de -log10(p)

4. **Visualizaciones por Fold**:
   - `confusion_fold{1-5}.png`: Matriz de confusión de cada fold
   - `topomap_saliency_fold{1-5}.png`: Canal saliency por fold
   - `topomap_log10p_fold{1-5}.png`: Significancia estadística por canal

### ⏱️ Tiempo Estimado:
- **Por fold**: ~5-10 minutos
- **Total (5 folds)**: ~25-50 minutos (depende del hardware)

### 📝 Uso Básico:

```python
# Entrenar con hiperparámetros por defecto (modelo ganador)
results = train_winner_model(
    device=DEVICE
)

# Entrenar con hiperparámetros personalizados
results = train_winner_model(
    device=DEVICE,
    learning_rate=1e-3,
    dropout=0.2,
    encoder_layers=1,
    warmup_epochs=4,
    patience=12,
    d_model=144,
    n_heads=6,
    n_blocks=2,
    model_tag='winner_custom'
)
```

### 🔍 Estructura de Resultados:

```python
results = {
    'fold_results': [     # Lista con resultados de cada fold
        {
            'fold': 1,
            'accuracy': 0.XXXX,
            'f1_macro': 0.XXXX,
            'n_params': XXXXX,
            'best_epoch': XX
        },
        ...
    ],
    'mean_acc': 0.XXXX,   # Accuracy promedio
    'std_acc': 0.XXXX,    # Desviación estándar de accuracy
    'mean_f1': 0.XXXX,    # F1-score promedio
    'std_f1': 0.XXXX,     # Desviación estándar de F1
    'best_fold': X,       # Número del mejor fold
    'output_dir': Path,   # Directorio con todos los archivos
    'model_tag': 'winner' # Tag del modelo
}
```

### ⚠️ Notas Importantes:

1. **Reproducibilidad**: Usa semillas fijas por fold (RANDOM_STATE + fold_idx)
2. **Memoria**: Limpia GPU automáticamente después de cada fold
3. **Early Stopping**: Se aplica con la paciencia configurada
4. **Scheduler**: Usa warmup + cosine annealing
5. **Augmentación**: Se activa después de los epochs de warmup
6. **EMA**: Se usa Model EMA si USE_EMA=True

In [31]:
# =========================
# ENTRENAR MODELO GANADOR
# =========================

# Entrenar el modelo ganador con los hiperparámetros optimizados
# Ejecuta los 5 folds y genera todos los gráficos y artefactos

results = train_winner_model(
    device=DEVICE,
    # Hiperparámetros del modelo ganador
    learning_rate=1e-3,        # 0.001
    dropout=0.2,               # 20% dropout en CNN y Encoder
    encoder_layers=1,          # 1 capa de Transformer
    warmup_epochs=4,           # 4 épocas de warmup
    patience=12,               # Early stopping con paciencia 12
    # Arquitectura base (nb2_h6)
    d_model=144,               # Dimensión de embedding
    n_heads=6,                 # 6 attention heads
    n_blocks=2,                # 2 bloques depthwise
    # Configuración de entrenamiento
    batch_size=BATCH_SIZE,     # Usa batch_size global
    epochs=EPOCHS,             # Usa epochs global
    # Output
    output_dir=PROJ / 'models' / '04_hybrid' / 'winner_model',
    model_tag='winner_nb2_h6'  # Tag identificador
)

# Mostrar resultados
print("\n" + "="*80)
print("RESUMEN DE RESULTADOS")
print("="*80)
print(f"\nModelo: {results['model_tag']}")
print(f"Directorio: {results['output_dir']}")
print(f"\nAccuracy: {results['mean_acc']:.4f} ± {results['std_acc']:.4f}")
print(f"F1-Score: {results['mean_f1']:.4f} ± {results['std_f1']:.4f}")
print(f"\nMejor Fold: {results['best_fold']}")
print(f"F1 del mejor fold: {results['fold_results'][results['best_fold']-1]['f1_macro']:.4f}")
print("\n" + "="*80)

# Mostrar resultados por fold
print("\nResultados detallados por fold:")
print("-"*80)
for fold_res in results['fold_results']:
    print(f"Fold {fold_res['fold']}: ACC={fold_res['accuracy']:.4f}, F1={fold_res['f1_macro']:.4f}, "
          f"Params={fold_res['n_params']:,}, Best Epoch={fold_res['best_epoch']}")
print("-"*80)

final_model_path = results["output_dir"] / f"final_model_{results['model_tag']}.pt"
print(f"\n✓ Modelo final guardado en: {final_model_path}")
print(f"✓ Todos los gráficos y métricas disponibles en: {results['output_dir']}")
print("\n" + "="*80)

ENTRENAR MODELO GANADOR - Configuración Personalizada

Hiperparámetros del Modelo:
  • Learning Rate: 0.001
  • Dropout (CNN + Encoder): 0.2
  • Encoder Layers: 1
  • Warmup Epochs: 4
  • Patience: 12
  • d_model: 144
  • Attention Heads: 6
  • Depthwise Blocks: 2
  • Batch Size: 64
  • Max Epochs: 60

Directorio de salida: /root/Proyecto/EEG_Clasificador/models/04_hybrid/winner_model/winner_nb2_h6_lr1e-03_do0.2_el1
Tag del modelo: winner_nb2_h6_lr1e-03_do0.2_el1


FOLD 1/5
Train: 82 sujetos | Test: 21 sujetos


                                                               

Train: (3444, 8, 961) | Test: (882, 8, 961)
Parámetros del modelo: 232,290
Epoch 10/60 | TrLoss: 0.0695 | ValLoss: 0.0820 | ValAcc: 0.7993 | ValF1: 0.7993 | LR: 9.75e-04
Epoch 20/60 | TrLoss: 0.0601 | ValLoss: 0.0801 | ValAcc: 0.7993 | ValF1: 0.7993 | LR: 8.31e-04
Epoch 30/60 | TrLoss: 0.0553 | ValLoss: 0.0804 | ValAcc: 0.7959 | ValF1: 0.7959 | LR: 6.00e-04
Early stopping en epoch 35

Classification Report - Fold 1:
              precision    recall  f1-score   support

        left     0.7946    0.8073    0.8009       441
       right     0.8041    0.7914    0.7977       441

    accuracy                         0.7993       882
   macro avg     0.7994    0.7993    0.7993       882
weighted avg     0.7994    0.7993    0.7993       882

↳ Topomap guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/winner_model/winner_nb2_h6_lr1e-03_do0.2_el1/fold1/topomap_saliency_fold1.png
↳ Topomap guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/winner_model/winner_nb2_h6_lr1e-03_do

                                                               

Train: (3444, 8, 961) | Test: (882, 8, 961)
Parámetros del modelo: 232,290
Epoch 10/60 | TrLoss: 0.0680 | ValLoss: 0.0718 | ValAcc: 0.8333 | ValF1: 0.8333 | LR: 9.75e-04
Epoch 20/60 | TrLoss: 0.0578 | ValLoss: 0.0717 | ValAcc: 0.8367 | ValF1: 0.8367 | LR: 8.31e-04
Early stopping en epoch 28

Classification Report - Fold 2:
              precision    recall  f1-score   support

        left     0.7923    0.8912    0.8388       441
       right     0.8756    0.7664    0.8174       441

    accuracy                         0.8288       882
   macro avg     0.8340    0.8288    0.8281       882
weighted avg     0.8340    0.8288    0.8281       882

↳ Topomap guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/winner_model/winner_nb2_h6_lr1e-03_do0.2_el1/fold2/topomap_saliency_fold2.png
↳ Topomap guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/winner_model/winner_nb2_h6_lr1e-03_do0.2_el1/fold2/topomap_log10p_fold2.png
✓ Interpretabilidad generada para fold 2

✓ Fold 2 compl

                                                               

Train: (3444, 8, 961) | Test: (882, 8, 961)
Parámetros del modelo: 232,290
Epoch 10/60 | TrLoss: 0.0718 | ValLoss: 0.0811 | ValAcc: 0.7971 | ValF1: 0.7959 | LR: 9.75e-04
Epoch 20/60 | TrLoss: 0.0599 | ValLoss: 0.0840 | ValAcc: 0.8084 | ValF1: 0.8084 | LR: 8.31e-04
Epoch 30/60 | TrLoss: 0.0516 | ValLoss: 0.0843 | ValAcc: 0.8095 | ValF1: 0.8095 | LR: 6.00e-04
Early stopping en epoch 39

Classification Report - Fold 3:
              precision    recall  f1-score   support

        left     0.8176    0.8027    0.8101       441
       right     0.8062    0.8209    0.8135       441

    accuracy                         0.8118       882
   macro avg     0.8119    0.8118    0.8118       882
weighted avg     0.8119    0.8118    0.8118       882

↳ Topomap guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/winner_model/winner_nb2_h6_lr1e-03_do0.2_el1/fold3/topomap_saliency_fold3.png
↳ Topomap guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/winner_model/winner_nb2_h6_lr1e-03_do

                                                               

Train: (3486, 8, 961) | Test: (840, 8, 961)
Parámetros del modelo: 232,290
Epoch 10/60 | TrLoss: 0.0731 | ValLoss: 0.0726 | ValAcc: 0.8298 | ValF1: 0.8296 | LR: 9.75e-04
Early stopping en epoch 16

Classification Report - Fold 4:
              precision    recall  f1-score   support

        left     0.8139    0.8643    0.8383       420
       right     0.8553    0.8024    0.8280       420

    accuracy                         0.8333       840
   macro avg     0.8346    0.8333    0.8332       840
weighted avg     0.8346    0.8333    0.8332       840

↳ Topomap guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/winner_model/winner_nb2_h6_lr1e-03_do0.2_el1/fold4/topomap_saliency_fold4.png
↳ Topomap guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/winner_model/winner_nb2_h6_lr1e-03_do0.2_el1/fold4/topomap_log10p_fold4.png
✓ Interpretabilidad generada para fold 4

✓ Fold 4 completado:
  Accuracy: 0.8333
  F1-score: 0.8332

FOLD 5/5
Train: 83 sujetos | Test: 20 sujetos


                                                               

Train: (3486, 8, 961) | Test: (840, 8, 961)
Parámetros del modelo: 232,290
Epoch 10/60 | TrLoss: 0.0718 | ValLoss: 0.0795 | ValAcc: 0.8036 | ValF1: 0.8006 | LR: 9.75e-04
Epoch 20/60 | TrLoss: 0.0604 | ValLoss: 0.0665 | ValAcc: 0.8512 | ValF1: 0.8512 | LR: 8.31e-04
Early stopping en epoch 20

Classification Report - Fold 5:
              precision    recall  f1-score   support

        left     0.8801    0.7690    0.8208       420
       right     0.7949    0.8952    0.8421       420

    accuracy                         0.8321       840
   macro avg     0.8375    0.8321    0.8315       840
weighted avg     0.8375    0.8321    0.8315       840

↳ Topomap guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/winner_model/winner_nb2_h6_lr1e-03_do0.2_el1/fold5/topomap_saliency_fold5.png
↳ Topomap guardado: /root/Proyecto/EEG_Clasificador/models/04_hybrid/winner_model/winner_nb2_h6_lr1e-03_do0.2_el1/fold5/topomap_log10p_fold5.png
✓ Interpretabilidad generada para fold 5

✓ Fold 5 compl