# K-Fold Cross-Validation Grouped by Patient

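## Objective
This notebook implements **K-Fold Cross-Validation grouped by patient** for longitudinal medical data: folds are drawn over patients rather than over visits, so every visit of a given patient lands on the same side of each split.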

## Advantages over LOPO-CV (leave-one-patient-out)
- ✅ **More robust test set:** 8 patients per fold (vs. 1 in LOPO)
- ✅ **Faster:** 10 folds (vs. 82 in LOPO)
- ✅ **Better train/test balance:** 74/8 patients (vs. 81/1)
- ✅ **Confidence intervals:** 95% CI over 10 folds
- ✅ **No patient-level data leakage:** all visits of a patient fall in the same split
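
The no-leakage property can be checked on a toy table. The sketch below is only an illustration: the `RID` values are made up, and it uses scikit-learn's `GroupKFold` rather than the manual shuffle plus `KFold` over patient IDs that the cells of this notebook use. Both approaches aim at the same guarantee.

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

# Toy longitudinal table: 6 hypothetical patients with 1-3 visits each.
visits = pd.DataFrame({
    "RID":   [1, 1, 1, 2, 2, 3, 4, 4, 5, 6, 6, 6],
    "visit": [0, 1, 2, 0, 1, 0, 0, 1, 0, 0, 1, 2],
})

gkf = GroupKFold(n_splits=3)
leaks = []
for train_idx, test_idx in gkf.split(visits, groups=visits["RID"]):
    train_rids = set(visits.loc[train_idx, "RID"])
    test_rids = set(visits.loc[test_idx, "RID"])
    # A leak would be a patient contributing visits to both sides of the split.
    leaks.append(train_rids & test_rids)

print(leaks)
```

Every intersection should be empty, which is what "all visits of a patient in the same split" means in practice.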

## Configuration
- **Dataset:** 6,488 visits from ~5,000 patients
- **Labeled patients:** 82 patients with YEARS_TO_ONSET
- **K-Folds:** 10 folds, split by patient
- **Test per fold:** ~8 patients (~64 visits)
- **Train per fold:** ~74 patients (~6,424 visits)

## Estimated runtime
- **10 folds × 100 epochs at ~20 s per fold ≈ 3-5 minutes** (CPU)

In [1]:
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors
from sklearn.model_selection import KFold
from tqdm.auto import tqdm
import json
import warnings
warnings.filterwarnings('ignore')

torch.manual_seed(42)
np.random.seed(42)

print("Libraries imported successfully!")

Libraries imported successfully!




In [2]:
def norm_codes_to_labels(s: pd.Series, mapping: dict) -> pd.Series:
    """Map raw codes (e.g. 1, "1", "1.0") to labels; unmapped values become NaN."""
    out = s.astype(str).str.strip().str.replace(r"\.0$", "", regex=True)
    out = out.map(mapping)
    return out

gender_map = {"1":"male","2":"female","male":"male","female":"female","m":"male","f":"female"}
marry_map  = {"1":"married","2":"widowed","3":"divorced","4":"never_married","6":"domestic_partnership"}

def to_year(s):
    """Coerce to numeric and keep only plausible years (1900-2100); anything else becomes NaN."""
    s = pd.to_numeric(s, errors="coerce")
    s = s.where((s >= 1900) & (s <= 2100))
    return s

print("Utility functions defined.")

Utility functions defined.
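
To make the behaviour of the two helpers concrete, here is a small sketch on made-up inputs. The sample values are hypothetical; the function bodies are copied from the cell above.

```python
import pandas as pd

def norm_codes_to_labels(s: pd.Series, mapping: dict) -> pd.Series:
    out = s.astype(str).str.strip().str.replace(r"\.0$", "", regex=True)
    return out.map(mapping)

def to_year(s):
    s = pd.to_numeric(s, errors="coerce")
    return s.where((s >= 1900) & (s <= 2100))

gender_map = {"1": "male", "2": "female", "male": "male",
              "female": "female", "m": "male", "f": "female"}

# Codes may arrive as floats, padded strings, labels, or unknown values.
raw = pd.Series([1.0, 2.0, " 2 ", "female", "9", None], dtype=object)
labels = norm_codes_to_labels(raw, gender_map)

# Years outside 1900-2100 and non-numeric entries are turned into NaN.
years = to_year(pd.Series([1995, "2003", 0, 9999, "x"], dtype=object))

print(labels.tolist())
print(years.tolist())
```

Note that the mapping is case-sensitive as written, so a value such as `"Male"` would come out as NaN.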


In [3]:
print("Loading ADNI data...\n")

csv_path = "./data/adni/demographics/PTDEMOG.csv"
df = pd.read_csv(csv_path)
print(f"✅ Demographics loaded: {df.shape}")

onset_cols = [c for c in ["PTCOGBEG","PTADBEG","PTADDX"] if c in df.columns]
for c in onset_cols:
    df[c] = to_year(df[c])

def row_min_nonnull(row):
    vals = [row[c] for c in onset_cols if pd.notna(row[c])]
    return min(vals) if vals else np.nan

df["YEAR_ONSET"] = df.apply(row_min_nonnull, axis=1) if onset_cols else np.nan
df["YEAR_ONSET"] = to_year(df["YEAR_ONSET"])

for c in ["PTDOBYY","PTEDUCAT"]:
    if c in df.columns:
        df[c] = pd.to_numeric(df[c], errors="coerce")

if "PTGENDER" in df.columns:
    df["PTGENDER"] = norm_codes_to_labels(df["PTGENDER"], gender_map)
if "PTMARRY" in df.columns:
    df["PTMARRY"]  = norm_codes_to_labels(df["PTMARRY"], marry_map)

# Counts table rows (visits), not unique patients.
print(f"Rows with YEAR_ONSET: {df['YEAR_ONSET'].notna().sum()}")

Loading ADNI data...

✅ Demographics loaded: (6210, 84)
Rows with YEAR_ONSET: 2908
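
The row-wise `apply` that builds `YEAR_ONSET` takes the earliest non-null year across the onset columns. As a sketch on a made-up three-row table, the same result can be obtained with a vectorized `min(axis=1)`, which skips NaN by default. This is offered as an equivalent alternative, not as what the notebook runs.

```python
import numpy as np
import pandas as pd

onset_cols = ["PTCOGBEG", "PTADBEG", "PTADDX"]
toy = pd.DataFrame({
    "PTCOGBEG": [2001.0, np.nan, np.nan],
    "PTADBEG":  [2003.0, 2010.0, np.nan],
    "PTADDX":   [np.nan, 2008.0, np.nan],
})

def row_min_nonnull(row):
    vals = [row[c] for c in onset_cols if pd.notna(row[c])]
    return min(vals) if vals else np.nan

row_wise = toy.apply(row_min_nonnull, axis=1)   # approach used in the cell above
vectorized = toy[onset_cols].min(axis=1)        # skipna=True by default

print(pd.DataFrame({"row_wise": row_wise, "vectorized": vectorized}))
```

On a table with thousands of rows the vectorized form avoids a Python-level loop over rows.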


In [4]:
print("\nMerging biomarkers (LEFT JOIN strategy)...\n")

df['VISCODE_NORMALIZED'] = df['VISCODE2'].astype(str).str.strip().replace({'sc': 'bl', 'f': 'bl', 'nan': ''})

biomarker_path = "./data/adni/demographics/UPENNBIOMK_ROCHE_ELECSYS_11Oct2025.csv"
df_csf = pd.read_csv(biomarker_path)
df_csf['VISCODE_NORMALIZED'] = df_csf['VISCODE2'].astype(str).str.strip()
df_csf = df_csf.dropna(subset=['ABETA42', 'TAU', 'PTAU'])
df_csf['TAU_ABETA42_RATIO'] = df_csf['TAU'] / (df_csf['ABETA42'] + 1e-6)
df_csf['PTAU_ABETA42_RATIO'] = df_csf['PTAU'] / (df_csf['ABETA42'] + 1e-6)
df_csf['PTAU_TAU_RATIO'] = df_csf['PTAU'] / (df_csf['TAU'] + 1e-6)

df = df.merge(
    df_csf[['RID', 'VISCODE_NORMALIZED', 'ABETA42', 'TAU', 'PTAU', 
            'TAU_ABETA42_RATIO', 'PTAU_ABETA42_RATIO', 'PTAU_TAU_RATIO']],
    on=['RID', 'VISCODE_NORMALIZED'], how='left'
)
df['HAS_CSF'] = df['ABETA42'].notna().astype(float)
print(f"✅ CSF: {df['HAS_CSF'].sum():.0f}/{len(df)} ({100*df['HAS_CSF'].mean():.1f}%)")

pet_path = "./data/adni/demographics/All_Subjects_UCBERKELEY_AMY_6MM_11Oct2025.csv"
df_pet = pd.read_csv(pet_path, low_memory=False)
df_pet['VISCODE_NORMALIZED'] = df_pet['VISCODE'].astype(str).str.strip().replace({'sc': 'bl', 'f': 'bl', 'nan': ''})
df_pet = df_pet[['RID', 'VISCODE_NORMALIZED', 'CENTILOIDS', 'SUMMARY_SUVR', 'COMPOSITE_REF_SUVR']].copy()
df_pet.columns = ['RID', 'VISCODE_NORMALIZED', 'PET_CENTILOIDS', 'PET_SUVR', 'PET_COMPOSITE']
df_pet = df_pet.dropna(subset=['PET_CENTILOIDS', 'PET_SUVR'])

# Left join: duplicate (RID, visit) keys in df_pet duplicate the matching rows of df.
df = df.merge(df_pet, on=['RID', 'VISCODE_NORMALIZED'], how='left')
df['HAS_PET'] = df['PET_CENTILOIDS'].notna().astype(float)
print(f"✅ PET: {df['HAS_PET'].sum():.0f}/{len(df)} ({100*df['HAS_PET'].mean():.1f}%)")

mri_path = "./data/adni/demographics/All_Subjects_UCSFFSX7_11Oct2025.csv"
df_mri = pd.read_csv(mri_path, low_memory=False)
df_mri['VISCODE_NORMALIZED'] = df_mri['VISCODE2'].astype(str).str.strip().replace({'sc': 'bl', 'f': 'bl', 'nan': ''})
df_mri = df_mri[['RID', 'VISCODE_NORMALIZED', 'ST101SV', 'ST11SV', 'ST12SV', 
                 'ST4SV', 'ST5SV', 'ST17SV', 'ST18SV']].copy()
df_mri.columns = ['RID', 'VISCODE_NORMALIZED', 'MRI_eTIV', 'MRI_Vol1', 'MRI_Vol2', 
                  'MRI_Vol3', 'MRI_Vol4', 'MRI_Vol5', 'MRI_Vol6']
df_mri = df_mri.dropna(subset=['MRI_eTIV', 'MRI_Vol1'])

# Left join: duplicate (RID, visit) keys in df_mri duplicate the matching rows of df.
df = df.merge(df_mri, on=['RID', 'VISCODE_NORMALIZED'], how='left')
df['HAS_MRI'] = df['MRI_eTIV'].notna().astype(float)
df = df.drop(columns=['VISCODE_NORMALIZED'])

print(f"✅ MRI: {df['HAS_MRI'].sum():.0f}/{len(df)} ({100*df['HAS_MRI'].mean():.1f}%)")
print(f"\n📊 Final dataset: {len(df)} visits")


Merging biomarkers (LEFT JOIN strategy)...

✅ CSF: 1780/6210 (28.7%)
✅ PET: 810/6212 (13.0%)
✅ MRI: 3303/6488 (50.9%)

📊 Final dataset: 6488 visits
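
The row counts in the log above grow from 6210 to 6212 to 6488, which is what a left join does when the right-hand table repeats a `(RID, visit)` key. The sketch below reproduces the effect on made-up data and shows one possible guard using the `validate` argument of `DataFrame.merge`. Keeping the last duplicate is an arbitrary choice made for illustration, not a recommendation for ADNI data.

```python
import pandas as pd

keys = ["RID", "VISCODE_NORMALIZED"]
visits = pd.DataFrame({"RID": [1, 1, 2], "VISCODE_NORMALIZED": ["bl", "m12", "bl"]})

# Hypothetical biomarker table in which (1, "bl") appears twice.
scans = pd.DataFrame({
    "RID": [1, 1, 2],
    "VISCODE_NORMALIZED": ["bl", "bl", "bl"],
    "PET_SUVR": [1.10, 1.12, 0.95],
})

# A plain left join silently duplicates the matching visit row.
naive = visits.merge(scans, on=keys, how="left")
print(len(visits), "->", len(naive))

# validate="many_to_one" makes pandas raise if right-hand keys are not unique.
try:
    visits.merge(scans, on=keys, how="left", validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True

# One way to restore a 1:1 visit table: de-duplicate the right side first.
scans_unique = scans.drop_duplicates(subset=keys, keep="last")
safe = visits.merge(scans_unique, on=keys, how="left", validate="many_to_one")
print(len(safe), raised)
```

Whether duplicated visit rows are acceptable depends on the analysis, but they do inflate the node count of the graph built later.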


In [5]:
date_col = "EXAMDATE" if "EXAMDATE" in df.columns else "VISDATE"
df[date_col] = pd.to_datetime(df[date_col], errors="coerce")
df["EXAM_YEAR"] = to_year(df[date_col].dt.year)
df["AGE_AT_VISIT"] = np.where(
    df["EXAM_YEAR"].notna() & df["PTDOBYY"].notna(),
    df["EXAM_YEAR"] - df["PTDOBYY"], np.nan
)

df["YEARS_TO_ONSET"] = np.where(
    df["YEAR_ONSET"].notna() & df["EXAM_YEAR"].notna(),
    df["YEAR_ONSET"] - df["EXAM_YEAR"], np.nan
)

# Discard visits dated after onset and implausibly long horizons (> 50 years).
df.loc[(df["YEARS_TO_ONSET"] < 0) & df["YEAR_ONSET"].notna(), "YEARS_TO_ONSET"] = np.nan
df.loc[df["YEARS_TO_ONSET"] > 50, "YEARS_TO_ONSET"] = np.nan
df["HAS_LABEL"] = df["YEARS_TO_ONSET"].notna()

# Supervise one node per patient: the latest visit that still carries a label.
df["USE_FOR_LABEL"] = False
if df["HAS_LABEL"].any():
    idx_last_pre = df.loc[df["HAS_LABEL"]].groupby("RID")[date_col].idxmax()
    df.loc[idx_last_pre, "USE_FOR_LABEL"] = True

rids_with_label = df.loc[df["USE_FOR_LABEL"], "RID"].dropna().unique()

print(f"\n📊 Dataset prepared:")
print(f"  Total visits: {len(df)}")
print(f"  Unique patients: {df['RID'].nunique()}")
print(f"  Labeled patients: {len(rids_with_label)}")
print(f"  Unlabeled patients: {df['RID'].nunique() - len(rids_with_label)}")


📊 Dataset prepared:
  Total visits: 6488
  Unique patients: 4945
  Labeled patients: 82
  Unlabeled patients: 4863
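
The `USE_FOR_LABEL` flag marks, for each patient, the most recent visit that has a label. A minimal sketch of that selection on a made-up table (the RIDs and dates are hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "RID": [10, 10, 10, 20, 20],
    "EXAMDATE": pd.to_datetime(
        ["2010-01-01", "2011-01-01", "2012-01-01", "2009-06-01", "2010-06-01"]
    ),
    "HAS_LABEL": [True, True, False, False, True],
})

toy["USE_FOR_LABEL"] = False
# Among labeled visits only, take the row index of the latest date per patient.
idx_last = toy.loc[toy["HAS_LABEL"]].groupby("RID")["EXAMDATE"].idxmax()
toy.loc[idx_last, "USE_FOR_LABEL"] = True

print(toy)
```

Patient 10's 2012 visit is the latest overall but has no label, so the 2011 visit is the one selected.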


In [6]:
class GNNRegressor(nn.Module):
    """Two-layer GCN that outputs one scalar (standardized YEARS_TO_ONSET) per node."""
    def __init__(self, in_ch, hid=64, dropout=0.3):
        super().__init__()
        self.c1 = GCNConv(in_ch, hid)
        self.c2 = GCNConv(hid, 1)
        self.dropout = dropout
        
    def forward(self, x, edge_index):
        x = F.relu(self.c1(x, edge_index))
        x = F.dropout(x, p=self.dropout, training=self.training)
        x = self.c2(x, edge_index)
        return x.squeeze(-1)

print("✅ Model defined")

✅ Model defined
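
For reference, the forward pass of this two-layer model can be written out as follows. This assumes `GCNConv` is used with its default settings (self-loops added, symmetric normalization, bias), and the symbols $\tilde{A}$, $W_1$, $W_2$, $b_1$, $b_2$ are notation introduced here, not names from the code.

$$
\tilde{A} = \hat{D}^{-1/2}\,(A + I)\,\hat{D}^{-1/2}, \qquad
H = \mathrm{ReLU}\!\left(\tilde{A}\, X\, W_1 + b_1\right), \qquad
\hat{y} = \tilde{A}\,\mathrm{Dropout}(H)\, W_2 + b_2
$$

Here $A$ is the adjacency matrix given by `edge_index`, $\hat{D}$ is the degree matrix of $A + I$, $X$ holds the node features, $W_1 \in \mathbb{R}^{\text{in\_ch} \times 64}$ and $W_2 \in \mathbb{R}^{64 \times 1}$. Each prediction therefore mixes information from nodes up to two hops away.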


In [7]:
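def train_kfold(df_fold, test_rids, n_epochs=100):
    """
    Train and evaluate one fold with multiple test patients.
    
    Args:
        df_fold: Full DataFrame (all visits); modified in place, so pass a copy
        test_rids: RIDs held out for testing
        n_epochs: Number of training epochs
    
    Note: reads the module-level `date_col` defined in an earlier cell.
    """
    # Node ids below are taken from the index, so make sure it is 0..n-1.
    df_fold = df_fold.reset_index(drop=True)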
    csf_cols = ['ABETA42', 'TAU', 'PTAU', 'TAU_ABETA42_RATIO', 'PTAU_ABETA42_RATIO', 'PTAU_TAU_RATIO']
    pet_cols = ['PET_CENTILOIDS', 'PET_SUVR', 'PET_COMPOSITE']
    mri_cols = ['MRI_eTIV', 'MRI_Vol1', 'MRI_Vol2', 'MRI_Vol3', 'MRI_Vol4', 'MRI_Vol5', 'MRI_Vol6']
    missing_indicators = ['HAS_CSF', 'HAS_PET', 'HAS_MRI']
    
    num_cols = [c for c in ["AGE_AT_VISIT", "PTEDUCAT"] + csf_cols + pet_cols + mri_cols + missing_indicators if c in df_fold.columns]
    cat_cols = [c for c in ["PTGENDER", "PTMARRY"] if c in df_fold.columns]
    
    for c in csf_cols + pet_cols + mri_cols:
        if c in df_fold.columns:
            df_fold[c] = pd.to_numeric(df_fold[c], errors="coerce").fillna(0.0)
    
    for c in ["AGE_AT_VISIT", "PTEDUCAT"]:
        if c in df_fold.columns:
            df_fold[c] = pd.to_numeric(df_fold[c], errors="coerce").fillna(df_fold[c].median())
    
    parts = []
    if num_cols:
        # Fit on all nodes (train and test): feature scaling is transductive here.
        scaler = StandardScaler()
        X_num = pd.DataFrame(
            scaler.fit_transform(df_fold[num_cols]),
            columns=num_cols, index=df_fold.index
        )
        parts.append(X_num)
    
    if cat_cols:
        X_cat = pd.get_dummies(df_fold[cat_cols], prefix=cat_cols, drop_first=False, dtype=float)
        parts.append(X_cat)
    
    X = pd.concat(parts, axis=1).astype(float)
    X_clean = X.replace([np.inf, -np.inf], np.nan).fillna(0.0)
    
    # k-NN similarity graph: connect each visit to its (up to) 8 nearest neighbours in feature space.
    n_samples = X_clean.shape[0]
    k = min(8, max(1, n_samples - 1))
    n_neighbors = min(n_samples, k + 1)
    
    if n_samples >= 2:
        nbrs = NearestNeighbors(n_neighbors=n_neighbors, metric="euclidean")
        nbrs.fit(X_clean.values)
        _, idx = nbrs.kneighbors(X_clean.values)
        src_knn, dst_knn = [], []
        for i in range(idx.shape[0]):
            for j in idx[i, 1:]:
                src_knn.append(i)
                dst_knn.append(j)
        edge_knn = torch.tensor([src_knn, dst_knn], dtype=torch.long)
    else:
        edge_knn = torch.empty((2, 0), dtype=torch.long)
    
    # Temporal edges: link consecutive visits of the same patient, ordered by date.
    tmp = df_fold.reset_index()[["index", "RID", date_col]].dropna(subset=[date_col]).sort_values(["RID", date_col])
    src_tmp, dst_tmp = [], []
    for rid, g in tmp.groupby("RID"):
        ids = g["index"].tolist()
        for a, b in zip(ids[:-1], ids[1:]):
            src_tmp.append(a)
            dst_tmp.append(b)
    edge_tmp = torch.tensor([src_tmp, dst_tmp], dtype=torch.long) if src_tmp else torch.empty((2,0), dtype=torch.long)
    
    def undirected(e):
        return torch.cat([e, e.flip(0)], dim=1) if e.numel() else e
    
    edges = []
    if edge_knn.numel(): edges.append(undirected(edge_knn))
    if edge_tmp.numel(): edges.append(undirected(edge_tmp))
    edge_index = torch.cat(edges, dim=1) if edges else torch.empty((2,0), dtype=torch.long)
    if edge_index.numel():
        edge_index = torch.unique(edge_index, dim=1)
    
    node_rids = df_fold["RID"].to_numpy()
    use_for_label = df_fold["USE_FOR_LABEL"].to_numpy()
    
    train_mask_np = (~np.isin(node_rids, test_rids)) & use_for_label
    test_mask_np = np.isin(node_rids, test_rids) & use_for_label
    
    # Drop every edge that joins a test-patient node to any other node,
    # so no messages cross the train/test boundary.
    node_split = np.where(np.isin(node_rids, test_rids), "test", "train")
    split_map = {"train":0, "test":1}
    split_idx = np.vectorize(split_map.get)(node_split)
    src_np = edge_index[0].cpu().numpy()
    dst_np = edge_index[1].cpu().numpy()
    keep_edges = split_idx[src_np] == split_idx[dst_np]
    edge_index = edge_index[:, torch.tensor(keep_edges)]
    
    # Standardize the target with training labels only; metrics are converted back to years at the end.
    y_full = df_fold["YEARS_TO_ONSET"].astype(float)
    y_mu  = float(y_full[train_mask_np].mean()) if train_mask_np.any() else 0.0
    y_std = float(y_full[train_mask_np].std(ddof=0)) if train_mask_np.any() else 1.0
    if not np.isfinite(y_std) or y_std == 0.0:
        y_std = 1.0
    
    y_scaled = (y_full - y_mu) / y_std
    y_t = torch.tensor(y_scaled.fillna(0).values, dtype=torch.float32)
    
    x = torch.tensor(X_clean.values, dtype=torch.float32)
    train_mask = torch.tensor(train_mask_np, dtype=torch.bool)
    test_mask  = torch.tensor(test_mask_np,  dtype=torch.bool)
    
    data = Data(x=x, edge_index=edge_index, y=y_t,
                train_mask=train_mask, test_mask=test_mask)
    
    device = torch.device("cpu")
    model = GNNRegressor(in_ch=data.num_node_features, hid=64, dropout=0.3).to(device)
    data = data.to(device)
    
    opt = torch.optim.AdamW(model.parameters(), lr=1e-2, weight_decay=1e-4)
    loss_fn = nn.MSELoss()
    
    for ep in range(n_epochs):
        model.train()
        opt.zero_grad()
        out = model(data.x, data.edge_index)
        
        if data.train_mask.any():
            loss = loss_fn(out[data.train_mask], data.y[data.train_mask])
            if torch.isnan(loss) or torch.isinf(loss):
                break
            loss.backward()
            opt.step()
    
    model.eval()
    with torch.no_grad():
        final_out = model(data.x, data.edge_index)
        
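        if not test_mask.any():
            # Return the same keys as the normal path so the per-fold print does not raise KeyError.
            return {"test_mae": np.nan, "test_rmse": np.nan,
                    "n_train": int(train_mask_np.sum()), "n_test": 0,
                    "test_rids": list(map(int, test_rids))}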
        
        mae_scaled = torch.mean(torch.abs(final_out[test_mask] - data.y[test_mask])).item()
        rmse_scaled = torch.sqrt(loss_fn(final_out[test_mask], data.y[test_mask])).item()
        
        test_mae = mae_scaled * y_std
        test_rmse = rmse_scaled * y_std
    
    return {
        "test_mae": float(test_mae),
        "test_rmse": float(test_rmse),
        "n_train": int(train_mask_np.sum()),
        "n_test": int(test_mask_np.sum()),
        "test_rids": list(map(int, test_rids))
    }

print("✅ K-Fold function defined")

✅ K-Fold function defined
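
Two steps of `train_kfold` are easy to miss when reading the function: the k-NN graph is built over all visits, and edges crossing the train/test boundary are then removed. The following NumPy-only sketch reproduces both steps on random toy features. The sizes, `k = 3`, and the patient IDs are made up, and PyTorch is left out on purpose to keep the example small.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 4))             # 12 toy visits, 4 features
node_rids = np.repeat(np.arange(6), 2)   # 6 hypothetical patients, 2 visits each
test_rids = np.array([4, 5])

# k-NN graph: query with k + 1 neighbours because each point finds itself first.
k = 3
nbrs = NearestNeighbors(n_neighbors=k + 1, metric="euclidean").fit(X)
_, idx = nbrs.kneighbors(X)
src = np.repeat(np.arange(len(X)), k)
dst = idx[:, 1:].ravel()

# Symmetrize and de-duplicate, mirroring undirected() + torch.unique in the notebook.
edges = np.stack([np.concatenate([src, dst]), np.concatenate([dst, src])])
edges = np.unique(edges, axis=1)

# Keep an edge only if both endpoints are on the same side of the split.
is_test = np.isin(node_rids, test_rids)
keep = is_test[edges[0]] == is_test[edges[1]]
edges_kept = edges[:, keep]

print(edges.shape[1], "edges before,", edges_kept.shape[1], "after filtering")
```

After filtering, the test patients form their own subgraph, so the GCN cannot pass training-node features or labels into test predictions through message passing.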


In [8]:
K_FOLDS = 10

print("\n" + "="*70)
print(f"🚀 STARTING {K_FOLDS}-FOLD CROSS-VALIDATION")
print("="*70)
print(f"Total labeled patients: {len(rids_with_label)}")
print(f"Patients per fold (test): ~{len(rids_with_label)//K_FOLDS}")
print(f"Patients per fold (train): ~{len(rids_with_label) - len(rids_with_label)//K_FOLDS}")
print(f"Total folds: {K_FOLDS}\n")

# Shuffle patient IDs once with a fixed seed, then cut them into contiguous folds.
rng = np.random.default_rng(42)
rids_shuffled = rids_with_label.copy()
rng.shuffle(rids_shuffled)

kf = KFold(n_splits=K_FOLDS, shuffle=False)  # already shuffled manually above
results = []

for fold_idx, (train_idx, test_idx) in enumerate(tqdm(kf.split(rids_shuffled), total=K_FOLDS, desc="K-Fold Progress")):
    test_rids = rids_shuffled[test_idx]
    
    fold_result = train_kfold(df.copy(), test_rids, n_epochs=100)
    fold_result['fold'] = fold_idx + 1
    results.append(fold_result)
    
    print(f"  Fold {fold_idx+1}/{K_FOLDS}: MAE={fold_result['test_mae']:.3f} years | "
          f"Test patients={fold_result['n_test']} | Train={fold_result['n_train']}")

print("\n✅ K-Fold Cross-Validation completed!")


🚀 STARTING 10-FOLD CROSS-VALIDATION
Total labeled patients: 82
Patients per fold (test): ~8
Patients per fold (train): ~74
Total folds: 10

  Fold 1/10: MAE=0.070 years | Test patients=9 | Train=73
  Fold 2/10: MAE=0.064 years | Test patients=9 | Train=73
  Fold 3/10: MAE=0.170 years | Test patients=8 | Train=74
  Fold 4/10: MAE=0.077 years | Test patients=8 | Train=74
  Fold 5/10: MAE=0.119 years | Test patients=8 | Train=74
  Fold 6/10: MAE=0.052 years | Test patients=8 | Train=74
  Fold 7/10: MAE=0.039 years | Test patients=8 | Train=74
  Fold 8/10: MAE=0.117 years | Test patients=8 | Train=74
  Fold 9/10: MAE=0.087 years | Test patients=8 | Train=74
  Fold 10/10: MAE=0.037 years | Test patients=8 | Train=74
K-Fold Progress: 100%|██████████| 10/10 [00:23<00:00,  2.37s/it]

✅ K-Fold Cross-Validation completed!
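
The log shows two folds with 9 test patients and eight folds with 8. That follows from how `KFold` divides 82 items into 10 parts: the first `82 % 10 = 2` folds get one extra item. A short sketch with made-up patient IDs illustrates the sizes and checks that each patient is tested exactly once:

```python
import numpy as np
from sklearn.model_selection import KFold

patient_ids = np.arange(1000, 1082)   # 82 hypothetical patient IDs
rng = np.random.default_rng(42)
shuffled = patient_ids.copy()
rng.shuffle(shuffled)

kf = KFold(n_splits=10, shuffle=False)  # order already randomized above
test_sets = [shuffled[test_idx] for _, test_idx in kf.split(shuffled)]

sizes = [len(t) for t in test_sets]
all_tested = np.sort(np.concatenate(test_sets))

print(sizes)
print(np.array_equal(all_tested, patient_ids))
```

Shuffling the IDs first and then using `shuffle=False` keeps the fold assignment fully determined by the seed passed to `default_rng`.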





In [9]:
df_results = pd.DataFrame(results)
df_results_clean = df_results.dropna(subset=['test_mae', 'test_rmse'])

mae_mean = df_results_clean['test_mae'].mean()
mae_std = df_results_clean['test_mae'].std()
mae_median = df_results_clean['test_mae'].median()
mae_min = df_results_clean['test_mae'].min()
mae_max = df_results_clean['test_mae'].max()

rmse_mean = df_results_clean['test_rmse'].mean()
rmse_std = df_results_clean['test_rmse'].std()
rmse_median = df_results_clean['test_rmse'].median()

from scipy import stats
# 95% CI half-width from the Student t distribution with n_folds - 1 degrees of freedom.
n_folds = len(df_results_clean)
confidence = 0.95
t_crit = stats.t.ppf((1 + confidence) / 2, n_folds - 1)
mae_ci = t_crit * mae_std / np.sqrt(n_folds)
rmse_ci = t_crit * rmse_std / np.sqrt(n_folds)

print("\n" + "="*70)
print(f"📊 {K_FOLDS}-FOLD CROSS-VALIDATION RESULTS")
print("="*70)
print(f"Valid folds: {n_folds}/{K_FOLDS}")
print(f"\nMAE (Mean Absolute Error):")
print(f"  Mean ± 95% CI: {mae_mean:.3f} ± {mae_ci:.3f} years")
print(f"  Median: {mae_median:.3f} years")
print(f"  Std Dev: {mae_std:.3f} years")
print(f"  Range: [{mae_min:.3f}, {mae_max:.3f}] years")
print(f"\nRMSE (Root Mean Squared Error):")
print(f"  Mean ± 95% CI: {rmse_mean:.3f} ± {rmse_ci:.3f} years")
print(f"  Median: {rmse_median:.3f} years")
print(f"  Std Dev: {rmse_std:.3f} years")
print("="*70)

mae_days = mae_mean * 365
mae_ci_days = mae_ci * 365
print(f"\n📅 In days: MAE = {mae_days:.0f} ± {mae_ci_days:.0f} days (95% CI)")

kfold_summary = {
    'model': f'All Biomarkers - {K_FOLDS}-Fold CV',
    'n_folds': int(n_folds),
    'n_patients': int(len(rids_with_label)),
    'mae_mean': float(mae_mean),
    'mae_ci': float(mae_ci),
    'mae_median': float(mae_median),
    'mae_std': float(mae_std),
    'mae_min': float(mae_min),
    'mae_max': float(mae_max),
    'rmse_mean': float(rmse_mean),
    'rmse_ci': float(rmse_ci),
    'rmse_median': float(rmse_median),
    'rmse_std': float(rmse_std)
}

with open('kfold_cv_summary.json', 'w') as f:
    json.dump(kfold_summary, f, indent=2)

df_results.to_csv('kfold_cv_detailed_results.csv', index=False)

print("\n✅ Results saved:")
print("  - kfold_cv_summary.json")
print("  - kfold_cv_detailed_results.csv")


📊 10-FOLD CROSS-VALIDATION RESULTS
Valid folds: 10/10

MAE (Mean Absolute Error):
  Mean ± 95% CI: 0.083 ± 0.030 years
  Median: 0.073 years
  Std Dev: 0.042 years
  Range: [0.037, 0.170] years

RMSE (Root Mean Squared Error):
  Mean ± 95% CI: 0.132 ± 0.065 years
  Median: 0.100 years
  Std Dev: 0.090 years

📅 In days: MAE = 30 ± 11 days (95% CI)

✅ Results saved:
  - kfold_cv_summary.json
  - kfold_cv_detailed_results.csv
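
The confidence interval above is a standard t-interval over the per-fold scores: half-width = t(0.975, n - 1) × s / √n. As a check, the sketch below recomputes it from the ten per-fold MAE values printed in the fold log. Those values are rounded to three decimals, so the result matches the reported figures only up to rounding.

```python
import numpy as np
from scipy import stats

# Per-fold MAE in years, copied from the fold log (rounded to 3 decimals there).
fold_mae = np.array([0.070, 0.064, 0.170, 0.077, 0.119,
                     0.052, 0.039, 0.117, 0.087, 0.037])

n = len(fold_mae)
mean = fold_mae.mean()
sd = fold_mae.std(ddof=1)                 # sample std, same convention as pandas .std()
t_crit = stats.t.ppf(0.975, df=n - 1)     # two-sided 95% critical value
half_width = t_crit * sd / np.sqrt(n)

print(f"MAE = {mean:.3f} ± {half_width:.3f} years")
```

Because the folds share most of their training data, the per-fold scores are not independent, so this interval is best read as an approximate indication of spread rather than an exact 95% coverage guarantee.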


## Conclusions

### Advantages of 10-Fold CV over LOPO-CV

1. **More robust test set per fold:** 8 patients vs. 1 patient
2. **Faster:** 10 training runs vs. 82
3. **Better balance:** a more realistic train/test ratio
4. **Confidence intervals:** a 95% CI can be computed across the 10 folds (approximate, since folds share training data)
5. **Full coverage:** each of the 82 patients is in the test set exactly once

### Comparison

| Method | Test/Fold | Time | Robustness |
|--------|-----------|------|------------|
| Simple split | 13 patients | 2 min | Low |
| **10-Fold CV** | **8 patients** | **5 min** | **High** |
| LOPO-CV | 1 patient | 2 h | Medium |

### References

- Kohavi (1995). "A study of cross-validation and bootstrap for accuracy estimation and model selection."
- Varoquaux et al. (2017). "Assessing and tuning brain decoders: Cross-validation, caveats, and guidelines."