<a href="https://colab.research.google.com/github/UN-GCPDS/Curso-Corto-LLMs/blob/main/3.%20Dashboard/Entrenamiento.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![Logo UNAL CHEC](https://github.com/UN-GCPDS/curso_IA_CHEC/blob/main/logo_unal_chec.jpg?raw=1)

# **TabNet Model Training**

## **Description**

Training a TabNet model under several conditions.

### **Instructors - Session 1:** Andrés Marino Álvarez Meza and Diego Armando Pérez Rosero

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Data

**TabNet for criticality in medium-voltage networks: problem statement and data (regression)**

Let the dataset be

$$
\mathbf{X}\in\mathbb{R}^{N\times M},\qquad
\mathbf{y}\in\mathbb{R}^{N}.
$$

Each row of $\mathbf{X}$ represents an event or period between 2019 and 2024 and contains the features of the elements associated with the equipment that operated. The vector $\mathbf{y}$ stores the continuous value of the indicator to be modeled (SAIDI or SAIFI) for that same event or period.

We define

$$
\mathcal{F}:\mathcal{X}\subseteq\mathbb{R}^{M}\to\mathbb{R},\qquad
\hat{y}=\mathcal{F}(\mathbf{x})
=
\bigl(\,\breve{f}_{L}\circ \breve{f}_{L-1}\circ \cdots \circ \breve{f}_{1}\,\bigr)(\mathbf{x}),
$$

where $\breve{f}_{l}(\cdot)$ denotes the $l$-th block of the model ($l\in\{1,\dots,L\}$) and $\circ$ is the composition operator.

In the multi-output case for $(\text{SAIDI},\text{SAIFI})$, we take $\mathcal{F}:\mathbb{R}^{M}\to\mathbb{R}^{2}$ and $\mathbf{y}\in\mathbb{R}^{N\times 2}$.
![Logo UNAL CHEC](https://raw.githubusercontent.com/Daprosero/Deep-Convolutional-Generative-Adversarial-Network/refs/heads/master/Mercados%20CHEC.png)
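
As a minimal sketch of the composition $\mathcal{F}=\breve{f}_{L}\circ\cdots\circ\breve{f}_{1}$ defined above, the following NumPy snippet chains two toy blocks. The helper `compose`, the blocks `f1` and `f2`, and the random weights are illustrative assumptions; they are not TabNet layers.

```python
from functools import reduce

import numpy as np


def compose(*blocks):
    """Return F = f_L ∘ ... ∘ f_1 for blocks passed in the order f_1, ..., f_L."""
    return lambda x: reduce(lambda acc, block: block(acc), blocks, x)


rng = np.random.default_rng(0)
N, M, H = 10, 5, 8                  # toy sizes: samples, features, hidden units
W1 = rng.normal(size=(M, H))        # hypothetical weights of block 1
W2 = rng.normal(size=(H, 1))        # hypothetical weights of block 2 (single output)


def f1(x):
    return np.maximum(x @ W1, 0.0)  # linear map followed by ReLU


def f2(x):
    return x @ W2                   # linear read-out to one value per row


F = compose(f1, f2)
X_toy = rng.normal(size=(N, M))     # plays the role of X in R^{N x M}
y_hat = F(X_toy)                    # one prediction per row
print(y_hat.shape)
```

For the multi-output case, the last block would map to two columns instead of one, so `y_hat` would have shape `(N, 2)`.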

In [None]:
#@title Libraries
# Install required packages
!pip install -q gdown
!pip install openTSNE
!pip install pytorch-tabnet optuna
!pip install wget --quiet

# Import required libraries
import optuna
import warnings
import torch
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.special import softmax
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, StandardScaler, QuantileTransformer
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, accuracy_score
from sklearn.neighbors import NearestNeighbors
from pytorch_tabnet.tab_model import TabNetRegressor, TabNetClassifier
from pytorch_tabnet.augmentations import RegressionSMOTE
from google.colab import drive
import tensorflow as tf
import tensorflow_probability as tfp
import os
from pathlib import Path
import math
import wget
!gdown --id 1o_fZIhk6ErrtrM3eVZPF9s2qj8l4FoqS -O SuperEventos_Criticidad_AguasAbajo_CODEs.zip
!gdown --id 1lBrseLoEmr6-VwNSCHOp2zuc4sKKrkbQ -O model.zip
!gdown --id 16VIuHLgPGpX4J723Wd48UAPhHivLuUaH -O Data_CHEC.zip

import zipfile

extract_dir = "CHEC"

# Extract every downloaded archive into the same working directory
for zip_path in (
    "SuperEventos_Criticidad_AguasAbajo_CODEs.zip",
    "model.zip",
    "Data_CHEC.zip",
):
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(extract_dir)

# Suppress warnings
warnings.filterwarnings("ignore", category=FutureWarning, module="pandas")
warnings.filterwarnings("ignore", category=FutureWarning)

# Helper: integer-encode the values of a Series
def get_labels(x: pd.Series) -> pd.Series:
    labels, _ = pd.factorize(x)
    return pd.Series(labels, name=x.name, index=x.index)

# Custom loss functions
def my_mse_loss_fn(y_pred, y_true):
    mse_loss = (y_true - y_pred) ** 2
    return torch.mean(mse_loss)

import re

def plot_var_band(
    df,
    var_token,
    row_index=0,
    hours_back=24,
    col_patterns=None,
    display_name=None,
    units=None,
    event_label="reported event",
):
    """
    Plot a weather variable over a window of hours leading up to an event.

    Parameters
    ----------
    df : pd.DataFrame
        Contains one column per hour for the chosen variable.
        Column-name formats detected automatically:
        - 'h0-<var>', 'h1-<var>', ..., 'h24-<var>'
        - '<var>_h0', '<var>_h1', ...
        with '_' or '-' as separators.

    var_token : str
        Base name of the variable in the column names (e.g. 'wind_gust_spd',
        'air_temp', 'precip'). It must match what appears in the columns.

    row_index : int
        Row (event) to plot.

    hours_back : int
        How many hours back to show.

    col_patterns : list[str] | None
        Optional list of regexes used to detect the hourly columns.
        If None, they are generated automatically from var_token.

    display_name : str | None
        Human-readable label for the Y axis (e.g. 'Wind gust').
        If None, var_token is used.

    units : str | None
        Units appended to the Y label (e.g. 'm/s', '°C', 'mm').

    event_label : str
        Text for the arrow at hour 0.
    """
    # --- 1) Build column patterns ---
    if col_patterns is None:
        # Split var_token on '_', '-' or whitespace
        parts = re.split(r'[_\-\s]+', var_token.strip())
        # Build a regex that tolerates '_' or '-' between parts
        # e.g. 'wind[_-]?gust[_-]?spd'
        var_regex = r'[_-]?'.join(map(re.escape, parts))

        col_patterns = [
            rf'^h(\d{{1,2}})[-_]?{var_regex}$',   # h0-<var>  o  h0_<var>
            rf'^{var_regex}[-_]?h(\d{{1,2}})$',   # <var>-h0  o  <var>_h0
        ]

    # --- 2) Detect columns and map them to hours ---
    hour_to_col = {}
    for c in df.columns:
        for pat in col_patterns:
            m = re.match(pat, str(c), flags=re.IGNORECASE)
            if m:
                h = int(m.group(1))
                hour_to_col[h] = c
                break

    if not hour_to_col:
        raise ValueError(
            f"No hourly columns were found for variable '{var_token}'.\n"
            f"Try adjusting 'var_token' or passing custom 'col_patterns'."
        )

    # --- 3) Build the hour series [0..hours_back] from the hours present, ascending ---
    hours = [h for h in sorted(hour_to_col.keys()) if 0 <= h <= hours_back]
    vals = np.array(
        [pd.to_numeric(df.loc[df.index[row_index], hour_to_col[h]], errors='coerce') for h in hours],
        dtype=float
    )

    # --- 4) Plot ---
    fig, ax = plt.subplots(figsize=(10, 5))

    # line and markers
    ax.plot(hours, vals, marker='o')

    # reverse the X axis so it reads 24 -> 0
    ax.set_xlim(hours_back, 0)

    # shaded band
    ymin = np.nanmin(vals) if np.isfinite(np.nanmin(vals)) else 0.0
    ymax = np.nanmax(vals) if np.isfinite(np.nanmax(vals)) else 1.0
    pad  = 0.05 * (ymax - ymin if ymax > ymin else 1.0)
    ax.set_ylim(ymin - pad, ymax + pad)
    ax.axvspan(0, hours_back, alpha=0.15)

    # arrow and label at hour 0
    y0 = vals[hours.index(0)] if 0 in hours else np.nan
    if not np.isfinite(y0):
        y0 = np.nanmedian(vals) if np.isfinite(np.nanmedian(vals)) else (ymin + ymax) / 2.0

    ax.annotate(
        event_label,
        xy=(0, y0),
        xytext=(max(2, min(4, hours_back*0.15)), y0 + (ymax - y0)*0.15),
        arrowprops=dict(arrowstyle="->", lw=1),
        ha='left', va='bottom'
    )

    # labels
    ylab = display_name if display_name else var_token
    if units:
        ylab = f"{ylab} [{units}]"
    ax.set_xlabel("Hours before the event")
    ax.set_ylabel(ylab)
    ax.grid(True, alpha=0.3)

    # major ticks (24, 18, 12, 6, 0) where applicable
    xticks = [h for h in [hours_back, 18, 12, 6, 0] if 0 <= h <= hours_back]
    ax.set_xticks(xticks)

    plt.tight_layout()
    plt.show()


# --- Usage example:
# plot_var_band(df=your_dataframe, var_token='wind_gust_spd', row_index=0, hours_back=24)

def my_rmse_loss_fn(y_pred, y_true):
    mse_loss = (y_true - y_pred) ** 2
    mean_mse_loss = torch.mean(mse_loss)
    rmse_loss = torch.sqrt(mean_mse_loss)
    return rmse_loss

def my_mae_loss_fn(y_pred, y_true):
    mae_loss = torch.abs(y_true - y_pred)
    return torch.mean(mae_loss)

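def my_mape_loss_fn(y_pred, y_true, eps=1e-8):
    # eps keeps the ratio finite when y_true contains zeros (e.g. after MinMax scaling)
    mape_loss = torch.abs((y_true - y_pred) / torch.clamp(torch.abs(y_true), min=eps)) * 100
    return torch.mean(mape_loss)

def my_r2_score_fn(y_pred, y_true):
    # Despite the name, this returns 1 - R² so that it can be minimized as a loss
    total_variance = torch.var(y_true, unbiased=False)
    unexplained_variance = torch.mean((y_true - y_pred) ** 2)
    r2_score = 1 - (unexplained_variance / total_variance)
    return 1 - r2_score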

# Stage 0: imports (NumPy, pandas, scikit-learn and matplotlib are already imported above)
# ==== GPU libraries ====
import cupy as cp
import xgboost as xgb

from cuml.ensemble import RandomForestRegressor as cuRF
from cuml.metrics import r2_score as r2_gpu

# For a CPU sanity check:
from sklearn.metrics import r2_score as r2_cpu
from sklearn.metrics import r2_score
# ==== Utilities ====
def to_cpu(a):
    """Convert CuPy -> NumPy when applicable."""
    try:
        if isinstance(a, cp.ndarray):
            return cp.asnumpy(a)
    except Exception:
        pass
    return a

def metrics_gpu(y_true_cp, y_pred_cp):
    """MAE, RMSE and R2 computed on the GPU (CuPy)."""
    y_true_cp = cp.asarray(y_true_cp)
    y_pred_cp = cp.asarray(y_pred_cp)
    mae  = float(cp.mean(cp.abs(y_true_cp - y_pred_cp)))
    rmse = float(cp.sqrt(cp.mean((y_true_cp - y_pred_cp)**2)))
    ssr  = float(cp.sum((y_true_cp - y_pred_cp)**2))
    sst  = float(cp.sum((y_true_cp - cp.mean(y_true_cp))**2))
    r2   = 1.0 - ssr / sst if sst > 0 else np.nan
    return mae, rmse, r2

def permutation_importance_rf_gpu(model, X_val_cp, y_val_cp, n_repeats=3, max_feats=None, random_state=42):
    """
    GPU permutation importance for a cuML random forest.
    Returns one importance per feature (mean drop in validation R2).
    If max_feats is not None, only the first max_feats columns are evaluated (to save time).
    """
    rs = cp.random.RandomState(random_state)
    X_val_cp = cp.asarray(X_val_cp)
    y_val_cp = cp.asarray(y_val_cp)

    # R2 base
    y_pred_base = model.predict(X_val_cp)
    _, _, r2_base = metrics_gpu(y_val_cp, y_pred_base)

    n, d = X_val_cp.shape
    d_eval = d if max_feats is None else int(min(max_feats, d))
    importances = cp.zeros(d, dtype=cp.float32)

    for j in range(d_eval):
        drops = []
        for _ in range(n_repeats):
            Xp = X_val_cp.copy()
            idx = rs.permutation(n)
            Xp[:, j] = Xp[idx, j]  # permute column j only
            y_pred_p = model.predict(Xp)
            _, _, r2_p = metrics_gpu(y_val_cp, y_pred_p)
            drops.append(r2_base - r2_p)
        importances[j] = cp.mean(cp.asarray(drops))

    return importances  # CuPy array
def regression_metrics(y_true, y_pred):
    mae  = float(np.mean(np.abs(y_true - y_pred)))
    rmse = float(np.sqrt(np.mean((y_true - y_pred)**2)))
    ss_res = float(np.sum((y_true - y_pred)**2))
    ss_tot = float(np.sum((y_true - np.mean(y_true))**2))
    r2 = 1 - ss_res/ss_tot if ss_tot > 0 else np.nan
    return mae, rmse, r2
class CustomTabNetRegressor(TabNetRegressor):
    def __init__(self, *args, **kwargs):
        super(CustomTabNetRegressor, self).__init__(*args, **kwargs)

    def forward(self, X):
        output, M_loss = self.network(X)
        output = torch.relu(output)
        return output, M_loss

    def predict(self, X):
        # Put the network in inference mode (fixed batch-norm statistics)
        self.network.eval()
        device = next(self.network.parameters()).device
        if not isinstance(X, torch.Tensor):
            X = torch.tensor(X, dtype=torch.float32)
        X = X.to(device)
        with torch.no_grad():
            output, _ = self.forward(X)
        return output.cpu().numpy()
from scipy.spatial import cKDTree
from tqdm import tqdm
from ast import literal_eval
from pandas.api.types import is_numeric_dtype
from collections import Counter
def make_strat_labels(y_vals, n_bins=3, min_per_class=2):
    """
    Build stratification labels from a continuous target.
    Falls back to fewer bins when a class has too few samples.
    """
    y1d = y_vals.reshape(-1)
    for bins in range(n_bins, 1, -1):
        pct = np.linspace(0, 100, bins + 1)[1:-1]
        cuts = np.percentile(y1d, pct)
        if np.any(np.diff(cuts) <= 0):
            continue
        labels = np.digitize(y1d, bins=cuts).astype(int)
        counts = Counter(labels)
        if all(c >= min_per_class for c in counts.values()) and len(counts) > 1:
            return labels
    return None

def stratify_from_df_or_y(df_labels, idx, y_subset, col='NIVEL_C'):
    """Try to use df[col] as the label; if that fails, use percentiles of y_subset."""
    try:
        ycat_full = df_labels.loc[:, col].values.astype(int)
        ycat = ycat_full[idx]
        c10 = Counter(ycat)
        if all(v >= 2 for v in c10.values()) and len(c10) > 1:
            return ycat
    except Exception:
        pass
    return make_strat_labels(y_subset[:,0], n_bins=3, min_per_class=2)

def split_subset(X, y, df_labels=None, n_sub=1000, test_size=0.20, seed=42):
    """
    1) Draw a random subset of size n_sub.
    2) Scale y (MinMax) on the subset.
    3) Train/test split, stratified when feasible.
    4) Train/valid split (20% of train), re-stratified when possible.
    """
    rng = np.random.RandomState(seed)
    n_total = X.shape[0]
    n_sub = min(n_sub, n_total)
    idx_sub = rng.choice(n_total, size=n_sub, replace=False)

    X_sub = X[idx_sub]
    y_sub = y[idx_sub]
    # auxiliary labels for stratification
    ycat_sub = stratify_from_df_or_y(df_labels, idx_sub, y_sub) if df_labels is not None else make_strat_labels(y_sub[:,0])
    # scale the target on the subset
    scaler = MinMaxScaler()
    y_sub_scaled = scaler.fit_transform(y_sub)

    split_kwargs = dict(test_size=test_size, random_state=seed, shuffle=True)
    if ycat_sub is not None:
        X_tr, X_te, y_tr, y_te, ycat_tr, ycat_te = train_test_split(
            X_sub, y_sub_scaled, ycat_sub, stratify=ycat_sub, **split_kwargs
        )
    else:
        X_tr, X_te, y_tr, y_te = train_test_split(X_sub, y_sub_scaled, **split_kwargs)
        ycat_tr = ycat_te = None

    # Validation (20% of train)
    if ycat_tr is not None:
        y_tr_raw = y_tr[:,0]
        ycat_t = make_strat_labels(y_tr_raw, n_bins=3, min_per_class=2)
        if ycat_t is not None:
            X_tr, X_va, y_tr, y_va, ycat_tr, ycat_va = train_test_split(
                X_tr, y_tr, ycat_tr, test_size=0.20, random_state=seed, stratify=ycat_t
            )
        else:
            X_tr, X_va, y_tr, y_va = train_test_split(
                X_tr, y_tr, test_size=0.20, random_state=seed, shuffle=True
            )
            ycat_va = None
    else:
        X_tr, X_va, y_tr, y_va = train_test_split(
            X_tr, y_tr, test_size=0.20, random_state=seed, shuffle=True
        )
        ycat_va = None

    # Quick report (X and y are the full arrays passed in; they are left untouched)
    print("Originals (kept):", X.shape, y.shape)
    print(f"Subset of {n_sub}:", X_sub.shape, y_sub.shape)
    print("Train/Valid/Test:", X_tr.shape, X_va.shape, X_te.shape)
    if ycat_sub is not None:
        print("Subset class distribution:", Counter(ycat_sub))

    return {
        "idx_sub": idx_sub,
        "X_train": X_tr, "X_valid": X_va, "X_test": X_te,
        "y_train": y_tr, "y_valid": y_va, "y_test": y_te
    }
from copy import deepcopy
from sklearn.metrics import r2_score

def make_tabnet(cat_info, params):
    cat_idxs = [i for i, f in enumerate(features) if f in CATEGORICAL_COLUMNS]
    cat_dims = [categorical_dims[f] for f in features if f in CATEGORICAL_COLUMNS]
    cat_emb_dim = [min(params['emb'], max(4, (dim + 1)//2)) for dim in cat_dims]
    return cat_idxs, cat_dims, cat_emb_dim

def build_optimizer(optimizer_type, learning_rate, momentum, weight_decay):
    if optimizer_type == 'adam':
        optimizer_fn = torch.optim.Adam
        optimizer_params = {'lr': learning_rate, 'weight_decay': weight_decay}
    elif optimizer_type == 'adamw':
        optimizer_fn = torch.optim.AdamW
        optimizer_params = {'lr': learning_rate, 'weight_decay': weight_decay}
    elif optimizer_type == 'sgd':
        optimizer_fn = torch.optim.SGD
        optimizer_params = {'lr': learning_rate, 'momentum': momentum, 'weight_decay': weight_decay}
    elif optimizer_type == 'rmsprop':
        optimizer_fn = torch.optim.RMSprop
        optimizer_params = {'lr': learning_rate, 'momentum': momentum, 'weight_decay': weight_decay}
    else:
        raise ValueError(f"Unsupported optimizer_type: {optimizer_type!r}")
    return optimizer_fn, optimizer_params
def objective_regression(trial):
    # TabNet capacity
    n_d     = trial.suggest_int('n_d', 2, 256)
    n_a     = trial.suggest_int('n_a', 2, 256)
    n_steps = trial.suggest_int('n_steps', 1, 10)

    gamma         = trial.suggest_float('gamma', 1e-12, 1e2)
    lambda_sparse = trial.suggest_float('lambda_sparse', 1e-12, 1e2, log=True)

    batch_size  = trial.suggest_categorical('batch_size', [1024, 2048, 4096])
    mask_type   = trial.suggest_categorical('mask_type', ['entmax', 'sparsemax'])
    emb         = trial.suggest_int('emb', 3, 70)

    momentum      = trial.suggest_float('momentum', 0.001, 0.9)
    learning_rate = trial.suggest_float('learning_rate', 1e-7, 963e-1, log=True)
    weight_decay  = trial.suggest_float('weight_decay', 1e-6, 1e-3, log=True)

    scheduler_gamma = trial.suggest_float('scheduler_gamma', 0.1, 0.99)
    step_size       = trial.suggest_int('step_size',  1, 20)

    virtual_batch_size = trial.suggest_categorical('virtual_batch_size', [256,512,1024])
    if isinstance(batch_size, int) and isinstance(virtual_batch_size, int) and virtual_batch_size > batch_size:
        virtual_batch_size = batch_size // 2 if batch_size >= 64 else batch_size

    optimizer_type = trial.suggest_categorical('optimizer_type', ['adam', 'adamw', 'sgd', 'rmsprop'])
    optimizer_fn, optimizer_params = build_optimizer(optimizer_type, learning_rate, momentum, weight_decay)

    p   = trial.suggest_float('p', 1e-6, 0.99)
    aug = RegressionSMOTE(p=p)

    cat_idxs = [i for i, f in enumerate(features) if f in CATEGORICAL_COLUMNS]
    cat_dims = [categorical_dims[f] for f in features if f in CATEGORICAL_COLUMNS]
    cat_emb_dim = [min(emb, max(4, (dim + 1)//2)) for dim in cat_dims]

    model = CustomTabNetRegressor(
        cat_dims=cat_dims, cat_emb_dim=cat_emb_dim, cat_idxs=cat_idxs,
        n_d=n_d, n_a=n_a, n_steps=n_steps, gamma=gamma, lambda_sparse=lambda_sparse,
        mask_type=mask_type, optimizer_fn=optimizer_fn, optimizer_params=optimizer_params,
        scheduler_params={"gamma": scheduler_gamma, "step_size": step_size},
        scheduler_fn=torch.optim.lr_scheduler.StepLR, verbose=False
    )
    model.fit(
        X_train=X_train, y_train=y_train,
        eval_set=[(X_train, y_train), (X_valid, y_valid)],
        eval_name=['train', 'valid'],
        eval_metric=['mae'],
        loss_fn=my_r2_score_fn,  # training loss is 1 - R²
        max_epochs=70, patience=40,
        batch_size=batch_size, virtual_batch_size=virtual_batch_size,
        num_workers=1, drop_last=False, augmentations=aug,
    )
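    # The value returned to Optuna is the training loss of the last epoch
    # (based on 1 - R²), not the validation MAE.
    final_train_loss = model.history['loss'][-1]
    return final_train_loss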

def eval_and_print(title, clf_model, X_test, y_test):
    """Compute and print R² between y_test (on the scale it is passed in) and the model's predictions."""
    y_pred = clf_model.predict(X_test)
    r2 = r2_score(y_test, y_pred)
    print(f"{title}: R2={r2:.4f}")
    return r2

def run_three_training_strategies(
    # models / kwargs
    clf_base,                   # model already trained in Phase 1 (with warm_start=True)
    model_init_kwargs,          # dict of kwargs to build an identical new model (from scratch)
    # old data (Phase 1)
    X_train_old, y_train_old,   # typically (X_train, y_train[:,0:1]) from the 1000-sample subset
    X_test_old, y_test_old,     # test set used in Phase 1
    # new data (Phase 2)
    X_tr_new, y_tr_new,         # train split of the 500 new samples
    X_va_new, y_va_new,         # valid split of the 500 new samples (for early stopping)
    X_te_new, y_te_new,         # new test split
    # training
    batch_size, virtual_batch_size, aug,
    max_epochs_ft_inc=200, patience_ft_inc=70,
    max_epochs_ft_new=200, patience_ft_new=70,
    max_epochs_scratch=200, patience_scratch=70,
    lower_lr_factor=0.1, min_lr=1e-5
):
    """
    Runs:
      A) Incremental fine-tuning (old + new)
      B) Non-incremental fine-tuning (new only)
      C) From scratch (old + new)
    and evaluates R² on the old and new test sets (on the scale of the targets passed in).
    Returns a dict with the R² values and the fitted models.
    """
    results = {}

    # =============================
    # A) Incremental fine-tuning
    # =============================
    clf_ft_inc = deepcopy(clf_base)  # copy of the already-trained model
    # Lower the LR for fine-tuning (optional, recommended). fit() rebuilds the
    # optimizer from optimizer_params, so the reduced LR is set there.
    ft_params = dict(clf_ft_inc.optimizer_params)
    ft_params["lr"] = max(ft_params.get("lr", 2e-2) * lower_lr_factor, min_lr)
    clf_ft_inc.optimizer_params = ft_params

    X_inc = np.concatenate([X_train_old, X_tr_new], axis=0)
    y_inc = np.concatenate([y_train_old, y_tr_new], axis=0)

    clf_ft_inc.fit(
        X_train=X_inc, y_train=y_inc,
        eval_set=[(X_inc, y_inc), (X_va_new, y_va_new)],
        eval_name=['train_inc', 'valid_new'],
        eval_metric=['mae'], loss_fn=my_r2_score_fn,
        max_epochs=max_epochs_ft_inc, patience=patience_ft_inc,
        batch_size=batch_size, virtual_batch_size=virtual_batch_size,
        num_workers=1, drop_last=False, augmentations=aug,
        warm_start=True,  # keep the trained weights instead of re-initializing the network
    )

    print("\n== Performance: incremental fine-tuning ==")
    r2_old_inc = eval_and_print("Old test (incremental FT)", clf_ft_inc, X_test_old, y_test_old)
    r2_new_inc = eval_and_print("New test (incremental FT)", clf_ft_inc, X_te_new,  y_te_new)
    results["fine_tune_incremental"] = {"R2_old_test": r2_old_inc, "R2_new_test": r2_new_inc, "model": clf_ft_inc}

    # =============================
    # B) Non-incremental fine-tuning (new data only)
    # =============================
    clf_ft_new = deepcopy(clf_base)
    # fit() rebuilds the optimizer from optimizer_params, so the reduced LR is set there
    ft_params_new = dict(clf_ft_new.optimizer_params)
    ft_params_new["lr"] = max(ft_params_new.get("lr", 2e-2) * lower_lr_factor, min_lr)
    clf_ft_new.optimizer_params = ft_params_new

    clf_ft_new.fit(
        X_train=X_tr_new, y_train=y_tr_new,
        eval_set=[(X_tr_new, y_tr_new), (X_va_new, y_va_new)],
        eval_name=['train_new', 'valid_new'],
        eval_metric=['mae'], loss_fn=my_r2_score_fn,
        max_epochs=max_epochs_ft_new, patience=patience_ft_new,
        batch_size=batch_size, virtual_batch_size=virtual_batch_size,
        num_workers=1, drop_last=False, augmentations=aug,
        warm_start=True,  # keep the trained weights instead of re-initializing the network
    )

    print("\n== Performance: non-incremental fine-tuning (new data only) ==")
    r2_old_new = eval_and_print("Old test (non-incremental FT)", clf_ft_new, X_test_old, y_test_old)
    r2_new_new = eval_and_print("New test (non-incremental FT)", clf_ft_new, X_te_new,  y_te_new)
    results["fine_tune_only_new"] = {"R2_old_test": r2_old_new, "R2_new_test": r2_new_new, "model": clf_ft_new}

    # =============================
    # C) From scratch (cumulative old + new)
    # =============================
    # model_init_kwargs must contain everything needed to rebuild the TabNet
    clf_scratch = CustomTabNetRegressor(**model_init_kwargs)

    X_cum = np.concatenate([X_train_old, X_tr_new], axis=0)
    y_cum = np.concatenate([y_train_old, y_tr_new], axis=0)

    clf_scratch.fit(
        X_train=X_cum, y_train=y_cum,
        eval_set=[(X_cum, y_cum), (X_va_new, y_va_new)],
        eval_name=['train_cum', 'valid_new'],
        eval_metric=['mae'], loss_fn=my_r2_score_fn,
        max_epochs=max_epochs_scratch, patience=patience_scratch,
        batch_size=batch_size, virtual_batch_size=virtual_batch_size,
        num_workers=1, drop_last=False, augmentations=aug,
    )

    print("\n== Performance: from scratch (old + new) ==")
    r2_old_sc = eval_and_print("Old test (from scratch)", clf_scratch, X_test_old, y_test_old)
    r2_new_sc = eval_and_print("New test (from scratch)", clf_scratch, X_te_new,  y_te_new)
    results["from_scratch"] = {"R2_old_test": r2_old_sc, "R2_new_test": r2_new_sc, "model": clf_scratch}
    return results
def pick_new_indices(n_new=500, seed=123):
    rng = np.random.RandomState(seed)
    universe = np.setdiff1d(np.arange(X.shape[0]), splits_1000["idx_sub"], assume_unique=True)
    n_new = min(n_new, universe.shape[0])
    return rng.choice(universe, size=n_new, replace=False)
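
The helper `make_strat_labels` defined in this cell turns a continuous target into a few quantile bins so that `train_test_split` can stratify on them. A minimal NumPy sketch of that binning idea follows; the function name `quantile_bins` and the synthetic target `y_toy` are assumptions for illustration only.

```python
from collections import Counter

import numpy as np


def quantile_bins(y, n_bins=3):
    """Assign each value of y to one of n_bins quantile bins (labels 0..n_bins-1)."""
    inner_pct = np.linspace(0, 100, n_bins + 1)[1:-1]   # interior percentiles, e.g. 33.3 and 66.7
    cuts = np.percentile(y, inner_pct)                  # interior cut points
    return np.digitize(y, bins=cuts).astype(int)


rng = np.random.default_rng(42)
y_toy = rng.exponential(scale=1.0, size=300)            # skewed synthetic target
labels = quantile_bins(y_toy, n_bins=3)
counts = Counter(labels.tolist())
print(counts)
```

Because the cut points are quantiles of the target itself, the bins come out roughly balanced even when the target is heavily skewed, which is what makes them usable as stratification classes.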


In [None]:
Xdata = df = pd.read_pickle('/content/drive/MyDrive/Colab Notebooks/Diego/SuperEventos_Criticidad_AguasAbajo_CODEs.pkl')
Xdata = Xdata[Xdata['duracion_h'] <= 100]
# ---------------------------------------------------------
# Stage 1: select the target (SAIDI or SAIFI) with shape (N, 1)
# Extract target variables
Dur_h = Xdata['duracion_h'].values
SAIDI = Xdata['SAIDI'].values
df1=Xdata.copy()
# Drop unused columns
Xdata.drop(['inicio_evento', 'h0-solar_rad', 'h0-uv', 'h1-solar_rad', 'h1-uv', 'h2-solar_rad', 'h2-uv', 'h3-solar_rad', 'h3-uv',
            'h4-solar_rad', 'h4-uv', 'h5-solar_rad', 'h5-uv', 'h19-solar_rad', 'h19-uv', 'h20-solar_rad', 'h20-uv',
            'h21-solar_rad', 'h21-uv', 'h22-solar_rad', 'h22-uv', 'h23-solar_rad', 'h23-uv', 'evento', 'fin', 'inicio',
            'cnt_usus', 'DEP', 'MUN', 'FECHA', 'NIVEL_C', 'VALOR_C', 'TRAMOS_AGUAS_ABAJO', 'EQUIPOS_PUNTOS',
            'PUNTOS_POLIGONO', 'LONGITUD2', 'LATITUD2', 'FECHA_C','TRAMOS_AGUAS_ABAJO_CODES','ORDER_'],
           inplace=True, axis=1)

# Define the target variables and remove them from the feature set
target = ['SAIFI', 'SAIDI', 'duracion_h']
y1 = Xdata[target].values
Xdata.drop(target, axis=1, inplace=True)
y = y1[:, 0:1].astype('float32')  # column 0 of target, i.e. SAIFI

# Working copy of X
df = Xdata.copy()

# ---------------------------------------------------------
# Stage 2: identify column types
NUMERIC_COLUMNS = df.select_dtypes(include=['number']).columns.tolist()
CATEGORICAL_COLUMNS = df.select_dtypes(include=['object', 'category']).columns.tolist()

# ---------------------------------------------------------
# Stage 3: numeric imputation (missing values get the sentinel -10 * column maximum)
max_values = {}
for col in NUMERIC_COLUMNS:
    max_value = pd.to_numeric(df[col], errors='coerce').max()
    if pd.isna(max_value):
        max_value = 0.0
    max_values[col] = max_value
    df[col] = pd.to_numeric(df[col], errors='coerce').fillna(-10.0 * max_value)

# ---------------------------------------------------------
# Stage 4: categorical encoding
label_encoders = {}
categorical_dims = {}
for col in CATEGORICAL_COLUMNS:
    enc = LabelEncoder()
    # Fill missing values before casting to str; otherwise NaN becomes the string 'nan'
    s = df[col].astype(object).fillna("no aplica").astype(str)
    enc.fit(s)
    df[col] = enc.transform(s)
    label_encoders[col] = enc
    categorical_dims[col] = len(enc.classes_)

# ---------------------------------------------------------
# Stage 5: build the X, y matrices
unused_feat = []
# The targets were already dropped from Xdata, so all remaining columns are features
features = [c for c in df.columns if c not in unused_feat]
X = df[features].values.astype('float32')
# Stage 6: auxiliary classes for stratification
try:
    # use the external label if it exists
    y_categorized = df1['NIVEL_C'].values.astype(int)
except Exception:
    # fallback: terciles of the target
    percentiles = np.percentile(y[:, 0], [33.33, 66.66])
    y_categorized = np.digitize(y[:, 0].flatten(), bins=percentiles).astype(int)

# ---------------------------------------------------------
# Stage 7: scale the target (regression)
scaler = MinMaxScaler()
y_scaled = scaler.fit_transform(y)

# ---------------------------------------------------------
# Stage 8: stratified train/test split
X_train, X_test, y_train, y_test, ycat_train, ycat_test = train_test_split(
    X, y_scaled, y_categorized, test_size=0.20, random_state=42, stratify=y_categorized
)

# Stage 8b: train/valid split stratified by quartiles of y_train
percentiles_t = np.percentile(y_train[:, 0], [25, 50, 75])
y_categorized_t = np.digitize(y_train[:, 0].flatten(), bins=percentiles_t).astype(int)

X_train, X_valid, y_train, y_valid, ycat_train, ycat_valid = train_test_split(
    X_train, y_train, ycat_train, test_size=0.20, random_state=42, stratify=y_categorized_t
)

# Quick checks
print(X.shape, y.shape)
print("Train/Valid/Test:", X_train.shape, X_valid.shape, X_test.shape)

(166323, 326) (166323, 1)
Train/Valid/Test: (106446, 326) (26612, 326) (33265, 326)
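
The shapes printed above follow from two successive 80/20 splits. As a quick arithmetic check, the sketch below reproduces them, assuming the held-out part of each split is rounded up (which is how scikit-learn sizes a fractional `test_size`). The helper `split_sizes` is hypothetical and only mirrors that rounding.

```python
import math


def split_sizes(n, test_size=0.20):
    """Return (n_train, n_test) for a fractional test_size, rounding the test part up."""
    n_test = math.ceil(test_size * n)
    return n - n_test, n_test


n_total = 166323                                  # rows reported by print(X.shape, y.shape)
n_trainval, n_test = split_sizes(n_total)         # first split: train+valid vs test
n_train, n_valid = split_sizes(n_trainval)        # second split: train vs valid
print(n_train, n_valid, n_test)                   # 106446 26612 33265
```

Overall this leaves about 64% of the rows for training, 16% for validation and 20% for testing.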


![Logo UNAL CHEC](https://miro.medium.com/v2/resize:fit:1100/format:webp/0*lu62RCEko0VYe-YZ)
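
The objective optimized in the next cell sizes each categorical embedding as `min(emb, max(4, (dim + 1) // 2))`, where `dim` is the number of categories in the column and `emb` is a tuned hyperparameter. A small sketch of that rule follows; the column names and cardinalities are made up for illustration.

```python
def embedding_dim(n_categories, emb_cap):
    """Embedding width: about half the cardinality, at least 4, at most emb_cap."""
    return min(emb_cap, max(4, (n_categories + 1) // 2))


# Hypothetical categorical columns and their number of distinct values
toy_dims = {"col_a": 3, "col_b": 12, "col_c": 500}
emb_cap = 37  # one possible value of the 'emb' hyperparameter

for name, dim in toy_dims.items():
    print(name, dim, embedding_dim(dim, emb_cap))
```

Low-cardinality columns are lifted to a width of 4 and high-cardinality columns are capped at `emb`; since the search range of `emb` starts at 3, the cap can also fall below that floor of 4.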



In [None]:
# import optuna
# study = optuna.create_study(direction='minimize', sampler=optuna.samplers.TPESampler())

# study.optimize(objective_regression, n_trials=15)


import optuna
from tqdm import tqdm

n_trials = 20  # or any number you like
study = optuna.create_study(direction='minimize', sampler=optuna.samplers.TPESampler())

# wrap the optimization loop with tqdm
with tqdm(total=n_trials) as pbar:
    def callback(study, trial):
        pbar.update(1)  # advance the bar once per completed trial

    study.optimize(objective_regression, n_trials=n_trials, callbacks=[callback])

print("Best hyperparameters for regression: ", study.best_params)
print("Best objective value (last-epoch training loss): ", study.best_value)
par = study.best_params

[I 2025-09-11 16:46:45,023] A new study created in memory with name: no-name-b36100df-99c3-4c0f-a143-af05b801d053
  0%|          | 0/20 [00:00<?, ?it/s]

Stop training because you reached max_epochs = 70 with best_epoch = 58 and best_valid_mae = 0.61356


[I 2025-09-11 16:52:53,386] Trial 0 finished with value: 16837.685604709488 and parameters: {'n_d': 85, 'n_a': 121, 'n_steps': 3, 'gamma': 37.817878183857346, 'lambda_sparse': 2.6906000765077263e-09, 'batch_size': 4096, 'mask_type': 'sparsemax', 'emb': 37, 'momentum': 0.6630316518136158, 'learning_rate': 5.161876035583859e-05, 'weight_decay': 1.8061471155327178e-05, 'scheduler_gamma': 0.5155048083306591, 'step_size': 12, 'virtual_batch_size': 1024, 'optimizer_type': 'adam', 'p': 0.09355034491413258}. Best is trial 0 with value: 16837.685604709488.
  5%|▌         | 1/20 [06:08<1:56:38, 368.36s/it]

Stop training because you reached max_epochs = 70 with best_epoch = 46 and best_valid_mae = 0.06251


[I 2025-09-11 16:57:59,350] Trial 1 finished with value: 397.5173647525249 and parameters: {'n_d': 2, 'n_a': 203, 'n_steps': 1, 'gamma': 24.108317895090796, 'lambda_sparse': 0.002045087321360193, 'batch_size': 4096, 'mask_type': 'sparsemax', 'emb': 54, 'momentum': 0.697491258595262, 'learning_rate': 2.869737037567972e-07, 'weight_decay': 1.7516285429780346e-05, 'scheduler_gamma': 0.45259952757561295, 'step_size': 3, 'virtual_batch_size': 256, 'optimizer_type': 'adamw', 'p': 0.4973689952383496}. Best is trial 1 with value: 397.5173647525249.
 10%|█         | 2/20 [11:14<1:39:29, 331.66s/it]

Stop training because you reached max_epochs = 70 with best_epoch = 68 and best_valid_mae = 2.65379


[I 2025-09-11 17:06:32,434] Trial 2 finished with value: 8937463.792807037 and parameters: {'n_d': 11, 'n_a': 212, 'n_steps': 5, 'gamma': 55.101654019855125, 'lambda_sparse': 0.03227692680595638, 'batch_size': 2048, 'mask_type': 'sparsemax', 'emb': 43, 'momentum': 0.8392892907630071, 'learning_rate': 35.49033169860902, 'weight_decay': 5.020979031468037e-06, 'scheduler_gamma': 0.7402610293854881, 'step_size': 8, 'virtual_batch_size': 1024, 'optimizer_type': 'adam', 'p': 0.5865207668004241}. Best is trial 1 with value: 397.5173647525249.
 15%|█▌        | 3/20 [19:47<1:57:26, 414.50s/it]

Stop training because you reached max_epochs = 70 with best_epoch = 55 and best_valid_mae = 0.04464


[I 2025-09-11 17:17:22,535] Trial 3 finished with value: 21.272145899146267 and parameters: {'n_d': 33, 'n_a': 213, 'n_steps': 9, 'gamma': 22.661937972843344, 'lambda_sparse': 1.259065807183134e-11, 'batch_size': 4096, 'mask_type': 'sparsemax', 'emb': 53, 'momentum': 0.15481533174484316, 'learning_rate': 19.340482478112012, 'weight_decay': 1.3944588429462397e-06, 'scheduler_gamma': 0.14703943661575894, 'step_size': 15, 'virtual_batch_size': 1024, 'optimizer_type': 'rmsprop', 'p': 0.8164508777869709}. Best is trial 3 with value: 21.272145899146267.
 20%|██        | 4/20 [30:37<2:15:20, 507.51s/it]


Early stopping occurred at epoch 60 with best_epoch = 20 and best_valid_mae = 10.43009


[I 2025-09-11 17:34:45,135] Trial 4 finished with value: 2038302480.8580682 and parameters: {'n_d': 228, 'n_a': 233, 'n_steps': 8, 'gamma': 65.91206170828102, 'lambda_sparse': 2.0913389525723742, 'batch_size': 1024, 'mask_type': 'sparsemax', 'emb': 66, 'momentum': 0.03995179252079938, 'learning_rate': 4.20500152511888, 'weight_decay': 0.0009783991297982494, 'scheduler_gamma': 0.8593686976204018, 'step_size': 9, 'virtual_batch_size': 1024, 'optimizer_type': 'rmsprop', 'p': 0.9676167346536377}. Best is trial 3 with value: 21.272145899146267.
 25%|██▌       | 5/20 [48:00<2:55:07, 700.47s/it]

Stop training because you reached max_epochs = 70 with best_epoch = 65 and best_valid_mae = 0.00131


[I 2025-09-11 17:44:12,107] Trial 5 finished with value: 0.134551864874649 and parameters: {'n_d': 136, 'n_a': 82, 'n_steps': 2, 'gamma': 61.58542185849765, 'lambda_sparse': 0.17026512966010113, 'batch_size': 1024, 'mask_type': 'entmax', 'emb': 45, 'momentum': 0.7515157945275717, 'learning_rate': 0.08779748186052796, 'weight_decay': 1.706501547402328e-05, 'scheduler_gamma': 0.5575353100830288, 'step_size': 10, 'virtual_batch_size': 512, 'optimizer_type': 'rmsprop', 'p': 0.012559852015965677}. Best is trial 5 with value: 0.134551864874649.
 30%|███       | 6/20 [57:27<2:32:51, 655.08s/it]

Stop training because you reached max_epochs = 70 with best_epoch = 64 and best_valid_mae = 0.00101


[I 2025-09-11 18:00:00,832] Trial 6 finished with value: 0.06870629379665665 and parameters: {'n_d': 75, 'n_a': 253, 'n_steps': 10, 'gamma': 77.13726204291119, 'lambda_sparse': 2.033600968988791e-11, 'batch_size': 2048, 'mask_type': 'sparsemax', 'emb': 41, 'momentum': 0.5158252222355557, 'learning_rate': 0.0021040765227438576, 'weight_decay': 8.376844290145493e-06, 'scheduler_gamma': 0.796171403854875, 'step_size': 3, 'virtual_batch_size': 512, 'optimizer_type': 'rmsprop', 'p': 0.14330200935617907}. Best is trial 6 with value: 0.06870629379665665.
 35%|███▌      | 7/20 [1:13:15<2:42:44, 751.08s/it]

Stop training because you reached max_epochs = 70 with best_epoch = 65 and best_valid_mae = 0.00088


[I 2025-09-11 18:10:37,703] Trial 7 finished with value: 0.11391084363622009 and parameters: {'n_d': 187, 'n_a': 15, 'n_steps': 8, 'gamma': 8.045391733621763, 'lambda_sparse': 0.0014652361029021176, 'batch_size': 4096, 'mask_type': 'entmax', 'emb': 63, 'momentum': 0.7147496576978352, 'learning_rate': 0.001956198118192057, 'weight_decay': 0.00027227644241387247, 'scheduler_gamma': 0.6063949475537203, 'step_size': 8, 'virtual_batch_size': 512, 'optimizer_type': 'rmsprop', 'p': 0.5992749429156566}. Best is trial 6 with value: 0.06870629379665665.
 40%|████      | 8/20 [1:23:52<2:22:56, 714.72s/it]

Stop training because you reached max_epochs = 70 with best_epoch = 35 and best_valid_mae = 0.51037


[I 2025-09-11 18:17:00,030] Trial 8 finished with value: 18479.558845161224 and parameters: {'n_d': 232, 'n_a': 89, 'n_steps': 1, 'gamma': 82.88664389580218, 'lambda_sparse': 51.53592279068219, 'batch_size': 2048, 'mask_type': 'entmax', 'emb': 56, 'momentum': 0.5684219031761095, 'learning_rate': 1.6779101090622274e-07, 'weight_decay': 6.546390997948744e-05, 'scheduler_gamma': 0.19614930939489625, 'step_size': 3, 'virtual_batch_size': 256, 'optimizer_type': 'adamw', 'p': 0.6885503429942427}. Best is trial 6 with value: 0.06870629379665665.
 45%|████▌     | 9/20 [1:30:15<1:51:58, 610.81s/it]

Stop training because you reached max_epochs = 70 with best_epoch = 30 and best_valid_mae = 0.00935


[I 2025-09-11 18:39:39,872] Trial 9 finished with value: 0.9965631040242399 and parameters: {'n_d': 76, 'n_a': 235, 'n_steps': 10, 'gamma': 9.078302336833032, 'lambda_sparse': 4.926014143503172e-11, 'batch_size': 1024, 'mask_type': 'sparsemax', 'emb': 70, 'momentum': 0.7322070212236788, 'learning_rate': 0.5225686305599698, 'weight_decay': 1.2031455595255024e-06, 'scheduler_gamma': 0.2810409578500739, 'step_size': 10, 'virtual_batch_size': 256, 'optimizer_type': 'adam', 'p': 0.10011617751052164}. Best is trial 6 with value: 0.06870629379665665.
 50%|█████     | 10/20 [1:52:54<2:20:20, 842.05s/it]


Early stopping occurred at epoch 67 with best_epoch = 27 and best_valid_mae = 1.33207


[I 2025-09-11 18:49:36,370] Trial 10 finished with value: 65882.79495530597 and parameters: {'n_d': 136, 'n_a': 163, 'n_steps': 6, 'gamma': 99.06204466001327, 'lambda_sparse': 6.289284799679228e-07, 'batch_size': 2048, 'mask_type': 'entmax', 'emb': 16, 'momentum': 0.28899747874857185, 'learning_rate': 8.177295396632677e-05, 'weight_decay': 7.567873317090115e-05, 'scheduler_gamma': 0.9515071570336336, 'step_size': 20, 'virtual_batch_size': 512, 'optimizer_type': 'sgd', 'p': 0.2945776344171226}. Best is trial 6 with value: 0.06870629379665665.
 55%|█████▌    | 11/20 [2:02:51<1:55:02, 766.90s/it]


Early stopping occurred at epoch 65 with best_epoch = 25 and best_valid_mae = 0.0088


[I 2025-09-11 18:59:53,969] Trial 11 finished with value: 5.061862852667332 and parameters: {'n_d': 182, 'n_a': 32, 'n_steps': 7, 'gamma': 74.36859608964107, 'lambda_sparse': 1.2071815377648762e-05, 'batch_size': 2048, 'mask_type': 'entmax', 'emb': 19, 'momentum': 0.4060833233938917, 'learning_rate': 0.0029615351740656876, 'weight_decay': 0.0004175865674848439, 'scheduler_gamma': 0.716396624421976, 'step_size': 1, 'virtual_batch_size': 512, 'optimizer_type': 'rmsprop', 'p': 0.3290608715343052}. Best is trial 6 with value: 0.06870629379665665.
 60%|██████    | 12/20 [2:13:08<1:36:11, 721.48s/it]

Stop training because you reached max_epochs = 70 with best_epoch = 66 and best_valid_mae = 0.00092


[I 2025-09-11 19:10:50,141] Trial 12 finished with value: 0.08381290315922103 and parameters: {'n_d': 181, 'n_a': 15, 'n_steps': 10, 'gamma': 36.95293584882349, 'lambda_sparse': 8.092930185600877e-06, 'batch_size': 4096, 'mask_type': 'entmax', 'emb': 3, 'momentum': 0.5061074210679336, 'learning_rate': 0.004340153535951104, 'weight_decay': 0.000174010636087244, 'scheduler_gamma': 0.7011569903519305, 'step_size': 6, 'virtual_batch_size': 512, 'optimizer_type': 'rmsprop', 'p': 0.32044062305527354}. Best is trial 6 with value: 0.06870629379665665.
 65%|██████▌   | 13/20 [2:24:05<1:21:51, 701.70s/it]

Stop training because you reached max_epochs = 70 with best_epoch = 66 and best_valid_mae = 1.63339


[I 2025-09-11 19:22:15,045] Trial 13 finished with value: 78767.19615227438 and parameters: {'n_d': 88, 'n_a': 158, 'n_steps': 10, 'gamma': 40.496803032318276, 'lambda_sparse': 3.297016798517748e-08, 'batch_size': 4096, 'mask_type': 'entmax', 'emb': 3, 'momentum': 0.43868269695730666, 'learning_rate': 0.04258036069090253, 'weight_decay': 4.5878679451239165e-06, 'scheduler_gamma': 0.7847658533588665, 'step_size': 5, 'virtual_batch_size': 512, 'optimizer_type': 'sgd', 'p': 0.28381063447797383}. Best is trial 6 with value: 0.06870629379665665.
 70%|███████   | 14/20 [2:35:30<1:09:39, 696.62s/it]

Stop training because you reached max_epochs = 70 with best_epoch = 64 and best_valid_mae = 0.65061


[I 2025-09-11 19:35:51,734] Trial 14 finished with value: 14247.788698630791 and parameters: {'n_d': 174, 'n_a': 82, 'n_steps': 10, 'gamma': 91.45134159493354, 'lambda_sparse': 1.9567252134421635e-12, 'batch_size': 2048, 'mask_type': 'sparsemax', 'emb': 24, 'momentum': 0.5232997335870401, 'learning_rate': 4.3726303006942516e-05, 'weight_decay': 0.0001095203367008332, 'scheduler_gamma': 0.9107921456633107, 'step_size': 5, 'virtual_batch_size': 512, 'optimizer_type': 'rmsprop', 'p': 0.18239478794419606}. Best is trial 6 with value: 0.06870629379665665.
 75%|███████▌  | 15/20 [2:49:06<1:01:04, 732.82s/it]

Stop training because you reached max_epochs = 70 with best_epoch = 61 and best_valid_mae = 0.23712


[I 2025-09-11 19:45:03,470] Trial 15 finished with value: 2434.635130893283 and parameters: {'n_d': 51, 'n_a': 51, 'n_steps': 8, 'gamma': 44.57343610089854, 'lambda_sparse': 2.6535059120522886e-05, 'batch_size': 4096, 'mask_type': 'entmax', 'emb': 27, 'momentum': 0.3511888916220908, 'learning_rate': 0.0005577605403484627, 'weight_decay': 5.694659780887386e-06, 'scheduler_gamma': 0.6511974069034726, 'step_size': 5, 'virtual_batch_size': 512, 'optimizer_type': 'rmsprop', 'p': 0.42591820937898167}. Best is trial 6 with value: 0.06870629379665665.
 80%|████████  | 16/20 [2:58:18<45:13, 678.31s/it]  


Early stopping occurred at epoch 63 with best_epoch = 23 and best_valid_mae = 1.51242


[I 2025-09-11 19:53:56,548] Trial 16 finished with value: 76782.7224414257 and parameters: {'n_d': 120, 'n_a': 256, 'n_steps': 5, 'gamma': 29.08080092164517, 'lambda_sparse': 1.92641075413058e-09, 'batch_size': 2048, 'mask_type': 'sparsemax', 'emb': 3, 'momentum': 0.5339670650879443, 'learning_rate': 2.363630398285578e-06, 'weight_decay': 0.00019596161627699672, 'scheduler_gamma': 0.39844984304796044, 'step_size': 1, 'virtual_batch_size': 512, 'optimizer_type': 'rmsprop', 'p': 0.21074436255913287}. Best is trial 6 with value: 0.06870629379665665.
 85%|████████▌ | 17/20 [3:07:11<31:43, 634.64s/it]

Stop training because you reached max_epochs = 70 with best_epoch = 68 and best_valid_mae = 0.00105


[I 2025-09-11 20:08:01,920] Trial 17 finished with value: 0.08798476236308415 and parameters: {'n_d': 255, 'n_a': 145, 'n_steps': 9, 'gamma': 74.16295962322074, 'lambda_sparse': 3.3461740707298145e-07, 'batch_size': 4096, 'mask_type': 'entmax', 'emb': 30, 'momentum': 0.2784623773288536, 'learning_rate': 0.037065706773623724, 'weight_decay': 3.874126372062282e-05, 'scheduler_gamma': 0.8211040213378249, 'step_size': 6, 'virtual_batch_size': 512, 'optimizer_type': 'adamw', 'p': 0.4114862236901526}. Best is trial 6 with value: 0.06870629379665665.
 90%|█████████ | 18/20 [3:21:16<23:15, 697.96s/it]

Stop training because you reached max_epochs = 70 with best_epoch = 68 and best_valid_mae = 1.30382


[I 2025-09-11 20:16:35,156] Trial 18 finished with value: 68252.9669945618 and parameters: {'n_d': 115, 'n_a': 190, 'n_steps': 4, 'gamma': 50.15152115579969, 'lambda_sparse': 1.0986303076644433e-10, 'batch_size': 2048, 'mask_type': 'entmax', 'emb': 11, 'momentum': 0.5829240123781667, 'learning_rate': 0.00029427083977075723, 'weight_decay': 0.0008497467196354667, 'scheduler_gamma': 0.6732031678599523, 'step_size': 13, 'virtual_batch_size': 512, 'optimizer_type': 'sgd', 'p': 0.18071255537172864}. Best is trial 6 with value: 0.06870629379665665.
 95%|█████████▌| 19/20 [3:29:50<10:42, 642.48s/it]

Stop training because you reached max_epochs = 70 with best_epoch = 69 and best_valid_mae = 0.23935


[I 2025-09-11 20:34:16,992] Trial 19 finished with value: 3642.428342554558 and parameters: {'n_d': 158, 'n_a': 112, 'n_steps': 7, 'gamma': 68.67530934132684, 'lambda_sparse': 3.4898664758216643e-09, 'batch_size': 1024, 'mask_type': 'sparsemax', 'emb': 36, 'momentum': 0.8941608323606214, 'learning_rate': 7.280527799058792e-06, 'weight_decay': 9.329815847905553e-06, 'scheduler_gamma': 0.9661013857475702, 'step_size': 17, 'virtual_batch_size': 256, 'optimizer_type': 'rmsprop', 'p': 0.37840918129724105}. Best is trial 6 with value: 0.06870629379665665.
100%|██████████| 20/20 [3:47:31<00:00, 682.60s/it]

Best hyperparameters for regression:  {'n_d': 75, 'n_a': 253, 'n_steps': 10, 'gamma': 77.13726204291119, 'lambda_sparse': 2.033600968988791e-11, 'batch_size': 2048, 'mask_type': 'sparsemax', 'emb': 41, 'momentum': 0.5158252222355557, 'learning_rate': 0.0021040765227438576, 'weight_decay': 8.376844290145493e-06, 'scheduler_gamma': 0.796171403854875, 'step_size': 3, 'virtual_batch_size': 512, 'optimizer_type': 'rmsprop', 'p': 0.14330200935617907}
Best mae:  0.06870629379665665





In [None]:
best_params = par

n_d = best_params['n_d']
n_a = best_params['n_a']
n_steps = best_params['n_steps']
gamma = best_params['gamma']
lambda_sparse = best_params['lambda_sparse']
mask_type = best_params['mask_type']
batch_size = best_params['batch_size']
emb = best_params['emb']
p = best_params['p']
momentum = best_params['momentum']
learning_rate = best_params['learning_rate']
weight_decay = best_params['weight_decay']
scheduler_gamma = best_params['scheduler_gamma']
step_size = best_params['step_size']
virtual_batch_size = best_params['virtual_batch_size']
optimizer_type = best_params['optimizer_type']

# Optimizer configuration (same logic as before)
optimizer_fn, optimizer_params = build_optimizer(optimizer_type, learning_rate, momentum, weight_decay)

# Augmentation and categorical features
aug = RegressionSMOTE(p=p)
cat_idxs = [i for i, f in enumerate(features) if f in CATEGORICAL_COLUMNS]
cat_dims = [categorical_dims[f] for f in features if f in CATEGORICAL_COLUMNS]
cat_emb_dim = [min(emb, (dim + 1)//2) for dim in cat_dims]

clf = CustomTabNetRegressor(
    cat_dims=cat_dims, cat_emb_dim=cat_emb_dim, cat_idxs=cat_idxs,
    n_d=n_d, n_a=n_a, n_steps=n_steps, gamma=gamma, lambda_sparse=lambda_sparse,
    mask_type=mask_type, optimizer_fn=optimizer_fn, optimizer_params=optimizer_params,
    scheduler_params={"gamma": scheduler_gamma, "step_size": step_size},
    scheduler_fn=torch.optim.lr_scheduler.StepLR,
    momentum=momentum, verbose=True
)

clf.fit(
    X_train=X_train, y_train=y_train[:,0:1],
    eval_set=[(X_train, y_train[:,0:1]), (X_valid, y_valid[:,0:1])],
    eval_name=['train', 'valid'], eval_metric=['mae'],
    loss_fn=my_r2_score_fn,
    max_epochs=200, patience=70,
    batch_size=batch_size, virtual_batch_size=virtual_batch_size,
    num_workers=1, drop_last=False, augmentations=aug,
)






epoch 0  | loss: 44989.84148| train_mae: 3.53835 | valid_mae: 3.52597 |  0:00:12s
epoch 1  | loss: 1896.84927| train_mae: 0.59406 | valid_mae: 0.59604 |  0:00:25s
epoch 2  | loss: 358.01071| train_mae: 0.25387 | valid_mae: 0.25235 |  0:00:37s
epoch 3  | loss: 122.70116| train_mae: 0.06537 | valid_mae: 0.06684 |  0:00:50s
epoch 4  | loss: 70.43245| train_mae: 0.08271 | valid_mae: 0.0832  |  0:01:02s
epoch 5  | loss: 28.69788| train_mae: 0.05096 | valid_mae: 0.05115 |  0:01:15s
epoch 6  | loss: 19.93663| train_mae: 0.02972 | valid_mae: 0.03072 |  0:01:28s
epoch 7  | loss: 8.29522 | train_mae: 0.03932 | valid_mae: 0.03929 |  0:01:41s
epoch 8  | loss: 5.63204 | train_mae: 0.00843 | valid_mae: 0.0087  |  0:01:54s
epoch 9  | loss: 1.67899 | train_mae: 0.00554 | valid_mae: 0.00571 |  0:02:06s
epoch 10 | loss: 2.22248 | train_mae: 0.01596 | valid_mae: 0.0163  |  0:02:19s
epoch 11 | loss: 2.1938  | train_mae: 0.00835 | valid_mae: 0.00841 |  0:02:32s
epoch 12 | loss: 1.35881 | train_mae: 0.00871
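
In the training cell above, each categorical feature gets an embedding width of `min(emb, (dim + 1)//2)`: roughly half the number of categories, capped at the tuned `emb` value. The short sketch below applies that rule to a few hypothetical category counts (the real ones come from `categorical_dims`, which is not shown in this section), using `emb = 41` from the best trial.

```python
emb = 41  # 'emb' value from the best trial reported above

# Hypothetical category counts; the notebook's real values come from categorical_dims.
example_cat_dims = [2, 5, 12, 200]

# Same rule as the training cell: about half the cardinality, capped at emb.
example_cat_emb_dim = [min(emb, (dim + 1) // 2) for dim in example_cat_dims]

for dim, width in zip(example_cat_dims, example_cat_emb_dim):
    print(f"{dim:4d} categories -> embedding width {width}")
```

With these example counts, only the 200-category feature hits the cap; the smaller features keep a width of about half their cardinality.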



In [None]:
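y_pred = clf.predict(X_test)
# Map predictions and targets back to the original units
y_pred_ = scaler.inverse_transform(y_pred)
y_test_ = scaler.inverse_transform(y_test)
# R2 on the test set, computed on the scaled values
r2_1 = r2_score(y_test, y_pred)
print(f" Test  -> R2={r2_1:.4f}")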

 Test  -> R2=0.8577
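
The evaluation cell above builds `y_pred_` and `y_test_` in the original units but reports R2 only on the scaled values. Assuming `scaler` applies an affine map per column (as `StandardScaler` and `MinMaxScaler` do), R2 is unchanged by the inverse transform, while MAE is multiplied by the scale factor, so an error in the indicator's own units has to be computed after inverting. The sketch below illustrates this with hypothetical numbers; `target_mean`, `target_std`, and the arrays are made up for the example and are not the notebook's data.

```python
import numpy as np


def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    # Mean absolute error.
    return np.mean(np.abs(y_true - y_pred))


# Hypothetical scaled targets and predictions (single output column).
y_true_scaled = np.array([[0.1], [0.4], [-0.3], [1.2], [0.0]])
y_pred_scaled = np.array([[0.2], [0.3], [-0.1], [1.0], [0.1]])

# Hypothetical standardization constants; the notebook's scaler is not shown here.
target_mean, target_std = 12.0, 4.0
y_true_orig = y_true_scaled * target_std + target_mean
y_pred_orig = y_pred_scaled * target_std + target_mean

r2_scaled, r2_orig = r2(y_true_scaled, y_pred_scaled), r2(y_true_orig, y_pred_orig)
mae_scaled, mae_orig = mae(y_true_scaled, y_pred_scaled), mae(y_true_orig, y_pred_orig)

print(f"R2  scaled={r2_scaled:.4f}  original={r2_orig:.4f}")
print(f"MAE scaled={mae_scaled:.4f}  original={mae_orig:.4f}")
```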


In [None]:
torch.save(clf, "/content/drive/MyDrive/Colab Notebooks/Diego/model.pth")