# 04 | Scorecard & Scaling from the Neural Network Model

This notebook starts from the **Challenger NN** trained in `train.py` and produces a
calibrated, scaled scorecard for operational use in credit risk management.

**General flow**
1. Import dependencies and set paths.
2. Load the neural network architecture and weights.
3. Prepare `CreditDataset` for inference over the full dataset.
4. Generate raw probabilities `y_prob_raw`.
5. Calibrate the probabilities with *Platt Scaling* (sigmoid).
6. Scale the calibrated probabilities to the **300 – 850** range.
7. Evaluate key metrics and save artifacts.


## 1) Import libraries and global parameters

In [1]:
import joblib, torch
import numpy as np, pandas as pd
from pathlib import Path

from sklearn.calibration import CalibratedClassifierCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import roc_auc_score, average_precision_score, brier_score_loss

# Folder where `train.py` saved the artifacts
REPORT_DIR = Path("C:/GitHub/trading/credit-risk-ml-nn/reports")


## 2) Load the NN model architecture and weights

In [2]:
# %% 0 | Adjust sys.path so train.py can be imported
import sys
from pathlib import Path

# 1) Set the folder that contains train.py:
REPO_ROOT = Path("C:/Github/trading/credit-risk-ml-nn").resolve()

# 2) Insert REPO_ROOT into sys.path if it is not already there
if str(REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(REPO_ROOT))

# 3) Check
print("Modules searched in:", sys.path[0])


Modules searched in: C:\GitHub\Trading\credit-risk-ml-nn


In [3]:
# %% 1 | Confirm sys.path (adjusted in the previous cell) before importing train.py
import sys
from pathlib import Path

# Check
print("Python will search for modules in:", sys.path[0])

# %% 2 | Import definitions from the training script
import train
RiskNN        = train.RiskNN
CreditDataset = train.CreditDataset
train_one_epoch   = train.train_one_epoch
evaluate_probs    = train.evaluate_probs
evaluate_metrics  = train.evaluate_metrics
# …import every function/class you need

# %% 3 | The model and loaders can now be used
# For example:
# model = RiskNN(len(numerical_cols), cat_dims, emb_dims, hidden=(256,128), dropout=0.3).to(device)
# model.load_state_dict(torch.load(best_model_path, map_location=device))
# model.eval()


Python will search for modules in: C:\GitHub\Trading\credit-risk-ml-nn




## 3) Raw probabilities on the full dataset

In [4]:
# %% 0 | Paths and device ------------------------------------------------
from pathlib import Path
import sys, joblib, torch, pandas as pd, numpy as np, json

REPO_ROOT = Path(r"C:/Github/trading/credit-risk-ml-nn").resolve()
if str(REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(REPO_ROOT))

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
BATCH  = 2048   # same batch size as in training

# %% 1 | Import utilities from train.py ----------------------------------
import train                          # train.py must be in REPO_ROOT
RiskNN            = train.RiskNN
CreditDataset     = train.CreditDataset
evaluate_probs    = train.evaluate_probs
evaluate_metrics  = train.evaluate_metrics

# %% 2 | Load the CSV and recreate identical splits ----------------------
DATA_PATH = REPO_ROOT / "data" / "processed" / "data_loan_complete.csv"
df = pd.read_csv(DATA_PATH)
df = df[df["loan_status_bin"].notna()].copy()
df["target"] = df["loan_status_bin"].astype(int)
df.drop(columns=["loan_status_bin"], inplace=True)

from sklearn.model_selection import train_test_split
SEED = 42
train_df, test_df  = train_test_split(
    df, test_size=0.15, stratify=df["target"], random_state=SEED
)
train_df, valid_df = train_test_split(
    train_df, test_size=0.1765, stratify=train_df["target"], random_state=SEED
)

categorical_cols = [c for c in df.columns if df[c].dtype == "object"]
numerical_cols   = [c for c in df.columns if c not in categorical_cols + ["target"]]

# %% 3 | Train-set statistics and mappings -------------------------------
train_ds = CreditDataset(train_df, numerical_cols, categorical_cols)

# %% 4 | Locate and load the Optuna study --------------------------------
study_candidates = [
    REPO_ROOT / "reports" / "modeloFinal" / "optuna_study.pkl",
    REPO_ROOT / "reports" / "optuna_study.pkl",
]
for p in study_candidates:
    if p.exists():
        study_path = p
        break
else:
    raise FileNotFoundError("❌ optuna_study.pkl not found in reports/ or reports/modeloFinal/")

best = joblib.load(study_path).best_trial.params
hidden  = tuple(best["hidden"])      # e.g. (256,128,64)
dropout = best["dropout"]

# Embeddings in the same order as in training
cat_dims = [train_df[c].nunique() for c in categorical_cols]
emb_dims = [min(50, d//2 + 1) for d in cat_dims]

# %% 5 | Locate the model weights and rebuild the model ------------------
weight_candidates = [
    REPO_ROOT / "reports" / "modeloFinal" / "best_model_final.pth",
    REPO_ROOT / "reports" / "modeloFinal" / "best_model.pth",
    REPO_ROOT / "reports" / "best_model.pth",
]
for p in weight_candidates:
    if p.exists():
        model_path = p
        break
else:
    raise FileNotFoundError("❌ best_model*.pth not found in reports/ or reports/modeloFinal/")

nn_model = RiskNN(
    num_features=len(numerical_cols),
    cat_dims    =cat_dims,
    emb_dims    =emb_dims,
    hidden      =hidden,
    dropout     =dropout
).to(DEVICE)

nn_model.load_state_dict(torch.load(model_path, map_location=DEVICE))
nn_model.eval()
print(f"✔️  Weights loaded from {model_path.relative_to(REPO_ROOT)}")

# %% 6 | Inference and metrics over the FULL dataframe -------------------
full_ds = CreditDataset(
    df, numerical_cols, categorical_cols,
    num_stats=(train_ds.means, train_ds.stds),
    cat_maps=train_ds.cat_maps
)
loader = torch.utils.data.DataLoader(full_ds, batch_size=BATCH, shuffle=False)

# Named y_prob_raw because the metrics and export cells refer to the raw probabilities by that name
y_true, y_prob_raw = evaluate_probs(nn_model, loader)
print(json.dumps(evaluate_metrics(y_true, y_prob_raw), indent=2))


✔️  Weights loaded from reports\modeloFinal\best_model_final.pth
{
  "roc": 0.672795445623447,
  "pr": 0.35794946071878425,
  "brier": 0.22303508210400796,
  "accuracy": 0.6295050832309239,
  "precision": 0.31979866366552645,
  "recall": 0.6135432428296979,
  "f1": 0.4204468004543733,
  "f2": 0.5183240358265205
}
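
The two `train_test_split` calls in the cell above carve out the test set first and then take the validation set from what remains. A quick check of the arithmetic, using only the fractions that appear in that cell, suggests that `0.1765` was chosen so that validation ends up at roughly 15% of the full data, for an approximate 70/15/15 split:

```python
# Fractions copied from the two train_test_split calls
test_frac = 0.15
valid_frac_of_remainder = 0.1765

# The second split only sees what is left after the test set is removed
remainder = 1.0 - test_frac
valid_frac = remainder * valid_frac_of_remainder
train_frac = remainder - valid_frac

print(f"train={train_frac:.4f}  valid={valid_frac:.4f}  test={test_frac:.4f}")
```

Because both calls use the same `SEED` and stratify on `target`, rerunning them on the same filtered dataframe should reproduce the same partitions.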


## 4) Calibration with Platt Scaling

In [None]:
# Network weights loaded
...
Calibrated metrics:
{
  "roc": 0.6731,
  "pr": 0.3580,
  ...
}


SyntaxError: invalid syntax (1285965194.py, line 3)
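
The cell above does not show the calibration code itself, so the following is only a minimal sketch of Platt scaling, not the notebook's actual implementation. It assumes the calibrator is a one-feature logistic regression fitted on the logit of the raw probability; `to_logit`, `fit_platt`, and `apply_platt` are hypothetical helper names, and the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score

EPS = 1e-6  # keeps the logit finite when a probability is exactly 0 or 1

def to_logit(p):
    """Convert probabilities to a (n, 1) column of log-odds."""
    p = np.clip(np.asarray(p, dtype=float), EPS, 1.0 - EPS)
    return np.log(p / (1.0 - p)).reshape(-1, 1)

def fit_platt(y_true, y_prob_raw):
    """Fit sigmoid(a * logit(p_raw) + b) to the observed labels."""
    calibrator = LogisticRegression(C=1e6)  # large C: effectively unregularised
    calibrator.fit(to_logit(y_prob_raw), y_true)
    return calibrator

def apply_platt(calibrator, y_prob_raw):
    """Return calibrated probabilities of the positive class."""
    return calibrator.predict_proba(to_logit(y_prob_raw))[:, 1]

# Synthetic stand-in data (NOT the notebook's data): deliberately distorted probabilities
rng = np.random.default_rng(0)
p_true = rng.uniform(0.05, 0.60, size=5000)
y_demo = (rng.random(5000) < p_true).astype(int)
raw_demo = 1.0 / (1.0 + np.exp(-(2.0 * to_logit(p_true).ravel() + 1.5)))

demo_calibrator = fit_platt(y_demo, raw_demo)
cal_demo = apply_platt(demo_calibrator, raw_demo)

brier_raw = brier_score_loss(y_demo, raw_demo)
brier_cal = brier_score_loss(y_demo, cal_demo)
auc_raw = roc_auc_score(y_demo, raw_demo)
auc_cal = roc_auc_score(y_demo, cal_demo)
print(f"Brier raw={brier_raw:.4f} cal={brier_cal:.4f} | AUC raw={auc_raw:.4f} cal={auc_cal:.4f}")
```

In the notebook's terms, `calibrator = fit_platt(y_true, y_prob_raw)` followed by `y_prob_cal = apply_platt(calibrator, y_prob_raw)` would supply the names the later cells expect. Since the fitted sigmoid is a monotone map, it should leave ROC-AUC essentially unchanged and mainly improve the Brier score. Fitting on a held-out split such as `valid_df` rather than on the full dataset is the usual way to avoid an optimistic calibration.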

## 5) Scaling to the 300–850 range

In [None]:
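# MinMaxScaler requires feature_range min < max, so feature_range=(850, 300) raises a ValueError.
# Scale the complement 1 - p into [300, 850] instead: high default probability -> low score.
scaler_score = MinMaxScaler(feature_range=(300, 850))
scores = scaler_score.fit_transform((1.0 - y_prob_cal).reshape(-1, 1)).ravel()

# Save objects (note: scaler_score expects 1 - p as input)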
joblib.dump(calibrator, REPORT_DIR / "nn_calibrator.pkl")
joblib.dump(scaler_score, REPORT_DIR / "score_scaler_nn.pkl")
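
The mapping this section aims for (high default probability, low score) can also be written directly in NumPy, which makes the arithmetic easy to check by hand. This is a sketch with made-up probabilities; `prob_to_score` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def prob_to_score(p, lo=300.0, hi=850.0):
    """Map probabilities linearly so that min(p) -> hi and max(p) -> lo."""
    p = np.asarray(p, dtype=float)
    span = p.max() - p.min()
    if span == 0:
        raise ValueError("all probabilities are identical; scaling is undefined")
    return hi - (p - p.min()) / span * (hi - lo)

toy_probs = np.array([0.05, 0.20, 0.50])  # hypothetical calibrated probabilities
toy_scores = prob_to_score(toy_probs)
print(toy_scores.round(1))
```

The lowest probability lands on 850, the highest on 300, and 0.20 sits one third of the way down the range. Because the minimum and maximum come from the data used for fitting, a new applicant whose probability falls outside that range would receive a score outside 300–850 unless the result is clipped.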


## 6) Performance metrics

In [None]:
print('--- Neural network performance ---')
print('ROC-AUC raw   :', roc_auc_score(y_true, y_prob_raw))
print('ROC-AUC cal   :', roc_auc_score(y_true, y_prob_cal))
print('PR-AUC  cal   :', average_precision_score(y_true, y_prob_cal))
print('Brier   cal   :', brier_score_loss(y_true, y_prob_cal))
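
Of the metrics printed above, the Brier score is the one that responds most directly to calibration: it is the mean squared difference between the predicted probability and the 0/1 outcome, so lower is better. The toy example below, with hypothetical labels and probabilities, computes it by hand, compares it with `brier_score_loss`, and adds the score of a constant base-rate prediction as a reference point.

```python
import numpy as np
from sklearn.metrics import brier_score_loss

y_toy = np.array([0, 0, 1, 1])            # hypothetical labels
p_toy = np.array([0.1, 0.4, 0.35, 0.8])   # hypothetical predicted probabilities

# Brier score: mean squared error between probability and outcome
brier_manual = np.mean((p_toy - y_toy) ** 2)
brier_sklearn = brier_score_loss(y_toy, p_toy)

# Reference: always predicting the observed base rate
baseline = np.mean((y_toy.mean() - y_toy) ** 2)

print(f"manual={brier_manual:.6f} sklearn={brier_sklearn:.6f} baseline={baseline:.4f}")
```

A model is only adding probabilistic information if its Brier score sits below that base-rate reference.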


## 7) Export scorecard to CSV

In [None]:
pd.DataFrame({
    "target": y_true,
    "prob_raw": y_prob_raw,
    "prob_cal": y_prob_cal,
    "score": scores,
}).to_csv(REPORT_DIR / "nn_scores.csv", index=False)
print('File nn_scores.csv saved to reports/')
