# 04 | Scorecard & Scaling from the Neural Network Model

This notebook starts from the **Challenger NN** trained in `train.py` and builds a
calibrated, scaled scorecard for operational use in credit risk management.

**Overall flow**
1. Import dependencies and set paths.
2. Load the neural network architecture and weights.
3. Prepare a `CreditDataset` for inference over the whole dataset.
4. Generate raw probabilities `y_prob_raw`.
5. Calibrate the probabilities with *Platt Scaling* (sigmoid).
6. Scale the calibrated probabilities to the **300–850** range.
7. Evaluate key metrics and save artifacts.


## 1) Import libraries and global parameters

In [None]:
import joblib, torch
import numpy as np, pandas as pd
from pathlib import Path

from sklearn.calibration import CalibratedClassifierCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import roc_auc_score, average_precision_score, brier_score_loss

# Folder where `train.py` saved the artifacts.
# Raw string: otherwise "\r" in "\reports" is read as a carriage return.
REPORT_DIR = Path(r"C:\GitHub\Trading\credit-risk-ml-nn\reports")


## 2) Load the NN model architecture and weights

In [3]:
# Load the Optuna study to retrieve the best hyperparameters
import joblib, importlib
study = joblib.load(REPORT_DIR / "optuna_study.pkl")
best  = study.best_trial.params

# Import definitions from the training script
train = importlib.import_module("train")
RiskNN          = train.RiskNN
CreditDataset   = train.CreditDataset
evaluate_probs  = train.evaluate_probs
numerical_cols  = train.numerical_cols
categorical_cols= train.categorical_cols
cat_dims        = train.cat_dims
emb_dims        = train.emb_dims
train_ds        = train.train_ds
DEVICE          = train.DEVICE

# Rebuild the model
nn_model = RiskNN(
    num_features=len(numerical_cols),
    cat_dims=cat_dims,
    emb_dims=emb_dims,
    hidden=best['hidden'],
    dropout=best['dropout']
).to(DEVICE)

nn_model.load_state_dict(torch.load(REPORT_DIR / "best_model.pth", map_location=DEVICE))
nn_model.eval()


FileNotFoundError: [Errno 2] No such file or directory: 'reports\\optuna_study.pkl'

## 3) Raw probabilities on the full dataset

In [None]:
# Full dataset
df = pd.read_csv("data/processed/data_loan_complete.csv")
df = df[df["loan_status_bin"].notna()].copy()
df["target"] = df["loan_status_bin"].astype(int)

# Inference dataset (reuses the training statistics and category maps)
full_ds = CreditDataset(
    df, numerical_cols, categorical_cols,
    num_stats=(train_ds.means, train_ds.stds),
    cat_maps=train_ds.cat_maps
)
loader = torch.utils.data.DataLoader(full_ds, batch_size=2048, shuffle=False, num_workers=0)

y_true, y_prob_raw = evaluate_probs(nn_model, loader)


## 4) Calibration with Platt Scaling

In [None]:
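from sklearn.base import BaseEstimator, ClassifierMixin

class WrappedNN(BaseEstimator, ClassifierMixin):
    """Wrapper exposing a scikit-learn classifier interface around the trained network."""
    def __init__(self, model):
        self.model = model
        self.num_len = len(numerical_cols)
        # The network is already trained; declare the binary labels so that
        # scikit-learn treats this estimator as fitted.
        self.classes_ = np.array([0, 1])

    def fit(self, X, y=None):
        # No-op: the weights were loaded from best_model.pth.
        return self

    def predict_proba(self, X):
        X_num = torch.tensor(X[:, :self.num_len], dtype=torch.float32).to(DEVICE)
        X_cat = torch.tensor(X[:, self.num_len:], dtype=torch.int64).to(DEVICE)
        with torch.no_grad():
            logits = self.model(X_num, X_cat)
            # ravel() guards against an (N, 1) output (assumption about RiskNN)
            p1 = torch.sigmoid(logits).cpu().numpy().ravel()
        return np.column_stack([1 - p1, p1])

# Numeric features first, then categorical indices (cast back to int64 in predict_proba)
X_concat = np.hstack([full_ds.X_num.numpy(), full_ds.X_cat.numpy()])

# scikit-learn >= 1.2 names this parameter `estimator` (formerly `base_estimator`)
calibrator = CalibratedClassifierCV(estimator=WrappedNN(nn_model),
                                    method='sigmoid', cv='prefit')
calibrator.fit(X_concat, y_true)
y_prob_cal = calibrator.predict_proba(X_concat)[:, 1]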
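
To make the calibration step more concrete, here is a minimal sketch of what Platt scaling does: it fits a one-dimensional logistic regression, `p = sigmoid(a * s + b)`, on the model's scores. The helpers `fit_platt` and `apply_platt` and the synthetic data are hypothetical and only illustrate the idea. They use a plain logistic regression on logits, whereas `CalibratedClassifierCV` applies its own sigmoid fit to the estimator's output, so the fitted coefficients will not match.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_platt(scores, y):
    """Fit p = sigmoid(a * score + b); returns (a, b). Hypothetical helper."""
    lr = LogisticRegression(C=1e6)  # large C: effectively no regularization
    lr.fit(np.asarray(scores, dtype=float).reshape(-1, 1), np.asarray(y))
    return float(lr.coef_[0, 0]), float(lr.intercept_[0])

def apply_platt(scores, a, b):
    """Map raw scores to calibrated probabilities."""
    z = a * np.asarray(scores, dtype=float) + b
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic example (assumption): the model is overconfident, emitting
# logits three times larger than the true log-odds.
rng = np.random.default_rng(0)
true_logit = rng.normal(-1.0, 1.0, size=5000)
y_demo = (rng.random(5000) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)
overconfident_logit = 3.0 * true_logit

a, b = fit_platt(overconfident_logit, y_demo)
p_raw_demo = 1.0 / (1.0 + np.exp(-overconfident_logit))
p_cal_demo = apply_platt(overconfident_logit, a, b)

brier_raw_demo = np.mean((p_raw_demo - y_demo) ** 2)
brier_cal_demo = np.mean((p_cal_demo - y_demo) ** 2)
print(f"a={a:.3f}, b={b:.3f}")  # a should land near 1/3, undoing the inflation
print(f"Brier raw={brier_raw_demo:.4f}, calibrated={brier_cal_demo:.4f}")
```

In practice the calibrator is usually fitted on data the network did not see during training, so that it corrects out-of-sample miscalibration rather than in-sample fit.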


## 5) Scaling to the 300–850 range

In [None]:
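# High default probability -> low score. MinMaxScaler requires an increasing
# feature_range, so we scale the probability of non-default (1 - p) instead.
p_good = (1 - y_prob_cal).reshape(-1, 1)
scaler_score = MinMaxScaler(feature_range=(300, 850))
scaler_score.fit(p_good)
scores = scaler_score.transform(p_good).ravel()

# Save artifacts (note: the saved scaler expects 1 - p as input)
joblib.dump(calibrator, REPORT_DIR / "nn_calibrator.pkl")
joblib.dump(scaler_score, REPORT_DIR / "score_scaler_nn.pkl")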
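
The inverted min-max mapping used in this step can also be written out by hand, which makes the arithmetic easy to check. The function `prob_to_score` below is a hypothetical helper, not part of `train.py`. The clipping is an added assumption so that probabilities outside the fitted range still yield scores within 300–850 (`MinMaxScaler` does not clip by default).

```python
import numpy as np

def prob_to_score(p, p_min, p_max, lo=300.0, hi=850.0):
    """Linearly map probability p in [p_min, p_max] to a score in [lo, hi].

    The lowest probability gets the highest score and vice versa.
    """
    p = np.clip(np.asarray(p, dtype=float), p_min, p_max)  # assumption: clip
    frac = (p - p_min) / (p_max - p_min)                   # 0 at p_min, 1 at p_max
    return hi - frac * (hi - lo)

# Worked example with illustrative bounds p_min=0.02, p_max=0.50
demo_scores = prob_to_score([0.02, 0.26, 0.50], p_min=0.02, p_max=0.50)
print(demo_scores)  # approximately [850., 575., 300.]
```

Because the mapping depends on the minimum and maximum probability seen at fit time, the same `p_min` and `p_max` have to be reused when scoring new applicants.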


## 6) Performance metrics

In [None]:
print('--- Neural network performance ---')
print('ROC-AUC raw   :', roc_auc_score(y_true, y_prob_raw))
print('ROC-AUC cal   :', roc_auc_score(y_true, y_prob_cal))
print('PR-AUC  cal   :', average_precision_score(y_true, y_prob_cal))
print('Brier   cal   :', brier_score_loss(y_true, y_prob_cal))
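
When reading these numbers, note that sigmoid calibration is a monotone transformation: it should leave ranking metrics such as ROC-AUC essentially unchanged and mainly affect the Brier score. The sketch below illustrates this on synthetic data. The distortion `p ** 3` is an arbitrary stand-in for a miscalibrated model and is not taken from the notebook.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(1)
p_true_demo = rng.uniform(0.05, 0.6, size=4000)           # well-calibrated probabilities
y_metric_demo = (rng.random(4000) < p_true_demo).astype(int)
p_distorted_demo = p_true_demo ** 3                       # same ranking, wrong level

auc_true = roc_auc_score(y_metric_demo, p_true_demo)
auc_distorted = roc_auc_score(y_metric_demo, p_distorted_demo)
brier_true = brier_score_loss(y_metric_demo, p_true_demo)
brier_distorted = brier_score_loss(y_metric_demo, p_distorted_demo)

print("AUC   calibrated vs distorted:", auc_true, auc_distorted)     # identical ranking
print("Brier calibrated vs distorted:", brier_true, brier_distorted) # distorted is worse
```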


## 7) Export the scorecard to CSV

In [None]:
pd.DataFrame({
    "target": y_true,
    "prob_raw": y_prob_raw,
    "prob_cal": y_prob_cal,
    "score": scores,
}).to_csv(REPORT_DIR / "nn_scores.csv", index=False)
print('File nn_scores.csv saved to reports/')
