This notebook shows how to build a model automatically with the `scorecard` class on a toy dataset.

<span style='color:blue'>Import the modules</span>

In [1]:
import sys
import pandas as pd
import memento as me

<span style='color:blue'>Load the data, separating the predictor variables (stored in X) from the target variable (stored in y)</span>

In [2]:
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()  # load once instead of calling the loader three times
X, y = pd.DataFrame(data.data, columns=data.feature_names), data.target

<span style='color:blue'>Replace the spaces in the column names with underscores</span>

In [3]:
X.columns = [i.replace(' ', '_') for i in X.columns]

<span style='color:blue'>All the predictor variables are numeric, but text columns, boolean columns, or columns with missing values would not be a problem: the class would work just the same</span>

In [4]:
X.dtypes.unique()

array([dtype('float64')], dtype=object)

<span style='color:blue'>Apply the scorecard class to build the model</span>

In [5]:
modelo = me.scorecard().fit(X, y)

Particionado 70-30 estratificado en el target terminado
------------------------------------------------------------------------------------------------------------------------
Autogrouping terminado. Máximo número de buckets = 5. Mínimo porcentaje por bucket = 0.05
------------------------------------------------------------------------------------------------------------------------
Step 01 | 0:00:00.538507 | pv = 4.92e-32 | Gini train = 83.97% | Gini test = 87.30% ---> Feature selected: mean_concavity
Step 02 | 0:00:00.836358 | pv = 1.38e-14 | Gini train = 96.82% | Gini test = 97.24% ---> Feature selected: worst_perimeter
Step 03 | 0:00:01.059373 | pv = 4.31e-06 | Gini train = 98.34% | Gini test = 98.07% ---> Feature selected: worst_texture
Step 04 | 0:00:01.146291 | pv = 5.11e-04 | Gini train = 98.92% | Gini test = 97.05% ---> Feature selected: worst_smoothness
Step 05 | 0:00:01.276256 | pv = 1.62e-03 | Gini train = 99.34% | Gini test = 98.51% ---> Feature selected: radius_error
St
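
The log above reports a Gini for train and test at each step. The output does not say how `memento` computes it, so the following is only a minimal sketch that assumes the usual credit-scoring definition, Gini = 2·AUC − 1. The helpers `auc` and `gini` and the toy labels and scores are hypothetical and are not part of the library.

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(y_true, scores):
    """Gini coefficient under the assumed definition 2 * AUC - 1."""
    return 2 * auc(y_true, scores) - 1


# Hypothetical toy data: 1 = event, 0 = non-event, higher score = more likely event
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

gini_value = gini(y_true, scores)
print(f"Gini = {gini_value:.2%}")
```

The pairwise loop is quadratic in the number of rows, which is fine for an illustration; on real data a rank-based AUC routine such as scikit-learn's `roc_auc_score` would be the more practical choice.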

<span style='color:blue'>Display the scorecard (it renders well in a notebook or in the VS Code interactive window with a light theme; otherwise use display(modelo.scorecard))</span>

In [6]:
me.pretty_scorecard(modelo)

,Variable,Group,Count,Percent,Goods,Bads,Bad rate,WoE,IV,Raw score,Aligned score
0,worst_perimeter,"(-inf, 91.69)",167,0.419598,2,165,0.988024,-3.88855,2.513895,5.210448,-48
1,worst_perimeter,"[91.69, 102.05)",58,0.145729,5,53,0.913793,-1.836605,0.327313,2.460953,32
2,worst_perimeter,"[102.05, 114.65)",52,0.130653,23,29,0.557692,0.292447,0.011524,-0.391863,114
3,worst_perimeter,"[114.65, inf)",121,0.30402,118,3,0.024793,4.196321,3.29536,-5.622845,265
4,worst_texture,"(-inf, 23.35)",159,0.399497,19,140,0.880503,-1.472955,0.635759,2.334955,35
5,worst_texture,"[23.35, 28.24)",112,0.281407,54,58,0.517857,0.45279,0.06016,-0.717771,123
6,worst_texture,"[28.24, 29.23)",20,0.050251,5,15,0.75,-0.574364,0.015058,0.910492,76
7,worst_texture,"[29.23, 31.17)",31,0.077889,26,5,0.16129,2.172907,0.338269,-3.444533,202
8,worst_texture,"[31.17, inf)",76,0.190955,44,32,0.421053,0.842702,0.142667,-1.335867,141
9,worst_smoothness,"(-inf, 0.10)",34,0.085427,1,33,0.970588,-2.972259,0.372255,6.443796,-83
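
To help read the WoE and IV columns, here is a small sketch that recomputes them for `worst_perimeter` from the Goods and Bads counts in the table. It assumes the standard definitions, WoE = ln(share of goods / share of bads) and IV = (share of goods − share of bads) · WoE per bucket. The table's values are consistent with these formulas, but the source does not show the library's internal computation. The names `buckets` and `results` are illustrative.

```python
import math

# (goods, bads) per bucket of worst_perimeter, copied from the table above
buckets = {
    "(-inf, 91.69)": (2, 165),
    "[91.69, 102.05)": (5, 53),
    "[102.05, 114.65)": (23, 29),
    "[114.65, inf)": (118, 3),
}

total_goods = sum(goods for goods, _ in buckets.values())
total_bads = sum(bads for _, bads in buckets.values())

results = {}
for group, (goods, bads) in buckets.items():
    dist_goods = goods / total_goods  # share of all goods that fall in this bucket
    dist_bads = bads / total_bads     # share of all bads that fall in this bucket
    woe = math.log(dist_goods / dist_bads)
    iv = (dist_goods - dist_bads) * woe
    results[group] = (woe, iv)
    print(f"{group:>18}  WoE = {woe:+.6f}  IV = {iv:.6f}")
```

A negative WoE marks a bucket where bads are over-represented relative to goods, and a positive WoE the opposite. Each bucket's IV measures how much that bucket contributes to separating the two classes.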


<span style='color:blue'>We can save the scorecard to an Excel file, which is useful for presenting the results and for interpreting the direction and contribution of each variable</span>

In [7]:
modelo.create_excel('scorecard_ejemplo_01.xlsx')