In this notebook we use a new dataset to show how to build `manual groupings` and how Memento handles `missings`.

<span style='color:blue'>We import the modules

In [1]:
import numpy as np
import pandas as pd
import pyken as pyk

<span style='color:blue'>We load the dataset

In [2]:
data = pd.read_csv('stroke_data.csv')
X, y = data.drop('stroke', axis=1), data['stroke']
print('El dataset tiene {} filas y {} columnas'.format(X.shape[0], X.shape[1]))

El dataset tiene 5110 filas y 11 columnas


<span style='color:blue'>Not all variables are numeric

In [3]:
X.dtypes.unique()

array([dtype('int64'), dtype('O'), dtype('float64')], dtype=object)

<span style='color:blue'>`bmi` is the only variable with missings

In [4]:
X.isna().sum()

id                     0
gender                 0
age                    0
hypertension           0
heart_disease          0
ever_married           0
work_type              0
Residence_type         0
avg_glucose_level      0
bmi                  201
smoking_status         0
dtype: int64

<span style='color:blue'>We also inject missings into the categorical variable `work_type` by replacing the category 'Self-employed' with NaN

In [5]:
X['work_type'] = X['work_type'].replace('Self-employed', np.nan)
X.isna().sum()

id                     0
gender                 0
age                    0
hypertension           0
heart_disease          0
ever_married           0
work_type            819
Residence_type         0
avg_glucose_level      0
bmi                  201
smoking_status         0
dtype: int64

<span style='color:blue'>We try building the automatic scorecard. We exclude the variable `id` for obvious reasons

In [6]:
modelo1 = pyk.autoscorecard(excluded_vars=['id']).fit(X, y)

Particionado 70-30 estratificado en el target terminado.
------------------------------------------------------------------------------------------------------------------------------------------------------
Autogrouping terminado. Máximo número de buckets = 5. Mínimo porcentaje por bucket = 0.05
------------------------------------------------------------------------------------------------------------------------------------------------------
Cuidado, has puesto un valor numero máximo de iteraciones (12) superior al número de variables candidatas (9)
------------------------------------------------------------------------------------------------------------------------------------------------------
Step 01 | Time - 0:00:00.588744 | p-value = 1.89e-36 | Gini train = 64.73% | Gini test = 66.78% ---> Feature selected: age
Step 02 | Time - 0:00:00.594736 | p-value = 5.95e-11 | Gini train = 70.88% | Gini test = 67.94% ---> Feature selected: bmi
Step 03 | Time - 0:00:00.564792 | p-value = 

<span style='color:blue'>We display the scorecard in color

In [7]:
pyk.pretty_scorecard(modelo1)

Unnamed: 0,Variable,Group,Count,Percent,Goods,Bads,Bad rate,WoE,IV,Raw score,Aligned score
0,age,"(-inf, 48.50)",1996,0.55801,1984,12,0.006012,2.134606,1.097293,-1.958686,252
1,age,"[48.50, 56.50)",454,0.126922,434,20,0.044053,0.103955,0.001309,-0.095388,198
2,age,"[56.50, 67.50)",535,0.149567,496,39,0.072897,-0.430343,0.033732,0.394877,184
3,age,"[67.50, 75.50)",278,0.077719,238,40,0.143885,-1.189966,0.190331,1.091897,164
4,age,"[75.50, inf)",314,0.087783,251,63,0.200637,-1.591039,0.458713,1.459916,153
5,bmi,Missing,147,0.041096,118,29,0.197279,-1.569969,0.207222,1.146982,162
6,bmi,"(-inf, 23.75)",910,0.254403,893,17,0.018681,0.988016,0.16274,-0.721821,216
7,bmi,"[23.75, 30.75)",1306,0.36511,1239,67,0.051302,-0.05599,0.001174,0.040905,194
8,bmi,"[30.75, 32.05)",199,0.055633,181,18,0.090452,-0.665232,0.033435,0.486003,182
9,bmi,"[32.05, 36.45)",501,0.140062,489,12,0.023952,0.734098,0.05486,-0.536315,211


<span style='color:red'>What if I don't want to use the groups from the autogrouping?... What if I want to modify them, or simply use whatever groups I like?

<span style='color:blue'>The recommended way to regroup a variable manually is to put the name of the variable to regroup in `candidate_var` and the model object in `modelo_newgroups`

In [8]:
candidate_var, modelo_newgroups = 'age', modelo1
objeto = modelo_newgroups.objetos[candidate_var]

<span style='color:blue'>First, for reference, we look at the table produced by the automatic grouping

In [9]:
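# Numeric variables: round the breakpoints for display; text variables: show the category groups as they are
if objeto.dtype != 'O':
    L = [round(i, 4) for i in list(objeto.breakpoints)]
else:
    L = objeto.breakpoints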
print('Resultado del autogrouping de {}: {} \n'.format(candidate_var, L))
display(objeto.table)

Resultado del autogrouping de age: [48.5, 56.5, 67.5, 75.5] 



Unnamed: 0,Group,Count,Percent,Goods,Bads,Bad rate,WoE,IV
0,"(-inf, 48.50)",1996,0.55801,1984,12,0.006012,2.134606,1.097293
1,"[48.50, 56.50)",454,0.126922,434,20,0.044053,0.103955,0.001309
2,"[56.50, 67.50)",535,0.149567,496,39,0.072897,-0.430343,0.033732
3,"[67.50, 75.50)",278,0.077719,238,40,0.143885,-1.189966,0.190331
4,"[75.50, inf)",314,0.087783,251,63,0.200637,-1.591039,0.458713
Totals,,3577,1.0,3403,174,0.048644,,1.781379


<span style='color:blue'>In the next cell we can tweak the breakpoints between the groups and see interactively how the table updates

In [10]:
bp = [30, 60]

vector = pyk.data_convert(pyk.string_categories2(bp)).fit(modelo_newgroups.X_train[candidate_var]).x_final
breakpoints_num = pyk.breakpoints_to_num(bp)
groups_names = pyk.compute_group_names(objeto.dtype, bp)
                                      
pyk.compute_table(vector, modelo_newgroups.y_train, breakpoints_num, groups_names)

Unnamed: 0,Group,Count,Percent,Goods,Bads,Bad rate,WoE,IV
0,"(-inf, 30.00)",1077,0.30109,1075,2,0.001857,3.313571,1.008663
1,"[30.00, 60.00)",1538,0.429969,1494,44,0.028609,0.551665,0.102693
2,"[60.00, inf)",962,0.26894,834,128,0.133056,-1.099154,0.539195
Totals,,3577,1.0,3403,174,0.048644,,1.65055
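The `WoE` and `IV` columns in the table above can be reproduced from the `Goods` and `Bads` counts alone. The sketch below assumes the usual convention WoE = ln(%goods / %bads) and IV = (%goods - %bads) · WoE, which agrees with the printed values; it uses plain pandas and is not pyken's own code.

```python
import numpy as np
import pandas as pd

# Goods/Bads counts copied from the table for age with breakpoints [30, 60]
table = pd.DataFrame({
    'Group': ['(-inf, 30.00)', '[30.00, 60.00)', '[60.00, inf)'],
    'Goods': [1075, 1494, 834],
    'Bads': [2, 44, 128],
})

# Share of all goods and of all bads that falls in each group
pct_goods = table['Goods'] / table['Goods'].sum()
pct_bads = table['Bads'] / table['Bads'].sum()

# Assumed convention: positive WoE means the group is safer than average
table['WoE'] = np.log(pct_goods / pct_bads)
table['IV'] = (pct_goods - pct_bads) * table['WoE']
total_iv = table['IV'].sum()

print(table.round(6))
print('Total IV:', round(total_iv, 6))
```

Changing the counts (for example after trying other breakpoints) shows directly how merging groups lowers the total IV.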


<span style='color:blue'>To use this new grouping of `age`, we launch a scorecard again, passing the grouping in `user_breakpoints`

In [11]:
modelo2 = pyk.autoscorecard(excluded_vars=['id'], user_breakpoints={'age': [30, 60]}).fit(X, y)

Particionado 70-30 estratificado en el target terminado.
------------------------------------------------------------------------------------------------------------------------------------------------------
Autogrouping terminado. Máximo número de buckets = 5. Mínimo porcentaje por bucket = 0.05
------------------------------------------------------------------------------------------------------------------------------------------------------
Cuidado, has puesto un valor numero máximo de iteraciones (12) superior al número de variables candidatas (9)
------------------------------------------------------------------------------------------------------------------------------------------------------
Step 01 | Time - 0:00:00.502078 | p-value = 8.28e-27 | Gini train = 56.54% | Gini test = 54.72% ---> Feature selected: age
Step 02 | Time - 0:00:00.510524 | p-value = 1.99e-10 | Gini train = 64.46% | Gini test = 58.66% ---> Feature selected: bmi
Step 03 | Time - 0:00:00.607537 | p-value = 

<span style='color:red'>But wait... why does `hypertension` enter the model now?!

<span style='color:blue'>Note that with the new grouping `age` is apparently less discriminant: in the first step the model now reaches a Gini of 56.54% in train, whereas with the automatic grouping the first step reached 64.73%. For this reason the variable selection method now ends up choosing `hypertension` as well
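The drop in Gini can be checked by hand from the two `age` tables. The sketch below assumes that a one-variable model ranks observations by the bad rate of their group, and computes Gini as (concordant pairs - discordant pairs) / (goods × bads), which equals 2·AUC - 1. The helper `gini_from_groups` is hypothetical and not part of pyken.

```python
import numpy as np

def gini_from_groups(goods, bads):
    """Gini of a score that ranks observations by their group's bad rate."""
    goods = np.asarray(goods, dtype=float)
    bads = np.asarray(bads, dtype=float)
    order = np.argsort(bads / (goods + bads))        # lowest risk first
    goods, bads = goods[order], bads[order]
    goods_below = np.cumsum(goods) - goods           # goods in strictly safer groups
    goods_above = goods.sum() - np.cumsum(goods)     # goods in strictly riskier groups
    concordant = (bads * goods_below).sum()          # bad ranked riskier than good
    discordant = (bads * goods_above).sum()          # bad ranked safer than good
    return (concordant - discordant) / (goods.sum() * bads.sum())

# Train counts for age: automatic grouping (5 groups) vs manual [30, 60] (3 groups)
gini_auto = gini_from_groups([1984, 434, 496, 238, 251], [12, 20, 39, 40, 63])
gini_manual = gini_from_groups([1075, 1494, 834], [2, 44, 128])

print('Gini auto:   {:.2%}'.format(gini_auto))
print('Gini manual: {:.2%}'.format(gini_manual))
```

Pairs that fall in the same group count as ties, so coarser groups create more ties and a lower Gini.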

<span style='color:blue'>To avoid this, the exact variables we want in the scorecard can be passed in the `features` parameter

In [12]:
modelo3 = pyk.autoscorecard(features=['age', 'bmi', 'avg_glucose_level'], user_breakpoints={'age': [30, 60]}).fit(X, y)

Particionado 70-30 estratificado en el target terminado.
------------------------------------------------------------------------------------------------------------------------------------------------------
Autogrouping terminado. Máximo número de buckets = 5. Mínimo porcentaje por bucket = 0.05
------------------------------------------------------------------------------------------------------------------------------------------------------
Step 01 | Time - 0:00:00.000000 | p-value = 8.28e-27 | Gini train = 56.54% | Gini test = 54.72% ---> Feature selected: age
Step 02 | Time - 0:00:00.000000 | p-value = 1.99e-10 | Gini train = 64.46% | Gini test = 58.66% ---> Feature selected: bmi
Step 03 | Time - 0:00:00.000000 | p-value = 1.01e-06 | Gini train = 67.69% | Gini test = 60.01% ---> Feature selected: avg_glucose_level
------------------------------------------------------------------------------------------------------------------------------------------------------
Selección termina

<span style='color:blue'>Let's see how the scorecard looks

In [13]:
pyk.pretty_scorecard(modelo3, color1='green')

Unnamed: 0,Variable,Group,Count,Percent,Goods,Bads,Bad rate,WoE,IV,Raw score,Aligned score
0,age,"(-inf, 30.00)",1077,0.30109,1075,2,0.001857,3.313571,1.008663,-2.949666,281
1,age,"[30.00, 60.00)",1538,0.429969,1494,44,0.028609,0.551665,0.102693,-0.49108,210
2,age,"[60.00, inf)",962,0.26894,834,128,0.133056,-1.099154,0.539195,0.978442,167
3,bmi,Missing,147,0.041096,118,29,0.197279,-1.569969,0.207222,1.11195,163
4,bmi,"(-inf, 23.75)",910,0.254403,893,17,0.018681,0.988016,0.16274,-0.699775,216
5,bmi,"[23.75, 30.75)",1306,0.36511,1239,67,0.051302,-0.05599,0.001174,0.039656,194
6,bmi,"[30.75, 32.05)",199,0.055633,181,18,0.090452,-0.665232,0.033435,0.471159,182
7,bmi,"[32.05, 36.45)",501,0.140062,489,12,0.023952,0.734098,0.05486,-0.519935,211
8,bmi,"[36.45, inf)",514,0.143696,483,31,0.060311,-0.227328,0.008235,0.161008,191
9,avg_glucose_level,"(-inf, 72.72)",645,0.180319,624,21,0.032558,0.418271,0.026216,-0.225926,202


<span style='color:blue'>We see that one of the two variables with missings also entered: `bmi`. In fact, it has put those missings in a separate group, and this is no coincidence:

- In Memento's autogrouping, a numeric variable with missings **always gets a separate group for those missings**. Always, regardless of their volume (note that in this case the group does not even reach the 5% minimum that is usually required, and it does not matter). This is done by initially assigning the missings the value -12345678, which we assume will always be the minimum of that variable, so that a cut immediately after that number guarantees that the missings end up in a separate group.
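The sentinel idea can be pictured with plain NumPy and pandas. This is a minimal sketch, not pyken's implementation: it fills missings with -12345678, uses -12345670.0 as the first cut (the value that appears in the `bmi` breakpoints), and assigns groups with `np.digitize`. The sample values are made up.

```python
import numpy as np
import pandas as pd

SENTINEL = -12345678        # value assigned to missings, as described above
FIRST_CUT = -12345670.0     # cut placed immediately after the sentinel

# Made-up bmi values with two missings
bmi = pd.Series([18.5, np.nan, 27.0, 33.1, np.nan, 41.2])
filled = bmi.fillna(SENTINEL).to_numpy()

breakpoints = [FIRST_CUT, 23.75, 30.75, 32.05, 36.45]

# np.digitize returns i such that breakpoints[i-1] <= x < breakpoints[i],
# so everything below FIRST_CUT (only the sentinel) lands in group 0
group = np.digitize(filled, breakpoints)

print(group)
```

Because no real `bmi` value is below the first cut, group 0 contains only the missings, whatever their volume.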

<span style='color:red'>OK, understood... But what if I want to merge those missings with another group? How do I do it?
- It is very simple, as the following cells show. This part of the code, however, was not simple at all (one of the most challenging by far). It rests on an original (or hacky, depending on how you look at it) idea: the missing value is remapped to the lower bound of the interval it should move to, adding to that bound the first decimals of the number $e$... If anyone is curious, I can explain it in detail
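As a rough picture of that remapping, the sketch below moves the sentinel to just above the lower edge of the target group and then bins as usual. Everything here is an assumption for illustration: the helper `remap_missing`, the size of the offset (the decimals of $e$ scaled by 1e-6), the handling of the first group, and the 1-indexed `missing_group` (inferred from the tables in this notebook). It is not the code behind `pyk.remapeo_missing`.

```python
import math
import numpy as np

SENTINEL = -12345678  # placeholder assigned to missings

def remap_missing(values, breakpoints, missing_group):
    """Hypothetical helper: move sentinel values into the 1-indexed group `missing_group`."""
    values = np.asarray(values, dtype=float)
    edges = [-np.inf] + list(breakpoints)
    lower = edges[missing_group - 1]
    if np.isinf(lower):
        # First group has no finite lower edge: step just below the first cut instead
        target = breakpoints[0] - (math.e - 2)
    else:
        # Tiny offset built from the decimals of e (assumed scale), so the value
        # sits inside [lower, next cut) and is unlikely to collide with real data
        target = lower + (math.e - 2) * 1e-6
    return np.where(values == SENTINEL, target, values)

# Made-up bmi values, with missings already set to the sentinel
bmi = np.array([18.0, SENTINEL, 25.0, 35.0, SENTINEL])
breakpoints = [20, 30]

remapped = remap_missing(bmi, breakpoints, missing_group=3)
groups = np.digitize(remapped, breakpoints) + 1   # 1-indexed group numbers

print(groups)
```

With `missing_group=3` the missings share a group with the values of 30 or more, which mirrors the `[30.00, inf), Missing` row obtained with pyken.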

In [14]:
candidate_var, modelo_newgroups = 'bmi', modelo3
objeto = modelo_newgroups.objetos[candidate_var]

In [15]:
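# Numeric variables: round the breakpoints for display; text variables: show the category groups as they are
if objeto.dtype != 'O':
    L = [round(i, 4) for i in list(objeto.breakpoints)]
else:
    L = objeto.breakpoints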
print('Resultado del autogrouping de {}: {} \n'.format(candidate_var, L))
display(objeto.table)

Resultado del autogrouping de bmi: [-12345670.0, 23.75, 30.75, 32.05, 36.45] 



Unnamed: 0,Group,Count,Percent,Goods,Bads,Bad rate,WoE,IV
0,Missing,147,0.041096,118,29,0.197279,-1.569969,0.207222
1,"(-inf, 23.75)",910,0.254403,893,17,0.018681,0.988016,0.16274
2,"[23.75, 30.75)",1306,0.36511,1239,67,0.051302,-0.05599,0.001174
3,"[30.75, 32.05)",199,0.055633,181,18,0.090452,-0.665232,0.033435
4,"[32.05, 36.45)",501,0.140062,489,12,0.023952,0.734098,0.05486
5,"[36.45, inf)",514,0.143696,483,31,0.060311,-0.227328,0.008235
Totals,,3577,1.0,3403,174,0.048644,,0.467667


<span style='color:blue'>To keep the missings in a separate group, leave the value `-12345670.0` as the first cut and set `missing_group` to `0`

In [16]:
bp = {'breakpoints': [-12345670.0, 20, 30, 40], 'missing_group': 0}

vector = pyk.remapeo_missing(pyk.data_convert(pyk.string_categories2(bp)).fit(modelo_newgroups.X_train[candidate_var]).x_final, bp)
breakpoints_num = pyk.breakpoints_to_num(bp['breakpoints'])
groups_names = pyk.compute_group_names(objeto.dtype, bp['breakpoints'], bp['missing_group'])
                                      
pyk.compute_table(vector, modelo_newgroups.y_train, breakpoints_num, groups_names)

Unnamed: 0,Group,Count,Percent,Goods,Bads,Bad rate,WoE,IV
0,Missing,147,0.041096,118,29,0.197279,-1.569969,0.207222
1,"(-inf, 20.00)",367,0.1026,364,3,0.008174,1.825184,0.163761
2,"[20.00, 30.00)",1718,0.480291,1642,76,0.044237,0.09958,0.004554
3,"[30.00, 40.00)",1057,0.295499,1007,50,0.047304,0.029351,0.000251
4,"[40.00, inf)",288,0.080514,272,16,0.055556,-0.140144,0.001685
Totals,,3577,1.0,3403,174,0.048644,,0.377474


<span style='color:blue'>To merge these missings with an existing group, remove the `-12345670.0` cut and set `missing_group` to the number of that group (groups are numbered from 1, so `3` below refers to `[30.00, inf)`)

In [17]:
bp = {'breakpoints': [20, 30], 'missing_group': 3}

vector = pyk.remapeo_missing(pyk.data_convert(pyk.string_categories2(bp)).fit(modelo_newgroups.X_train[candidate_var]).x_final, bp)
breakpoints_num = pyk.breakpoints_to_num(bp['breakpoints'])
groups_names = pyk.compute_group_names(objeto.dtype, bp['breakpoints'], bp['missing_group'])
                                      
pyk.compute_table(vector, modelo_newgroups.y_train, breakpoints_num, groups_names)

Unnamed: 0,Group,Count,Percent,Goods,Bads,Bad rate,WoE,IV
0,"(-inf, 20.00)",367,0.1026,364,3,0.008174,1.825184,0.163761
1,"[20.00, 30.00)",1718,0.480291,1642,76,0.044237,0.09958,0.004554
2,"[30.00, inf), Missing",1492,0.417109,1397,95,0.063673,-0.285152,0.038626
Totals,,3577,1.0,3403,174,0.048644,,0.206941


<span style='color:blue'>Let's launch another scorecard with a manual grouping of `bmi`, this time sending the missings to group 2, `[20.00, 30.00)`. Since this grouping is clearly worse than the automatic one, the resulting scorecard should have a lower Gini

In [18]:
modelo4 = pyk.autoscorecard(
    features=['age', 'bmi', 'avg_glucose_level'],
    user_breakpoints={
        'age': [30, 60],
        'bmi': {'breakpoints': [20, 30], 'missing_group': 2}
    }
).fit(X, y)

Particionado 70-30 estratificado en el target terminado.
------------------------------------------------------------------------------------------------------------------------------------------------------
Autogrouping terminado. Máximo número de buckets = 5. Mínimo porcentaje por bucket = 0.05
------------------------------------------------------------------------------------------------------------------------------------------------------
Step 01 | Time - 0:00:00.000000 | p-value = 8.28e-27 | Gini train = 56.54% | Gini test = 54.72% ---> Feature selected: age
Step 02 | Time - 0:00:00.000000 | p-value = 4.08e-01 | Gini train = 58.28% | Gini test = 55.94% ---> Feature selected: bmi
Step 03 | Time - 0:00:00.000000 | p-value = 2.17e-08 | Gini train = 63.89% | Gini test = 58.96% ---> Feature selected: avg_glucose_level
------------------------------------------------------------------------------------------------------------------------------------------------------
Selección termina

<span style='color:blue'>Let's see how the scorecard looks

In [19]:
pyk.pretty_scorecard(modelo4, color1='red')

Unnamed: 0,Variable,Group,Count,Percent,Goods,Bads,Bad rate,WoE,IV,Raw score,Aligned score
0,age,"(-inf, 30.00)",1077,0.30109,1075,2,0.001857,3.313571,1.008663,-3.001183,282
1,age,"[30.00, 60.00)",1538,0.429969,1494,44,0.028609,0.551665,0.102693,-0.499657,210
2,age,"[60.00, inf)",962,0.26894,834,128,0.133056,-1.099154,0.539195,0.995531,167
3,bmi,"(-inf, 20.00)",367,0.1026,364,3,0.008174,1.825184,0.163761,-0.507091,210
4,bmi,"[20.00, 30.00), Missing",1865,0.521387,1760,105,0.0563,-0.154249,0.013305,0.042855,194
5,bmi,"[30.00, inf)",1345,0.376013,1279,66,0.049071,-0.009178,3.2e-05,0.00255,195
6,avg_glucose_level,"(-inf, 72.72)",645,0.180319,624,21,0.032558,0.418271,0.026216,-0.254962,203
7,avg_glucose_level,"[72.72, 76.48)",201,0.056192,185,16,0.079602,-0.52559,0.019757,0.320381,186
8,avg_glucose_level,"[76.48, 165.21)",2278,0.636847,2205,73,0.032046,0.434666,0.099285,-0.264957,203
9,avg_glucose_level,"[165.21, 213.28)",240,0.067095,203,37,0.154167,-1.271069,0.194461,0.774797,173


<span style='color:red'>And what if the variable with missings were of type `text`?!
- Then it would be much easier: in a categorical variable the missing is treated as just another category, with no distinction from the rest

<span style='color:blue'>Let's see an example, adding the text variable `work_type` as a feature

In [20]:
modelo5 = pyk.autoscorecard(
    features=['age', 'bmi', 'avg_glucose_level', 'work_type'],
    user_breakpoints={
        'age': [30, 60],
        'bmi': {'breakpoints': [20, 30], 'missing_group': 2}
    }).fit(X, y)

Particionado 70-30 estratificado en el target terminado.
------------------------------------------------------------------------------------------------------------------------------------------------------
Autogrouping terminado. Máximo número de buckets = 5. Mínimo porcentaje por bucket = 0.05
------------------------------------------------------------------------------------------------------------------------------------------------------
Step 01 | Time - 0:00:00.000000 | p-value = 8.28e-27 | Gini train = 56.54% | Gini test = 54.72% ---> Feature selected: age
Step 02 | Time - 0:00:00.000000 | p-value = 4.08e-01 | Gini train = 58.28% | Gini test = 55.94% ---> Feature selected: bmi
Step 03 | Time - 0:00:00.000000 | p-value = 2.17e-08 | Gini train = 63.89% | Gini test = 58.96% ---> Feature selected: avg_glucose_level
Step 04 | Time - 0:00:00.000000 | p-value = 3.11e-01 | Gini train = 63.45% | Gini test = 57.89% ---> Feature selected: work_type
---------------------------------------

In [21]:
pyk.pretty_scorecard(modelo5, color1='yellow')

Unnamed: 0,Variable,Group,Count,Percent,Goods,Bads,Bad rate,WoE,IV,Raw score,Aligned score
0,age,"(-inf, 30.00)",1077,0.30109,1075,2,0.001857,3.313571,1.008663,-3.120752,236
1,age,"[30.00, 60.00)",1538,0.429969,1494,44,0.028609,0.551665,0.102693,-0.519563,161
2,age,"[60.00, inf)",962,0.26894,834,128,0.133056,-1.099154,0.539195,1.035194,117
3,bmi,"(-inf, 20.00)",367,0.1026,364,3,0.008174,1.825184,0.163761,-0.63297,165
4,bmi,"[20.00, 30.00), Missing",1865,0.521387,1760,105,0.0563,-0.154249,0.013305,0.053493,145
5,bmi,"[30.00, inf)",1345,0.376013,1279,66,0.049071,-0.009178,3.2e-05,0.003183,146
6,avg_glucose_level,"(-inf, 72.72)",645,0.180319,624,21,0.032558,0.418271,0.026216,-0.253962,154
7,avg_glucose_level,"[72.72, 76.48)",201,0.056192,185,16,0.079602,-0.52559,0.019757,0.319124,137
8,avg_glucose_level,"[76.48, 165.21)",2278,0.636847,2205,73,0.032046,0.434666,0.099285,-0.263917,154
9,avg_glucose_level,"[165.21, 213.28)",240,0.067095,203,37,0.154167,-1.271069,0.194461,0.771759,124


<span style='color:blue'>The grouping is done the same way as at the beginning, regardless of whether the variable has missings or not

In [22]:
candidate_var, modelo_newgroups = 'work_type', modelo5
objeto = modelo_newgroups.objetos[candidate_var]

In [23]:
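# Numeric variables: round the breakpoints for display; text variables: show the category groups as they are
if objeto.dtype != 'O':
    L = [round(i, 4) for i in list(objeto.breakpoints)]
else:
    L = objeto.breakpoints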
print('Resultado del autogrouping de {}: {} \n'.format(candidate_var, L))
display(objeto.table)

Resultado del autogrouping de work_type: [['Missing'], ['Private'], ['Govt_job'], ['children', 'Never_worked']] 



Unnamed: 0,Group,Count,Percent,Goods,Bads,Bad rate,WoE,IV
0,[Missing],572,0.159911,530,42,0.073427,-0.43815,0.037521
1,[Private],2025,0.566117,1915,110,0.054321,-0.116365,0.008081
2,[Govt_job],467,0.130556,447,20,0.042827,0.133469,0.002191
3,"[children, Never_worked]",513,0.143416,511,2,0.003899,2.569865,0.356356
Totals,,3577,1.0,3403,174,0.048644,,0.404149


In [24]:
bp = [['Private', 'Govt_job'], ['Missing', 'Never_worked'], ['children']]

vector = pyk.data_convert(pyk.string_categories2(bp)).fit(modelo_newgroups.X_train[candidate_var]).x_final
breakpoints_num = pyk.breakpoints_to_num(bp)
groups_names = pyk.compute_group_names(objeto.dtype, bp)
                                      
pyk.compute_table(vector, modelo_newgroups.y_train, breakpoints_num, groups_names)

Unnamed: 0,Group,Count,Percent,Goods,Bads,Bad rate,WoE,IV
0,"[Private, Govt_job]",2492,0.696673,2362,130,0.052167,-0.073628,0.003905
1,"[Missing, Never_worked]",590,0.164943,548,42,0.071186,-0.404752,0.03252
2,[children],495,0.138384,493,2,0.00404,2.534005,0.33798
Totals,,3577,1.0,3403,174,0.048644,,0.374405
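For a text variable, the regrouping above amounts to labelling the missings as one more category and then looking each category up in the list of groups. A minimal pandas sketch of that idea follows; the sample values are made up and the lookup dictionary is only an illustration, not how pyken stores groups internally.

```python
import numpy as np
import pandas as pd

# Made-up work_type values, including two missings
work_type = pd.Series(['Private', np.nan, 'Govt_job', 'children', 'Never_worked', np.nan])

# Same manual grouping as in the cell above
groups = [['Private', 'Govt_job'], ['Missing', 'Never_worked'], ['children']]

# Lookup from each category to the position of its group in the list
category_to_group = {cat: i for i, members in enumerate(groups) for cat in members}

# Missings become the ordinary category 'Missing' and are mapped like any other
group_index = work_type.fillna('Missing').map(category_to_group)

print(group_index.tolist())
```

No sentinel or remapping is needed here, because 'Missing' can be placed in any group just by listing it there.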


<span style='color:blue'>We launch the last scorecard with the text variable regrouped

In [25]:
modelo6 = pyk.autoscorecard(
    features=['age', 'bmi', 'avg_glucose_level', 'work_type'],
    user_breakpoints={
        'age': [30, 60],
        'bmi': {'breakpoints': [20, 30], 'missing_group': 2},
        'work_type': [['Private', 'Govt_job'], ['Missing', 'Never_worked'], ['children']]
    }).fit(X, y)

Particionado 70-30 estratificado en el target terminado.
------------------------------------------------------------------------------------------------------------------------------------------------------
Autogrouping terminado. Máximo número de buckets = 5. Mínimo porcentaje por bucket = 0.05
------------------------------------------------------------------------------------------------------------------------------------------------------
Step 01 | Time - 0:00:00.000000 | p-value = 8.28e-27 | Gini train = 56.54% | Gini test = 54.72% ---> Feature selected: age
Step 02 | Time - 0:00:00.000000 | p-value = 4.08e-01 | Gini train = 58.28% | Gini test = 55.94% ---> Feature selected: bmi
Step 03 | Time - 0:00:00.000000 | p-value = 2.17e-08 | Gini train = 63.89% | Gini test = 58.96% ---> Feature selected: avg_glucose_level
Step 04 | Time - 0:00:00.000000 | p-value = 7.67e-02 | Gini train = 63.96% | Gini test = 58.05% ---> Feature selected: work_type
---------------------------------------

In [26]:
pyk.pretty_scorecard(modelo6, color1='pink')

Unnamed: 0,Variable,Group,Count,Percent,Goods,Bads,Bad rate,WoE,IV,Raw score,Aligned score
0,age,"(-inf, 30.00)",1077,0.30109,1075,2,0.001857,3.313571,1.008663,-3.230911,240
1,age,"[30.00, 60.00)",1538,0.429969,1494,44,0.028609,0.551665,0.102693,-0.537903,162
2,age,"[60.00, inf)",962,0.26894,834,128,0.133056,-1.099154,0.539195,1.071735,115
3,bmi,"(-inf, 20.00)",367,0.1026,364,3,0.008174,1.825184,0.163761,-0.762649,168
4,bmi,"[20.00, 30.00), Missing",1865,0.521387,1760,105,0.0563,-0.154249,0.013305,0.064452,145
5,bmi,"[30.00, inf)",1345,0.376013,1279,66,0.049071,-0.009178,3.2e-05,0.003835,146
6,avg_glucose_level,"(-inf, 72.72)",645,0.180319,624,21,0.032558,0.418271,0.026216,-0.253092,154
7,avg_glucose_level,"[72.72, 76.48)",201,0.056192,185,16,0.079602,-0.52559,0.019757,0.31803,137
8,avg_glucose_level,"[76.48, 165.21)",2278,0.636847,2205,73,0.032046,0.434666,0.099285,-0.263012,154
9,avg_glucose_level,"[165.21, 213.28)",240,0.067095,203,37,0.154167,-1.271069,0.194461,0.769113,124
