# Drifting y Quality

Un desafío común para los **data scientists** en el mundo real es que, a diferencia de las competencia de **Kaggle**, los datos cambian con el tiempo. Esto significa que los datos que recibiremos en el futuro no necesariamente reflejarán los que usamos para entrenar nuestros modelos. Esta homogeneidad es esencial, ya que los algoritmos aprenden patrones en función de las distribuciones observadas durante el entrenamiento. Si esas distribuciones cambian, la capacidad predictiva del modelo se deteriora.

Podemos identificar dos tipos principales de problemáticas asociadas a estos cambios:

- **Data Quality (DQ)**
- **Data Drift (DF)**

Comencemos con el primero: **Data Quality (DQ)**. Hay mucho que decir sobre este tema, pero mencionaremos brevemente que los datos provienen de procesos creados por seres humanos, quienes son propensos a cometer errores y hacer modificaciones. A lo largo del ciclo de vida de los modelos, es común que se enfrenten a este tipo de situaciones. Por ello, es fundamental implementar controles que garanticen que los datos a los que se aplican los modelos mantienen la misma estructura que los datos usados en el entrenamiento.

A continuación, veremos dos indicadores clave para monitorear la calidad de los datos, aunque no son los únicos a considerar.


In [1]:
import duckdb
import pandas as pd
import numpy as np
import seaborn as sns
import os
import polars as ps
import matplotlib.pyplot as plt

In [2]:
base_path = '/home/fedepicado/'
modelos_path = base_path + 'buckets/b1/modelos/'
db_path = base_path + 'buckets/b1/db/'
dataset_path = base_path + 'buckets/b1/datasets/'
exp_path = base_path + 'buckets/b1/exp/'
dataset_file = 'competencia_02.parquet'
full_path = dataset_path + dataset_file

In [3]:
# Verifica si el archivo existe
if os.path.exists(full_path):
    print("Archivo encontrado:", full_path)
else:
    print("Archivo no encontrado:", full_path)

Archivo encontrado: /home/fedepicado/buckets/b1/datasets/competencia_02.parquet


In [4]:
# Leo el archivo
data = pd.read_parquet(dataset_path + dataset_file)

In [5]:
df=data.copy()

In [6]:
df=df.sort_values(by=['foto_mes', 'numero_de_cliente'])

In [8]:
df.groupby("foto_mes")['clase_ternaria'].value_counts().reset_index().head(60)# tengo problemas con el primer mes

Unnamed: 0,foto_mes,clase_ternaria,count
0,201901,CONTINUA,1678
1,201901,BAJA+1,9
2,201901,BAJA+2,3
3,201902,CONTINUA,124391
4,201902,BAJA+1,720
5,201902,BAJA+2,688
6,201903,CONTINUA,124988
7,201903,BAJA+2,760
8,201903,BAJA+1,688
9,201904,CONTINUA,125658


## NaN por foto_mes para todas las variables

In [9]:
def calcular_porcentaje_nan_por_mes(dataset):
    # Crear una lista de las variables que quiero analizar los NaN
    variables = dataset.columns.difference(['numero_de_cliente', 'foto_mes', 'clase_ternaria'])

    # Inicializar una lista para almacenar los resultados
    resultados = []

    # Agrupar el dataset por 'foto_mes' y calcular el ratio de NaN para cada columna
    for mes, datos_mes in dataset.groupby('foto_mes'):
        # Calcular el porcentaje de NaN para cada columna en este mes
        porcentaje_nan = datos_mes[variables].isna().mean() * 100
        # Crear un DataFrame temporal para almacenar los resultados
        resultado_mes = pd.DataFrame({
            'foto_mes': mes,
            'campo': porcentaje_nan.index,
            'porcentaje_nan': porcentaje_nan.values
        })
        # Añadir los resultados del mes a la lista
        resultados.append(resultado_mes)

    # Concatenar todos los resultados en un único DataFrame
    tabla_resultados_nan = pd.concat(resultados, ignore_index=True)

    return tabla_resultados_nan

In [10]:
pd.options.display.max_columns = 152

In [11]:
# Llamada a la función
tabla_porcentajes_nan_por_mes = calcular_porcentaje_nan_por_mes(df)
tabla_pivoteada_nan = tabla_porcentajes_nan_por_mes.pivot(index='campo', columns='foto_mes', values='porcentaje_nan')
tabla_pivoteada_nan['total_nan'] = tabla_pivoteada_nan.sum(axis=1)/6

# Ordenar la tabla por la columna 'total_nan' en orden descendente
tabla_ordenada_nan = tabla_pivoteada_nan.sort_values(by='total_nan', ascending=False)
tabla_ordenada_nan.iloc[0:50].style.background_gradient(cmap='YlOrRd', axis=1)

foto_mes,201901,201902,201903,201904,201905,201906,201907,201908,201909,201910,201911,201912,202001,202002,202003,202004,202005,202006,202007,202008,202009,202010,202011,202012,202101,202102,202103,202104,202105,202106,202107,202108,total_nan
campo,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1
Master_Finiciomora,99.512633,99.349756,99.355405,99.534631,99.477514,99.492205,99.659588,99.477628,99.594234,99.588827,99.387742,99.517279,99.592959,99.394327,99.466376,99.655039,99.384508,99.726842,99.695693,99.633257,99.82762,99.434675,99.562031,99.726979,99.631541,99.604663,99.310261,99.372905,99.493388,99.59606,99.5489,99.634313,530.873296
Visa_Finiciomora,99.137489,98.949912,94.908096,99.294466,99.196296,95.861007,99.472935,99.145209,95.62741,99.310809,98.976685,99.24997,99.337344,99.144172,99.140979,99.536271,96.761888,99.531729,99.576924,97.152008,99.800469,99.122274,97.283846,99.702834,97.367706,99.39439,99.036564,99.140106,97.336338,99.390451,99.409029,97.495195,525.6318
Master_cconsumos,56.03838,56.582326,57.085798,57.158493,57.411542,58.343009,58.169885,58.783091,59.666155,59.378704,59.675337,60.267594,60.65321,61.445595,62.09928,62.014919,63.30647,63.385732,62.864333,62.53677,62.18689,62.32541,61.924077,61.05952,60.88529,60.002705,59.844824,59.780608,59.299733,58.459691,57.903628,58.356403,319.8159
Master_cadelantosefectivo,56.03838,56.582326,57.085798,57.158493,57.411542,58.343009,58.169885,58.783091,59.666155,59.378704,59.675337,60.267594,60.65321,61.445595,62.09928,62.014919,63.30647,63.385732,62.864333,62.53677,62.18689,62.32541,61.924077,61.05952,60.88529,60.002705,59.844824,59.780608,59.299733,58.459691,57.903628,58.356403,319.8159
Master_mconsumosdolares,56.03838,56.582326,57.085798,57.158493,57.411542,58.343009,58.169885,58.783091,59.666155,59.378704,59.675337,60.267594,60.65321,61.445595,62.09928,62.014919,63.30647,63.385732,62.864333,62.53677,62.18689,62.32541,61.924077,61.05952,60.88529,60.002705,59.844824,59.780608,59.299733,58.459691,57.903628,58.356403,319.8159
Master_mconsumospesos,56.03838,56.582326,57.085798,57.158493,57.411542,58.343009,58.169885,58.783091,59.666155,59.378704,59.675337,60.267594,60.65321,61.445595,62.09928,62.014919,63.30647,63.385732,62.864333,62.53677,62.18689,62.32541,61.924077,61.05952,60.88529,60.002705,59.844824,59.780608,59.299733,58.459691,57.903628,58.356403,319.8159
Master_madelantopesos,56.03838,56.582326,57.085798,57.158493,57.411542,58.343009,58.169885,58.783091,59.666155,59.378704,59.675337,60.267594,60.65321,61.445595,62.09928,62.014919,63.30647,63.385732,62.864333,62.53677,62.18689,62.32541,61.924077,61.05952,60.88529,60.002705,59.844824,59.780608,59.299733,58.459691,57.903628,58.356403,319.8159
Master_madelantodolares,56.03838,56.582326,57.085798,57.158493,57.411542,58.343009,58.169885,58.783091,59.666155,59.378704,59.675337,60.267594,60.65321,61.445595,62.09928,62.014919,63.30647,63.385732,62.864333,62.53677,62.18689,62.32541,61.924077,61.05952,60.88529,60.002705,59.844824,59.780608,59.299733,58.459691,57.903628,58.356403,319.8159
Master_mpagosdolares,56.03838,56.582326,57.085798,57.158493,57.411542,58.343009,58.169885,58.783091,59.666155,59.378704,59.675337,60.267594,60.65321,61.445595,62.09928,62.014919,63.30647,63.385732,62.864333,62.53677,62.18689,62.32541,61.924077,61.05952,60.88529,60.002705,59.844824,59.780608,59.299733,58.459691,57.903628,58.356403,319.8159
Master_mpagospesos,56.03838,56.582326,57.085798,57.158493,57.411542,58.343009,58.169885,58.783091,59.666155,59.378704,59.675337,60.267594,60.65321,61.445595,62.09928,62.014919,63.30647,63.385732,62.864333,62.53677,62.18689,62.32541,61.924077,61.05952,60.88529,60.002705,59.844824,59.780608,59.299733,58.459691,57.903628,58.356403,319.8159


In [12]:
lista_meses=[201901,201902,201903,201904,201905,201906]
lista_columnas=["cmobile_app_trx","tmobile_app"]

In [13]:
def asignar_cero(df, lista_meses, lista_columnas):
    for mes in lista_meses:
        filtro_mes = df['foto_mes'] == mes
        for columna in lista_columnas:
            df.loc[filtro_mes, columna] = df.loc[filtro_mes, columna].fillna(0)
    return df

In [14]:
df=asignar_cero(df,lista_meses,lista_columnas)

In [17]:
df["internet"].value_counts()

internet
1    2595829
0    2139764
Name: count, dtype: int64

In [16]:
df.loc[df["internet"] > 1, "internet"] = 1

In [18]:
#Elimino las dos que tienen muchos NaN y 2 con drift
col_eliminar_nan=["Master_Finiciomora","Visa_Finiciomora","Visa_fultimo_cierre","Master_fultimo_cierre", "cprestamos_personales","mprestamos_personales"]
df.drop(columns=col_eliminar_nan, axis=1,inplace=True)

In [19]:
# Llamada a la función
tabla_porcentajes_nan_por_mes = calcular_porcentaje_nan_por_mes(df)
tabla_pivoteada_nan = tabla_porcentajes_nan_por_mes.pivot(index='campo', columns='foto_mes', values='porcentaje_nan')
tabla_pivoteada_nan['total_nan'] = tabla_pivoteada_nan.sum(axis=1)/6

# Ordenar la tabla por la columna 'total_nan' en orden descendente
tabla_ordenada_nan = tabla_pivoteada_nan.sort_values(by='total_nan', ascending=False)
tabla_ordenada_nan.iloc[0:50].style.background_gradient(cmap='YlOrRd', axis=1)

foto_mes,201901,201902,201903,201904,201905,201906,201907,201908,201909,201910,201911,201912,202001,202002,202003,202004,202005,202006,202007,202008,202009,202010,202011,202012,202101,202102,202103,202104,202105,202106,202107,202108,total_nan
campo,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1
Master_cadelantosefectivo,56.03838,56.582326,57.085798,57.158493,57.411542,58.343009,58.169885,58.783091,59.666155,59.378704,59.675337,60.267594,60.65321,61.445595,62.09928,62.014919,63.30647,63.385732,62.864333,62.53677,62.18689,62.32541,61.924077,61.05952,60.88529,60.002705,59.844824,59.780608,59.299733,58.459691,57.903628,58.356403,319.8159
Master_cconsumos,56.03838,56.582326,57.085798,57.158493,57.411542,58.343009,58.169885,58.783091,59.666155,59.378704,59.675337,60.267594,60.65321,61.445595,62.09928,62.014919,63.30647,63.385732,62.864333,62.53677,62.18689,62.32541,61.924077,61.05952,60.88529,60.002705,59.844824,59.780608,59.299733,58.459691,57.903628,58.356403,319.8159
Master_mconsumospesos,56.03838,56.582326,57.085798,57.158493,57.411542,58.343009,58.169885,58.783091,59.666155,59.378704,59.675337,60.267594,60.65321,61.445595,62.09928,62.014919,63.30647,63.385732,62.864333,62.53677,62.18689,62.32541,61.924077,61.05952,60.88529,60.002705,59.844824,59.780608,59.299733,58.459691,57.903628,58.356403,319.8159
Master_madelantodolares,56.03838,56.582326,57.085798,57.158493,57.411542,58.343009,58.169885,58.783091,59.666155,59.378704,59.675337,60.267594,60.65321,61.445595,62.09928,62.014919,63.30647,63.385732,62.864333,62.53677,62.18689,62.32541,61.924077,61.05952,60.88529,60.002705,59.844824,59.780608,59.299733,58.459691,57.903628,58.356403,319.8159
Master_mconsumosdolares,56.03838,56.582326,57.085798,57.158493,57.411542,58.343009,58.169885,58.783091,59.666155,59.378704,59.675337,60.267594,60.65321,61.445595,62.09928,62.014919,63.30647,63.385732,62.864333,62.53677,62.18689,62.32541,61.924077,61.05952,60.88529,60.002705,59.844824,59.780608,59.299733,58.459691,57.903628,58.356403,319.8159
Master_madelantopesos,56.03838,56.582326,57.085798,57.158493,57.411542,58.343009,58.169885,58.783091,59.666155,59.378704,59.675337,60.267594,60.65321,61.445595,62.09928,62.014919,63.30647,63.385732,62.864333,62.53677,62.18689,62.32541,61.924077,61.05952,60.88529,60.002705,59.844824,59.780608,59.299733,58.459691,57.903628,58.356403,319.8159
Master_mpagosdolares,56.03838,56.582326,57.085798,57.158493,57.411542,58.343009,58.169885,58.783091,59.666155,59.378704,59.675337,60.267594,60.65321,61.445595,62.09928,62.014919,63.30647,63.385732,62.864333,62.53677,62.18689,62.32541,61.924077,61.05952,60.88529,60.002705,59.844824,59.780608,59.299733,58.459691,57.903628,58.356403,319.8159
Master_mconsumototal,56.03838,56.582326,57.085798,57.158493,57.411542,58.343009,58.169885,58.783091,59.666155,59.378704,59.675337,60.267594,60.65321,61.445595,62.09928,62.014919,63.30647,63.385732,62.864333,62.53677,62.18689,62.32541,61.924077,61.05952,60.88529,60.002705,59.844824,59.780608,59.299733,58.459691,57.903628,58.356403,319.8159
Master_mpagospesos,56.03838,56.582326,57.085798,57.158493,57.411542,58.343009,58.169885,58.783091,59.666155,59.378704,59.675337,60.267594,60.65321,61.445595,62.09928,62.014919,63.30647,63.385732,62.864333,62.53677,62.18689,62.32541,61.924077,61.05952,60.88529,60.002705,59.844824,59.780608,59.299733,58.459691,57.903628,58.356403,319.8159
Visa_mpagospesos,12.534468,12.795014,13.008162,13.026395,13.130292,13.598997,13.22481,13.433185,13.65606,13.460441,13.561987,13.713112,13.787283,13.823084,14.174188,14.533735,15.313266,15.368406,15.146632,14.88431,14.472978,14.638361,14.507098,14.151282,14.053917,13.745189,13.733085,13.576086,13.449518,13.081952,12.39222,12.844381,73.469982


In [20]:
df.shape

(4735593, 149)

## Sigo aca

In [21]:
def calcular_porcentaje_ceros_por_mes(dataset):
    # Crear una lista de las variables que quiero analizar los ceros
    variables = dataset.columns.difference(['numero_de_cliente', 'foto_mes', 'clase_ternaria'])

    # Inicializar una lista para almacenar los resultados
    resultados = []

    # Agrupar el dataset por 'foto_mes' y calcular el ratio de ceros para cada columna
    for mes, datos_mes in dataset.groupby('foto_mes'):
        # Calcular el porcentaje de ceros para cada columna en este mes
        porcentaje_ceros = (datos_mes[variables] == 0).mean() * 100
        # Crear un DataFrame temporal para almacenar los resultados
        resultado_mes = pd.DataFrame({
            'foto_mes': mes,
            'campo': porcentaje_ceros.index,
            'porcentaje_ceros': porcentaje_ceros.values
        })
        # Añadir los resultados del mes a la lista
        resultados.append(resultado_mes)

    # Concatenar todos los resultados en un único DataFrame
    tabla_resultados = pd.concat(resultados, ignore_index=True)

    return tabla_resultados

In [22]:
# Llamada a la función
tabla_porcentajes_ceros_por_mes = calcular_porcentaje_ceros_por_mes(df)
tabla_pivoteada = tabla_porcentajes_ceros_por_mes.pivot(index='campo', columns='foto_mes', values='porcentaje_ceros')
tabla_pivoteada['total_zero'] = tabla_pivoteada.sum(axis=1)/6

# Ordenar la tabla por la columna 'total_zero' en orden descendente
tabla_pivoteada_zero = tabla_pivoteada.sort_values(by='total_zero', ascending=False)


In [23]:
tabla_pivoteada_zero.iloc[0:50].style.background_gradient(cmap='YlOrRd', axis=1)

foto_mes,201901,201902,201903,201904,201905,201906,201907,201908,201909,201910,201911,201912,202001,202002,202003,202004,202005,202006,202007,202008,202009,202010,202011,202012,202101,202102,202103,202104,202105,202106,202107,202108,total_zero
campo,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1
mcuenta_corriente_adicional,99.920643,99.919713,99.919327,99.92047,99.92245,99.92414,99.924268,99.928391,99.926292,99.929032,99.926442,99.928907,99.933318,99.934742,99.935055,99.933944,99.93455,99.935613,99.9358,99.935693,99.937489,99.939273,99.937789,99.93809,99.937047,99.937902,99.937074,99.937839,99.939255,99.940561,99.941872,99.941974,532.972492
mpayroll2,99.978357,99.677263,99.686798,99.656682,99.680399,99.695787,99.687892,99.67738,99.746117,99.737347,99.756251,99.660887,99.729797,99.760042,99.859396,99.895911,99.894223,99.905695,99.910762,99.887303,99.893289,99.891067,99.906061,99.852036,99.875946,99.88933,99.904084,99.909806,99.907668,99.871419,99.929156,99.935929,532.375013
cpayroll2_trx,99.978357,99.677263,99.686798,99.656682,99.680399,99.695787,99.687892,99.67738,99.746117,99.735883,99.756251,99.660887,99.729797,99.760042,99.859396,99.895911,99.894223,99.905695,99.910762,99.887303,99.893289,99.891067,99.906061,99.852036,99.875946,99.88933,99.904084,99.909806,99.907668,99.871419,99.929156,99.935929,532.374769
ccheques_emitidos_rechazados,99.621649,99.725753,99.686798,99.244071,99.452448,99.676436,99.651173,99.701502,99.691023,99.675158,99.71226,99.741222,99.735354,99.753244,99.676612,99.405493,99.617879,100.0,99.777869,99.812171,99.802363,99.825331,99.824564,99.818605,99.777196,99.822313,99.829551,99.834847,99.831737,99.833815,99.840753,99.795094,531.865714
mcheques_emitidos_rechazados,99.621649,99.725753,99.686798,99.244071,99.452448,99.676436,99.651173,99.701502,99.691023,99.675158,99.71226,99.741222,99.735354,99.753244,99.676612,99.405493,99.617879,100.0,99.777869,99.812171,99.802363,99.825331,99.824564,99.818605,99.777196,99.822313,99.829551,99.834847,99.831737,99.833815,99.840753,99.795094,531.865714
mcheques_depositados_rechazados,99.525459,99.611285,99.600588,99.181864,99.243297,99.573483,99.518834,99.6216,99.589767,99.561756,99.668991,99.675816,99.683259,99.702941,99.433568,99.113911,99.584162,100.0,99.811253,99.815992,99.809309,99.824705,99.815854,99.8087,99.781517,99.835225,99.829551,99.803157,99.811691,99.793178,99.821377,99.840427,531.548753
ccheques_depositados_rechazados,99.525459,99.611285,99.600588,99.181864,99.243297,99.573483,99.518834,99.6216,99.589767,99.561756,99.668991,99.675816,99.683259,99.702941,99.433568,99.113911,99.584162,100.0,99.811253,99.815992,99.809309,99.824705,99.815854,99.8087,99.781517,99.835225,99.829551,99.803157,99.811691,99.793178,99.821377,99.840427,531.548753
mplazo_fijo_pesos,99.345101,99.351346,99.323769,99.373209,99.372547,99.375319,99.370429,99.430139,99.571899,99.612239,99.631491,99.620364,99.63672,99.617291,99.641795,99.622344,99.623168,99.598717,99.580776,99.566402,99.565577,99.638142,99.660325,99.634734,99.63833,99.626797,99.610838,99.608142,99.599691,99.600306,99.574937,99.553922,530.929468
cliente_vip,99.250513,99.255161,99.254959,99.260607,99.264447,99.272367,99.270218,99.270337,99.2726,99.282276,99.29255,99.302579,99.657558,99.664195,99.667238,99.667716,99.670107,99.674812,99.677718,99.677826,99.677971,99.680713,99.677744,99.676213,99.678447,99.728859,99.729358,99.728198,99.727863,99.434727,99.722074,99.717726,530.797613
minversion1_dolares,99.026869,99.01907,99.012939,99.002331,98.999679,98.961188,98.862489,99.125611,99.256965,99.321783,99.345194,99.426991,99.460984,99.482697,99.521278,99.530933,99.537885,99.55059,99.560232,99.568949,99.586414,99.597448,99.604335,99.611208,99.620431,99.626797,99.631609,99.636175,99.64039,99.646401,99.652441,99.656677,530.180831


In [36]:
tabla_pivoteada_zero.iloc[50:100].style.background_gradient(cmap='YlOrRd', axis=1)

foto_mes,201901,201902,201903,201904,201905,201906,201907,201908,201909,201910,201911,201912,202001,202002,202003,202004,202005,202006,202007,202008,202009,202010,202011,202012,202101,202102,202103,202104,202105,202106,202107,202108,total_zero
campo,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1
Visa_cadelantosefectivo,86.647909,86.406092,86.225442,86.068065,86.044071,85.591318,86.036994,85.822831,85.595694,85.764036,85.738496,85.684731,85.63897,85.631742,85.308257,85.149327,84.438818,84.366891,84.667189,84.910033,85.287711,85.134382,85.242189,85.620891,85.736857,86.003345,86.028653,86.181973,86.322081,86.658458,87.377688,86.919283,457.37507
mforex_sell,86.211043,89.850476,89.718909,87.484645,88.473981,85.659437,83.698479,85.839414,82.882648,74.866478,83.598116,74.698033,94.608449,95.967616,95.055438,85.147993,73.854463,68.992631,64.132277,63.134638,68.735438,87.707458,89.380498,90.681376,92.093244,94.125893,95.777866,96.319703,96.321292,94.737257,92.030978,92.252874,457.33984
cforex_sell,86.211043,89.850476,89.718909,87.484645,88.473981,85.659437,83.698479,85.839414,82.882648,74.866478,83.598116,74.698033,94.608449,95.967616,95.055438,85.147993,73.854463,68.992631,64.132277,63.134638,68.735438,87.707458,89.380498,90.681376,92.093244,94.125893,95.777866,96.319703,96.321292,94.737257,92.030978,92.252874,457.33984
cseguro_vida,78.515775,78.581706,78.661141,78.829254,79.006572,79.205951,79.291484,79.778237,79.976771,80.252703,86.065899,86.213663,86.310657,86.435228,86.518118,86.608573,86.695844,86.693939,86.594463,86.603675,86.552462,86.44784,88.521357,88.551688,88.504931,88.445458,88.429606,88.428302,88.446936,88.416143,88.442768,88.435222,452.410395
Master_msaldodolares,80.656022,81.162807,81.408776,81.151375,81.798385,81.796015,81.481595,82.327534,82.824575,82.391244,83.1373,83.228471,83.960796,84.644719,85.136854,85.36351,85.583197,86.364848,86.103978,86.32225,86.129405,86.114154,86.185938,85.62151,85.9467,85.810902,86.244922,86.201475,86.186013,85.892428,86.69347,86.067625,449.989799
cforex,81.136976,84.036439,83.363124,81.971086,82.901323,81.30912,79.751232,80.002111,78.72597,71.704394,79.670001,72.758618,93.040718,94.767825,94.020997,84.382673,73.502092,68.692157,63.868416,62.79209,68.386889,87.368138,89.043312,90.373067,91.811808,93.828314,95.309283,95.936376,96.035183,94.444916,91.839033,91.993569,446.461209
cseguro_accidentes_personales,80.739387,80.820992,80.978519,81.084444,81.252399,81.453873,81.519078,81.634053,81.780753,81.964706,82.137062,82.297154,82.409041,82.626488,82.813546,83.008834,83.341377,83.707408,83.913485,84.283513,84.631025,84.910255,85.120255,85.311343,85.449249,85.631986,85.863091,86.060089,86.255262,86.429195,86.593562,86.771195,445.465436
mttarjeta_master_debitos_automaticos,81.314929,81.568216,81.610459,81.304136,81.987952,81.898193,81.500719,81.940089,82.79405,81.829356,82.750041,83.313783,82.61534,83.081253,83.015078,83.083565,84.083802,83.695051,85.686038,84.794152,83.841107,84.353695,84.880118,84.051484,84.086505,83.567994,85.929682,83.974648,84.88668,83.776292,84.918742,84.209572,444.390454
ctarjeta_master_debitos_automaticos,81.314929,81.568216,81.610459,81.304136,81.987952,81.898193,81.500719,81.940089,82.79405,81.829356,82.750041,83.313783,82.61534,83.081253,83.015078,83.083565,84.083802,83.695051,85.686038,84.794152,83.841107,84.353695,84.880118,84.051484,84.086505,83.567994,85.929682,83.974648,84.88668,83.776292,84.918742,84.209572,444.390454
Master_mpagado,81.856804,82.049142,80.833782,81.991559,82.27152,80.73398,82.25957,82.48206,81.364564,83.057023,83.201483,82.33981,84.014281,84.282403,84.031442,85.236068,83.386332,85.347009,85.578824,83.915496,85.485979,85.127496,83.869182,60.32713,83.631022,84.327927,85.550295,85.612164,83.573377,85.142167,85.758574,83.107675,441.95769


In [24]:
tabla_pivoteada_zero.iloc[100:153].style.background_gradient(cmap='YlOrRd', axis=1)

foto_mes,201901,201902,201903,201904,201905,201906,201907,201908,201909,201910,201911,201912,202001,202002,202003,202004,202005,202006,202007,202008,202009,202010,202011,202012,202101,202102,202103,202104,202105,202106,202107,202108,total_zero
campo,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1
mautoservicio,35.809446,35.859586,35.875067,35.902706,35.815728,36.161813,34.736544,35.210758,34.944235,34.056423,34.657849,32.930237,33.632941,34.150188,35.022363,39.134728,37.966826,100.0,35.112735,34.514638,33.843949,32.38069,31.912008,29.453463,30.30131,30.629711,29.163332,29.783046,30.100897,29.163129,28.214614,28.139167,189.096687
ctarjeta_debito_transacciones,35.809446,35.859586,35.875067,35.902706,35.815728,36.161813,34.736544,35.210758,34.944235,34.056423,34.657849,32.930237,33.632941,34.120278,35.022363,39.134728,37.966826,100.0,35.112735,34.514638,33.843949,32.38069,31.912008,29.453463,30.30131,30.629711,29.163332,29.783046,30.100897,29.163129,28.214614,28.139167,189.091702
chomebanking_transacciones,40.335225,32.998672,36.176405,41.082396,47.760832,48.236651,43.809094,44.237321,44.97074,100.0,19.574953,19.164516,20.157537,20.817897,20.771847,18.511797,16.983228,100.0,16.043502,16.216939,16.247924,16.342476,17.314703,16.779342,17.296607,16.976747,15.635519,16.004022,15.785765,16.109076,15.519643,15.439248,157.216771
thomebanking,20.885437,22.095565,21.691607,22.421179,21.168112,22.274085,22.428934,22.46804,23.456974,23.027904,24.285519,23.736501,25.221927,26.498719,26.289536,26.704788,21.734618,100.0,21.958219,22.307683,22.53506,24.373478,26.304886,26.523284,26.724106,27.008964,25.076824,26.305686,26.244206,27.385429,26.943664,26.346393,142.071221
mtarjeta_visa_consumo,14.378126,14.768003,14.942738,14.778418,14.999334,15.284164,14.711912,15.174426,15.369954,15.040752,15.206213,15.197532,15.305697,16.585661,15.953828,16.371304,16.627551,100.0,16.672017,16.327726,15.952416,16.00566,15.993953,15.512054,15.514794,15.102124,15.385649,15.100859,15.058649,14.487251,14.493921,14.387519,96.115034
ctarjeta_visa_transacciones,14.378126,14.768003,14.942738,14.778418,14.999334,15.284164,14.711912,15.174426,15.369954,15.040752,15.206213,15.197532,15.305697,16.585661,15.953828,16.371304,16.627551,100.0,16.672017,16.327726,15.952416,16.00566,15.993953,15.512054,15.514794,15.102124,15.385649,15.100859,15.058649,14.487251,14.493921,14.387519,96.115034
Visa_mpagominimo,14.604976,14.396776,14.115442,14.394942,14.527765,16.770393,14.77005,14.546524,14.851021,14.655917,13.767515,15.427162,15.081339,16.052043,16.271861,16.510089,19.511969,20.687839,18.567833,17.660355,16.827576,15.795932,15.318958,17.048649,14.153901,14.601035,13.69032,13.365226,13.212006,13.165652,12.923852,12.079762,81.559113
mactivos_margen,3.621585,3.930079,3.997279,4.108003,100.0,3.912963,4.479667,4.486522,4.394925,100.0,4.687489,4.523642,4.716391,4.889572,5.029594,5.724218,5.429027,100.0,6.063018,5.403736,5.724533,5.45417,4.775976,5.035103,4.484465,4.844263,4.488499,4.101408,4.064438,4.37177,4.354776,3.901065,72.499696
mcomisiones,2.306175,2.619258,2.655889,2.667801,100.0,2.939173,2.668982,2.813876,3.471716,100.0,2.848551,2.728546,2.876374,3.250651,3.299499,3.402904,3.35645,100.0,3.242726,3.37837,3.178612,3.210397,3.067027,2.847839,2.656364,2.720018,2.328252,2.566275,2.668521,2.514617,2.908835,2.960554,64.025709
mcomisiones_otras,2.302969,2.619258,2.655889,2.667801,100.0,2.939173,2.668982,2.813876,3.471716,100.0,2.848551,2.728546,2.876374,3.250651,3.299499,3.402904,3.35645,100.0,3.242726,3.37837,3.178612,3.183477,3.043386,2.828028,2.635997,2.69358,2.305037,2.535804,2.634504,2.478226,2.908835,2.960554,63.984962


In [25]:
# Sample function to interpolate zeros with values from adjacent months
def imputar_valores(df, columnas, meses_a_imputar):
    # Sort the DataFrame by client and month to ensure correct interpolation
    df = df.sort_values(['numero_de_cliente', 'foto_mes'])
    
    for col in columnas:
        # Replace zeros in the specified months with NaNs temporarily
        df.loc[(df['foto_mes'].isin(meses_a_imputar)) & (df[col] == 0), col] = None
        
        # Apply interpolation per client group, filling the NaNs created above
        df[col] = df.groupby('numero_de_cliente')[col].transform(
            lambda x: x.interpolate(method='linear', limit_direction='both')
        )
        
    return df

In [26]:
col_var_impu=["mactivos_margen","mcomisiones","mcomisiones_otras","ccomisiones_otras","mpasivos_margen","mrentabilidad","mrentabilidad_annual"]
meses=[201905, 201910]
df = imputar_valores(df, col_var_impu, meses)

In [27]:
dataset_file1 = 'competencia_02_DQ.parquet'
full_path1 = dataset_path + dataset_file1

# Guarda el DataFrame en formato Parquet
df.to_parquet(full_path1, engine='pyarrow') 

## Cuando entreno con mas meses, sin agregar ft me da menor ganancia que entrenando con solo dos meses, ya me empieza a jugar la inflacion... 18/11/2024

In [28]:
meses=[201901, 201902, 201903, 201904, 201905, 201906, 201907, 
       201908, 201909, 201910, 201911, 201912, 202001, 202002,
       202003, 202004, 202005, 202006, 202007, 202008, 202009,
       202010, 202011, 202012, 202101, 202102, 202103, 202104, 
       202105, 202106, 202107, 202108] #202109, 202110,202111,202112

IPC = [
    189.6, 196.8, 206.0, 213.1, 219.6, 225.5, 230.5, 239.6, 253.7, 262.1, 
    273.2, 283.4, 289.8, 295.7, 305.6, 310.1, 314.9, 322.0, 328.2, 337.1, 
    346.6, 359.7, 371.0, 385.9, 401.5, 415.9, 435.9, 453.7, 468.7, 483.6, 
    498.1, 510.4] #528.5, 547.1, 560.9, 582.5


In [4]:
dataset_file = 'competencia_02_DQ.parquet'
full_path = dataset_path + dataset_file

In [5]:
df = pd.read_parquet(dataset_path + dataset_file)

In [29]:
df.shape

(4735593, 149)

In [30]:
df.head(5)

Unnamed: 0,numero_de_cliente,foto_mes,active_quarter,cliente_vip,internet,cliente_edad,cliente_antiguedad,mrentabilidad,mrentabilidad_annual,mcomisiones,mactivos_margen,mpasivos_margen,cproductos,tcuentas,ccuenta_corriente,mcuenta_corriente_adicional,mcuenta_corriente,ccaja_ahorro,mcaja_ahorro,mcaja_ahorro_adicional,mcaja_ahorro_dolares,cdescubierto_preacordado,mcuentas_saldo,ctarjeta_debito,ctarjeta_debito_transacciones,mautoservicio,ctarjeta_visa,ctarjeta_visa_transacciones,mtarjeta_visa_consumo,ctarjeta_master,ctarjeta_master_transacciones,mtarjeta_master_consumo,cprestamos_prendarios,mprestamos_prendarios,cprestamos_hipotecarios,mprestamos_hipotecarios,cplazo_fijo,mplazo_fijo_dolares,mplazo_fijo_pesos,cinversion1,minversion1_pesos,minversion1_dolares,cinversion2,minversion2,cseguro_vida,cseguro_auto,cseguro_vivienda,cseguro_accidentes_personales,ccaja_seguridad,cpayroll_trx,mpayroll,mpayroll2,cpayroll2_trx,ccuenta_debitos_automaticos,mcuenta_debitos_automaticos,ctarjeta_visa_debitos_automaticos,mttarjeta_visa_debitos_automaticos,ctarjeta_master_debitos_automaticos,mttarjeta_master_debitos_automaticos,cpagodeservicios,mpagodeservicios,cpagomiscuentas,mpagomiscuentas,ccajeros_propios_descuentos,mcajeros_propios_descuentos,ctarjeta_visa_descuentos,mtarjeta_visa_descuentos,ctarjeta_master_descuentos,mtarjeta_master_descuentos,ccomisiones_mantenimiento,mcomisiones_mantenimiento,ccomisiones_otras,mcomisiones_otras,cforex,cforex_buy,mforex_buy,cforex_sell,mforex_sell,ctransferencias_recibidas,mtransferencias_recibidas,ctransferencias_emitidas,mtransferencias_emitidas,cextraccion_autoservicio,mextraccion_autoservicio,ccheques_depositados,mcheques_depositados,ccheques_emitidos,mcheques_emitidos,ccheques_depositados_rechazados,mcheques_depositados_rechazados,ccheques_emitidos_rechazados,mcheques_emitidos_rechazados,tcallcenter,ccallcenter_transacciones,thomebanking,chomebanking_transacciones,ccajas_transacciones,ccajas_consultas,ccajas_depositos,ccajas_extracciones,ccajas_otras,catm_trx,matm,catm_trx_other,matm_other,ctrx_quarter,tmobile_app,cmobile_app_trx,Master_delinquency,Master_status,Master_mfinanciacion_limite,Master_Fvencimiento,Master_msaldototal,Master_msaldopesos,Master_msaldodolares,Master_mconsumospesos,Master_mconsumosdolares,Master_mlimitecompra,Master_madelantopesos,Master_madelantodolares,Master_mpagado,Master_mpagospesos,Master_mpagosdolares,Master_fechaalta,Master_mconsumototal,Master_cconsumos,Master_cadelantosefectivo,Master_mpagominimo,Visa_delinquency,Visa_status,Visa_mfinanciacion_limite,Visa_Fvencimiento,Visa_msaldototal,Visa_msaldopesos,Visa_msaldodolares,Visa_mconsumospesos,Visa_mconsumosdolares,Visa_mlimitecompra,Visa_madelantopesos,Visa_madelantodolares,Visa_mpagado,Visa_mpagospesos,Visa_mpagosdolares,Visa_fechaalta,Visa_mconsumototal,Visa_cconsumos,Visa_cadelantosefectivo,Visa_mpagominimo,clase_ternaria
928934,249221109,201901,1,0,1,59,276,7597.55,47433.58,5654.59,-915.57,2571.97,9,1,1,0.0,-317.98,2,24068.63,0.0,268675.63,1,280992.59,3,0,0.0,1,23,31249.45,1,5,9333.88,0,0.0,0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0,0,1,1,0,0,0.0,0.0,0,5,15333.25,6,19820.48,4,8786.59,0,0.0,6,14612.56,0,0.0,0,0.0,0,0.0,1,1199.97,18.0,5654.59,0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,1,5161.2,0,0.0,0,0.0,0,0.0,0,0,1,71,1,1,4,2,0,0,0.0,0,0.0,164,0.0,0.0,0.0,0.0,209004.84,-1307.0,9333.88,9333.88,0.0,8786.59,0.0,232227.6,0.0,0.0,8815.09,-9635.71,0.0,6755.0,8786.59,4.0,0.0,398.82,0.0,0.0,274901.32,-1276.0,26663.38,31176.66,99.48,24336.99,4.28,305484.53,0.0,0.0,0.0,-44919.57,3.23,7136.0,24336.99,13.0,0.0,1466.25,CONTINUA
928935,249221109,201902,1,0,1,59,277,3800.93,46605.81,2104.71,-600.05,2046.1,9,1,1,0.0,0.0,2,23798.48,0.0,282828.56,1,315387.19,3,0,0.0,1,17,24360.67,1,5,9977.93,0,0.0,0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0,0,1,1,0,0,0.0,0.0,0,4,10504.27,7,18915.28,4,9430.63,0,0.0,5,11583.57,0,0.0,0,0.0,0,0.0,1,1199.97,14.0,2104.71,0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,1,3519.0,0,0.0,0,0.0,0,0.0,0,0,1,77,1,0,0,0,1,0,0.0,0,0.0,147,0.0,0.0,0.0,0.0,211831.48,-1279.0,9975.79,9975.79,0.0,9430.63,0.0,235368.31,0.0,0.0,9975.79,-8815.09,0.0,6783.0,9430.63,4.0,0.0,422.28,0.0,0.0,289474.39,-1248.0,20793.03,23848.15,542.06,19467.22,7.56,321678.87,0.0,0.0,0.0,-31306.77,0.65,7164.0,19467.22,10.0,0.0,1149.54,CONTINUA
928936,249221109,201903,1,0,1,59,278,4102.67,45348.8,2492.53,-866.41,2239.08,9,1,1,0.0,0.0,2,8537.18,0.0,313961.39,1,317386.02,3,0,0.0,1,13,26983.49,1,8,15997.16,0,0.0,0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0,0,1,1,0,0,0.0,0.0,0,4,10501.92,6,20722.17,4,9430.63,0,0.0,4,20095.17,0,0.0,0,0.0,0,0.0,1,1199.97,14.0,2492.53,0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0,1,52,1,1,0,0,2,0,0.0,0,0.0,122,0.0,0.0,0.0,0.0,225964.66,-1248.0,15997.16,15997.16,0.0,14362.86,0.0,251071.85,0.0,0.0,16152.21,-9975.79,0.0,6814.0,14362.86,6.0,0.0,668.61,0.0,0.0,323854.32,-1217.0,22896.25,27037.43,-180.13,23263.59,6.63,359883.62,0.0,0.0,0.0,-24634.77,18.02,7195.0,23263.59,9.0,0.0,1102.62,CONTINUA
928937,249221109,201904,1,0,1,60,279,4374.99,44872.11,3231.46,-2417.52,3392.4,9,1,1,0.0,-45.54,2,9237.04,0.0,131386.0,1,75396.13,3,0,0.0,1,10,108263.43,1,6,9815.28,0,0.0,0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0,0,1,1,0,0,0.0,0.0,0,4,10501.92,0,0.0,4,8181.01,0,0.0,11,34767.41,0,0.0,0,0.0,0,0.0,1,1199.97,16.0,3231.46,3,3,5571.75,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,1,2,1,11,1,0,0,2,0,0,0.0,0,0.0,120,0.0,0.0,0.0,0.0,253368.0,-1218.0,9660.24,9660.24,0.0,8181.01,0.0,281520.0,0.0,0.0,9660.24,-16152.21,0.0,6844.0,8181.01,4.0,0.0,269.79,0.0,0.0,354715.2,-1187.0,92244.21,108202.45,0.0,107024.32,6.06,394128.0,0.0,0.0,0.0,-27036.47,0.0,7225.0,107024.32,9.0,0.0,4316.64,CONTINUA
928938,249221109,201905,1,0,1,60,280,3930.055,45187.545,2789.96,-1651.875,2623.825,9,1,1,0.0,0.0,2,24671.08,0.0,84998.34,1,44975.52,3,1,5395.8,1,8,22260.09,1,8,193922.1,0,0.0,0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0,0,1,1,0,0,0.0,0.0,0,4,10501.92,5,21210.84,4,8181.01,0,0.0,8,33454.14,0,0.0,0,0.0,0,0.0,1,1199.97,15.0,2789.96,1,1,2521.95,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0,1,8,1,1,1,0,1,0,0.0,0,0.0,126,0.0,0.0,0.0,0.0,253368.0,-1187.0,193922.1,193922.1,0.0,192287.83,0.0,281520.0,0.0,0.0,194295.72,-9660.24,0.0,6875.0,192287.83,6.0,0.0,7788.72,0.0,0.0,354715.2,-1156.0,19023.05,22314.03,0.0,21493.82,5.86,394128.0,0.0,0.0,0.0,-108202.45,0.0,7256.0,21493.82,8.0,0.0,973.59,CONTINUA


In [31]:
## columnas monetarias
var_monetarias=["mrentabilidad","mrentabilidad_annual","mcomisiones","mactivos_margen","mpasivos_margen","mcuenta_corriente",
               "mcaja_ahorro","mcaja_ahorro_adicional","mcaja_ahorro_dolares","mcuentas_saldo","mautoservicio","mtarjeta_visa_consumo",
               "mtarjeta_master_consumo","mtarjeta_master_consumo","mprestamos_prendarios","mprestamos_hipotecarios","mplazo_fijo_dolares",
               "mplazo_fijo_pesos","minversion1_pesos","minversion1_dolares","minversion2","mpayroll","mpayroll2","mcuenta_debitos_automaticos",
               "mttarjeta_visa_debitos_automaticos","mttarjeta_master_debitos_automaticos","mpagomiscuentas","mcajeros_propios_descuentos",
               "mtarjeta_visa_descuentos","mtarjeta_master_descuentos","mcomisiones_mantenimiento","mcomisiones_otras","mforex_buy",
               "mtransferencias_recibidas","mtransferencias_emitidas","mextraccion_autoservicio","mcheques_depositados","mcheques_emitidos",
               "mcheques_depositados_rechazados","mcheques_emitidos_rechazados","matm","matm_other","Master_mfinanciacion_limite",
               "Master_msaldototal","Master_msaldopesos","Master_msaldodolares","Master_mconsumospesos","Master_mconsumosdolares",
               "Master_mlimitecompra","Master_madelantopesos","Master_madelantodolares","Master_mpagado","Master_mpagospesos","Master_mpagosdolares",
               "Master_mconsumototal","Master_mpagominimo","Visa_mfinanciacion_limite","Visa_msaldototal", "Visa_msaldopesos",
                "Visa_msaldodolares", "Visa_mconsumospesos", "Visa_mconsumosdolares", "Visa_mlimitecompra", "Visa_madelantopesos",
                "Visa_madelantodolares","Visa_mpagado", "Visa_mpagospesos", "Visa_mpagosdolares","Visa_mconsumototal", "Visa_mpagominimo"]

In [32]:
def ajustar_por_inflacion(df, columnas_monetarias, meses, ipc, ipc_base_mes=201901):
    """
    Ajusta variables monetarias por inflación utilizando un IPC base y listas de meses e IPC.
    
    Parámetros:
    -----------
    - df: pd.DataFrame
        DataFrame con las variables monetarias y la columna `foto_mes`.
    - columnas_monetarias: list
        Lista de columnas monetarias a ajustar.
    - meses: list
        Lista de valores `foto_mes` en orden cronológico (e.g., 201901, 201902, ...).
    - ipc: list
        Lista de valores IPC correspondientes a los meses.
    - ipc_base_mes: int
        `foto_mes` que se utilizará como base para ajustar los valores (default: 201901).

    Retorna:
    --------
    - pd.DataFrame:
        DataFrame original con las columnas ajustadas por inflación.
    """
    # Crear diccionario de mapeo entre meses e IPC
    mapeo_ipc = dict(zip(meses, ipc))
    
    # Obtener el IPC base
    ipc_base = mapeo_ipc[ipc_base_mes]
    
    # Agregar columna temporal con el IPC correspondiente al `foto_mes`
    df['IPC'] = df['foto_mes'].map(mapeo_ipc)
    
    # Ajustar las columnas monetarias por inflación
    for col in columnas_monetarias:
        df[col] = df[col] * ipc_base / df['IPC']
    
    # Eliminar columna temporal IPC
    df.drop(columns=['IPC'], inplace=True)
    
    return df

In [33]:
df=ajustar_por_inflacion(df, var_monetarias, meses, IPC, ipc_base_mes=201901)

In [34]:
df

Unnamed: 0,numero_de_cliente,foto_mes,active_quarter,cliente_vip,internet,cliente_edad,cliente_antiguedad,mrentabilidad,mrentabilidad_annual,mcomisiones,mactivos_margen,mpasivos_margen,cproductos,tcuentas,ccuenta_corriente,mcuenta_corriente_adicional,mcuenta_corriente,ccaja_ahorro,mcaja_ahorro,mcaja_ahorro_adicional,mcaja_ahorro_dolares,cdescubierto_preacordado,mcuentas_saldo,ctarjeta_debito,ctarjeta_debito_transacciones,mautoservicio,ctarjeta_visa,ctarjeta_visa_transacciones,mtarjeta_visa_consumo,ctarjeta_master,ctarjeta_master_transacciones,mtarjeta_master_consumo,cprestamos_prendarios,mprestamos_prendarios,cprestamos_hipotecarios,mprestamos_hipotecarios,cplazo_fijo,mplazo_fijo_dolares,mplazo_fijo_pesos,cinversion1,minversion1_pesos,minversion1_dolares,cinversion2,minversion2,cseguro_vida,cseguro_auto,cseguro_vivienda,cseguro_accidentes_personales,ccaja_seguridad,cpayroll_trx,mpayroll,mpayroll2,cpayroll2_trx,ccuenta_debitos_automaticos,mcuenta_debitos_automaticos,ctarjeta_visa_debitos_automaticos,mttarjeta_visa_debitos_automaticos,ctarjeta_master_debitos_automaticos,mttarjeta_master_debitos_automaticos,cpagodeservicios,mpagodeservicios,cpagomiscuentas,mpagomiscuentas,ccajeros_propios_descuentos,mcajeros_propios_descuentos,ctarjeta_visa_descuentos,mtarjeta_visa_descuentos,ctarjeta_master_descuentos,mtarjeta_master_descuentos,ccomisiones_mantenimiento,mcomisiones_mantenimiento,ccomisiones_otras,mcomisiones_otras,cforex,cforex_buy,mforex_buy,cforex_sell,mforex_sell,ctransferencias_recibidas,mtransferencias_recibidas,ctransferencias_emitidas,mtransferencias_emitidas,cextraccion_autoservicio,mextraccion_autoservicio,ccheques_depositados,mcheques_depositados,ccheques_emitidos,mcheques_emitidos,ccheques_depositados_rechazados,mcheques_depositados_rechazados,ccheques_emitidos_rechazados,mcheques_emitidos_rechazados,tcallcenter,ccallcenter_transacciones,thomebanking,chomebanking_transacciones,ccajas_transacciones,ccajas_consultas,ccajas_depositos,ccajas_extracciones,ccajas_otras,catm_trx,matm,catm_trx_other,matm_other,ctrx_quarter,tmobile_app,cmobile_app_trx,Master_delinquency,Master_status,Master_mfinanciacion_limite,Master_Fvencimiento,Master_msaldototal,Master_msaldopesos,Master_msaldodolares,Master_mconsumospesos,Master_mconsumosdolares,Master_mlimitecompra,Master_madelantopesos,Master_madelantodolares,Master_mpagado,Master_mpagospesos,Master_mpagosdolares,Master_fechaalta,Master_mconsumototal,Master_cconsumos,Master_cadelantosefectivo,Master_mpagominimo,Visa_delinquency,Visa_status,Visa_mfinanciacion_limite,Visa_Fvencimiento,Visa_msaldototal,Visa_msaldopesos,Visa_msaldodolares,Visa_mconsumospesos,Visa_mconsumosdolares,Visa_mlimitecompra,Visa_madelantopesos,Visa_madelantodolares,Visa_mpagado,Visa_mpagospesos,Visa_mpagosdolares,Visa_fechaalta,Visa_mconsumototal,Visa_cconsumos,Visa_cadelantosefectivo,Visa_mpagominimo,clase_ternaria
928934,249221109,201901,1,0,1,59,276,7597.550000,47433.580000,5654.590000,-915.570000,2571.970000,9,1,1,0.0,-317.980000,2,24068.630000,0.0,268675.630000,1,280992.590000,3,0,0.000000,1,23,31249.450000,1,5,9333.880000,0,0.0,0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0,0,1,1,0,0,0.000000,0.0,0,5,15333.250000,6,19820.480000,4,8786.590000,0,0.0,6,14612.560000,0,0.0,0,0.0,0,0.0,1,1199.970000,18.0,5654.590000,0,0,0.000000,0,0.00,0,0.000000,0,0.0,0,0.0,1,5161.200000,0,0.0,0,0.0,0,0.0,0,0,1,71,1,1,4,2,0,0,0.0,0,0.0,164,0.0,0.0,0.0,0.0,209004.840000,-1307.0,9333.880000,9333.880000,0.0,8786.590000,0.0,232227.600000,0.0,0.0,8815.090000,-9635.710000,0.0,6755.0,8786.590000,4.0,0.0,398.820000,0.0,0.0,274901.320000,-1276.0,26663.380000,31176.660000,99.480000,24336.990000,4.280000,305484.530000,0.0,0.0,0.0,-44919.570000,3.230000,7136.0,24336.990000,13.0,0.0,1466.250000,CONTINUA
928935,249221109,201902,1,0,1,59,277,3661.871585,44900.719390,2027.708415,-578.096951,1971.242683,9,1,1,0.0,0.000000,2,22927.803902,0.0,272481.173659,1,303848.634268,3,0,0.000000,1,17,23469.425976,1,5,9261.192910,0,0.0,0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0,0,1,1,0,0,0.000000,0.0,0,4,10119.967439,7,18223.257561,4,9085.606951,0,0.0,5,11159.780854,0,0.0,0,0.0,0,0.0,1,1156.068659,14.0,2027.708415,0,0,0.000000,0,0.00,0,0.000000,0,0.0,0,0.0,1,3390.256098,0,0.0,0,0.0,0,0.0,0,0,1,77,1,0,0,0,1,0,0.0,0,0.0,147,0.0,0.0,0.0,0.0,204081.547805,-1279.0,9610.822073,9610.822073,0.0,9085.606951,0.0,226757.274268,0.0,0.0,9610.822073,-8492.586707,0.0,6783.0,9085.606951,4.0,0.0,406.830732,0.0,0.0,278883.863537,-1248.0,20032.309390,22975.656707,522.228537,18755.004634,7.283415,309910.130854,0.0,0.0,0.0,-30161.400366,0.626220,7164.0,18755.004634,10.0,0.0,1107.483659,CONTINUA
928936,249221109,201903,1,0,1,59,278,3776.049670,41738.507184,2294.095573,-797.433670,2060.823146,9,1,1,0.0,0.000000,2,7857.521010,0.0,288966.405553,1,292118.395107,3,0,0.000000,1,13,24835.289825,1,8,13551.429617,0,0.0,0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0,0,1,1,0,0,0.000000,0.0,0,4,9665.844816,6,19072.443845,4,8679.841981,0,0.0,4,18495.360350,0,0.0,0,0.0,0,0.0,1,1104.438408,14.0,2294.095573,0,0,0.000000,0,0.00,0,0.000000,0,0.0,0,0.0,0,0.000000,0,0.0,0,0.0,0,0.0,0,0,1,52,1,1,0,0,2,0,0.0,0,0.0,122,0.0,0.0,0.0,0.0,207975.240466,-1248.0,14723.599689,14723.599689,0.0,13219.409010,0.0,231083.605631,0.0,0.0,14866.305903,-9181.600893,0.0,6814.0,13219.409010,6.0,0.0,615.380854,0.0,0.0,298071.743068,-1217.0,21073.441748,24884.935573,-165.789553,21411.537204,6.102175,331232.691029,0.0,0.0,0.0,-22673.555301,16.585398,7195.0,21411.537204,9.0,0.0,1014.838602,CONTINUA
928937,249221109,201904,1,0,1,60,279,3892.529817,39923.754369,2875.104721,-2150.923473,3018.296762,9,1,1,0.0,-40.517992,2,8218.408184,0.0,116897.163773,1,67081.681126,3,0,0.000000,1,10,96324.478311,1,6,7769.846871,0,0.0,0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0,0,1,1,0,0,0.000000,0.0,0,4,9343.801183,0,0.000000,4,7278.833862,0,0.0,11,30933.369010,0,0.0,0,0.0,0,0.0,1,1067.641070,16.0,2875.104721,3,3,4957.314876,0,0.00,0,0.000000,0,0.0,0,0.0,0,0.000000,0,0.0,0,0.0,0,0.0,1,2,1,11,1,0,0,2,0,0,0.0,0,0.0,120,0.0,0.0,0.0,0.0,225427.371187,-1218.0,8594.939015,8594.939015,0.0,7278.833862,0.0,250474.856875,0.0,0.0,8594.939015,-14370.994913,0.0,6844.0,7278.833862,4.0,0.0,240.038405,0.0,0.0,315598.319662,-1187.0,82071.807677,96270.222994,0.000000,95222.013477,5.391722,350664.799625,0.0,0.0,0.0,-24054.972839,0.000000,7225.0,95222.013477,9.0,0.0,3840.614472,CONTINUA
928938,249221109,201905,1,0,1,60,280,3393.162240,39014.383115,2408.817923,-1426.209016,2265.378962,9,1,1,0.0,0.000000,2,21300.713880,0.0,73386.544918,1,38831.323279,3,1,4658.668852,1,8,19219.094098,1,8,144557.057673,0,0.0,0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0,0,1,1,0,0,0.000000,0.0,0,4,9067.231475,5,18313.184262,4,7063.385683,0,0.0,8,28883.902295,0,0.0,0,0.0,0,0.0,1,1036.039672,15.0,2408.817923,1,1,2177.421311,0,0.00,0,0.000000,0,0.0,0,0.0,0,0.000000,0,0.0,0,0.0,0,0.0,0,0,1,8,1,1,1,0,1,0,0.0,0,0.0,126,0.0,0.0,0.0,0.0,218754.885246,-1187.0,167430.009836,167430.009836,0.0,166019.000765,0.0,243060.983607,0.0,0.0,167752.588852,-8340.535082,0.0,6875.0,166019.000765,6.0,0.0,6724.687213,0.0,0.0,306256.839344,-1156.0,16424.272678,19265.665246,0.000000,18557.505792,5.059454,340285.377049,0.0,0.0,0.0,-93420.694536,0.000000,7256.0,18557.505792,8.0,0.0,840.585902,CONTINUA
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1881302,1603590310,202108,0,0,0,58,1,0.000000,0.000000,0.000000,0.000000,0.000000,6,2,1,0.0,0.000000,2,0.000000,0.0,0.000000,1,0.000000,1,0,0.000000,1,0,0.000000,1,0,0.000000,0,0.0,0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0,0,0,0,0,0,0.000000,0.0,0,0,0.000000,0,0.000000,0,0.000000,0,0.0,0,0.000000,0,0.0,0,0.0,0,0.0,0,0.000000,0.0,0.000000,0,0,0.000000,0,0.00,0,0.000000,0,0.0,0,0.0,0,0.000000,0,0.0,0,0.0,0,0.0,0,0,0,0,0,0,0,0,0,0,0.0,0,0.0,2,0.0,0.0,0.0,0.0,75295.568652,-2190.0,0.000000,0.000000,0.0,,,83661.742947,,,0.000000,,,5.0,,,,0.000000,0.0,0.0,75295.568652,-2190.0,0.000000,0.000000,0.000000,,,83661.742947,,,0.0,,,5.0,,,,0.000000,
865538,1603703854,202108,0,0,0,45,1,0.742947,0.742947,0.000000,0.000000,0.635219,6,1,1,0.0,46.252147,2,0.000000,0.0,0.000000,1,1429.630063,1,0,0.000000,1,0,0.000000,1,0,0.000000,0,0.0,0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0,0,0,0,0,0,0.000000,0.0,0,0,0.000000,0,0.000000,0,0.000000,0,0.0,0,0.000000,0,0.0,0,0.0,0,0.0,0,0.000000,0.0,0.000000,0,0,0.000000,0,0.00,0,0.000000,0,0.0,0,0.0,0,0.000000,0,0.0,0,0.0,0,0.0,0,0,1,32,0,0,0,0,0,0,0.0,0,0.0,6,0.0,0.0,0.0,0.0,200800.928323,-2190.0,0.000000,0.000000,0.0,,,227914.363542,,,0.000000,,,14.0,,,,0.000000,0.0,0.0,200800.928323,-2190.0,0.000000,0.000000,0.000000,,,227914.363542,,,0.0,,,14.0,,,,0.000000,
4735352,1603775178,202108,1,0,0,33,1,38.603511,38.603511,10.442116,-0.133730,24.142053,6,1,1,0.0,-5.794984,2,1763.658903,0.0,411.711348,1,52256.787508,2,0,0.000000,1,0,0.000000,1,0,0.000000,0,0.0,0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0,0,0,0,0,1,49215.761755,0.0,0,0,0.000000,0,0.000000,0,0.000000,0,0.0,0,0.000000,0,0.0,0,0.0,0,0.0,0,0.000000,3.0,10.442116,2,0,0.000000,2,58.65,3,54008.882445,0,0.0,0,0.0,0,0.000000,0,0.0,0,0.0,0,0.0,0,0,1,2,0,0,0,0,0,0,0.0,0,0.0,13,0.0,1.0,0.0,0.0,112943.352978,-2190.0,0.000000,0.000000,0.0,,,125492.614420,,,0.000000,,,7.0,,,,0.000000,0.0,0.0,112943.352978,-2190.0,0.000000,0.000000,0.000000,,,125492.614420,,,0.0,,,7.0,,,,0.000000,
4444491,1603805076,202108,0,0,0,36,1,0.000000,0.000000,0.000000,0.000000,0.000000,7,2,1,0.0,0.000000,2,0.000000,0.0,0.000000,1,0.000000,1,0,0.000000,1,0,0.000000,1,0,0.000000,0,0.0,0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0,0,0,0,0,0,0.000000,0.0,0,0,0.000000,0,0.000000,0,0.000000,0,0.0,0,0.000000,0,0.0,0,0.0,0,0.0,0,0.000000,0.0,0.000000,0,0,0.000000,0,0.00,0,0.000000,0,0.0,0,0.0,0,0.000000,0,0.0,0,0.0,0,0.0,0,0,1,1,0,0,0,0,0,0,0.0,0,0.0,2,0.0,0.0,0.0,0.0,150591.137304,-2190.0,0.000000,0.000000,0.0,,,167323.485893,,,0.000000,,,14.0,,,,0.000000,0.0,0.0,150591.137304,-2190.0,0.000000,0.000000,0.000000,,,167323.485893,,,0.0,,,14.0,,,,0.000000,


In [35]:
dataset_file1 = 'competencia_02_DQ_IPC.parquet'
full_path1 = dataset_path + dataset_file1

# Guarda el DataFrame en formato Parquet
df.to_parquet(full_path1, engine='pyarrow') 

In [43]:
aux = df[df['foto_mes'] == 202108]

In [44]:
aux.head(50)

Unnamed: 0,numero_de_cliente,foto_mes,active_quarter,cliente_vip,internet,cliente_edad,cliente_antiguedad,mrentabilidad,mrentabilidad_annual,mcomisiones,...,Visa_madelantodolares,Visa_mpagado,Visa_mpagospesos,Visa_mpagosdolares,Visa_fechaalta,Visa_mconsumototal,Visa_cconsumos,Visa_cadelantosefectivo,Visa_mpagominimo,clase_ternaria
31,249221109,202108,1,0,0,62,307,,,,...,,,,,8079.0,,13.0,0.0,,
63,249221468,202108,1,0,0,53,44,,,,...,,,,,1336.0,,26.0,0.0,,
95,249223005,202108,1,0,0,49,209,,,,...,,,,,3698.0,,4.0,0.0,,
127,249228180,202108,1,0,0,66,327,,,,...,,,,,9137.0,,6.0,0.0,,
159,249232117,202108,1,0,0,80,380,,,,...,,,,,8819.0,,7.0,0.0,,
191,249236712,202108,1,0,0,45,32,,,,...,,,,,953.0,,0.0,0.0,,
221,249236857,202108,1,0,0,73,327,,,,...,,,,,9671.0,,2.0,0.0,,
253,249237079,202108,1,0,0,64,148,,,,...,,,,,4474.0,,3.0,0.0,,
285,249237446,202108,1,0,0,54,284,,,,...,,,,,4165.0,,2.0,0.0,,
317,249239632,202108,1,0,0,52,270,,,,...,,,,,5997.0,,4.0,0.0,,


In [45]:
asd=calcular_porcentaje_nan_por_mes(aux)

In [46]:
tabla_pivoteada_nan = asd.pivot(index='campo', columns='foto_mes', values='porcentaje_nan')
tabla_pivoteada_nan['total_nan'] = tabla_pivoteada_nan.sum(axis=1)/6

# Ordenar la tabla por la columna 'total_nan' en orden descendente
tabla_ordenada_nan = tabla_pivoteada_nan.sort_values(by='total_nan', ascending=False)
tabla_ordenada_nan.iloc[0:50].style.background_gradient(cmap='YlOrRd', axis=1)

foto_mes,202108,total_nan
campo,Unnamed: 1_level_1,Unnamed: 2_level_1
Master_mconsumospesos,100.0,16.666667
Master_madelantodolares,100.0,16.666667
Master_mpagosdolares,100.0,16.666667
Master_mpagospesos,100.0,16.666667
Master_mconsumosdolares,100.0,16.666667
Master_madelantopesos,100.0,16.666667
Master_mconsumototal,100.0,16.666667
Master_mpagado,100.0,16.666667
Master_mlimitecompra,100.0,16.666667
Master_mfinanciacion_limite,100.0,16.666667
