# 1.1.0 Build inputs from credit reports dataset 

### - Methodology
The goal of this section is to build informative, actionable features from the credit reports dataset that summarize each customer's credit history. This is done by transforming the loan-level credit records and aggregating them into one row per customer.

Here are some features we explored, computed overall, for open loans only, and by account type:
- Raw Features:
    - Total Loans Count (credit_reports__loans_count): The number of loan records reported for each customer, a direct measure of credit usage.
    - Maximum Credit Used (credit_reports__max_credit_sum): The maximum credit amount reported on each loan, summed over the customer's loans, indicating the overall scale of credit the customer has drawn on.
- Derived Features:
    - Credit Utilization Ratios (debt_ratio, credit_reports__debt_ratio): At loan level, the ratio of current balance to maximum credit; at customer level, total balance due divided by total maximum credit. Both indicate how much of the credit granted is still owed.
    - Delayed Payment Indicators (has_delayed_payments): Flags loans with at least one delayed payment, a critical indicator of potential default risk. The flag is summed per customer as credit_reports__loans_with_at_least_one_delayed_count.
    - Diversity in Credit Types (credit_reports__credit_type_nunique): The count of unique credit types, which shows the variety of credit facilities used by the customer.
    - Age of Credit (age, aggregated as credit_reports__age_max and credit_reports__age_min): The number of days from the opening of the loan to its closing, or to the application date if the loan is still open, which reflects the longevity of credit relationships.
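
As a minimal sketch of how the loan-level derived features above can be computed, the following uses a small hand-made frame (the values are illustrative, not taken from the dataset). It follows the definitions used later in this notebook: ratios are taken against `max_credit`, infinite results are mapped to NaN, and the age of an open loan is measured up to the application date.

```python
import numpy as np
import pandas as pd

# Toy data: illustrative values only, not taken from credit_reports.parquet.
toy = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "current_balance": [500.0, 0.0, 300.0],
    "max_credit": [1000.0, 0.0, 600.0],
    "delayed_payments": [2.0, 0.0, np.nan],
    "loan_opening_date": pd.to_datetime(["2020-01-01", "2021-06-01", "2022-01-01"]),
    "loan_closing_date": pd.to_datetime(["2020-12-31", None, None]),
    "application_datetime": pd.to_datetime(["2022-06-01", "2022-06-01", "2022-03-02"]),
})

# Closed loans end at the closing date; open loans end at the application date.
end_date = toy["loan_closing_date"].fillna(toy["application_datetime"])

toy = toy.assign(
    # Division by zero yields inf (or NaN for 0/0); keep only finite ratios.
    debt_ratio=(toy["current_balance"] / toy["max_credit"]).replace([np.inf, -np.inf], np.nan),
    # 1 if any payment was delayed, 0 if none, NaN if the count is missing.
    has_delayed_payments=np.where(
        toy["delayed_payments"] > 0, 1.0,
        np.where(toy["delayed_payments"] == 0, 0.0, np.nan),
    ),
    age=(end_date - toy["loan_opening_date"]).dt.days,
)
print(toy[["customer_id", "debt_ratio", "has_delayed_payments", "age"]])
```

Keeping missing inputs as NaN, rather than coercing them to 0, lets the later aggregations skip them instead of treating them as "no delay" or "no debt".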


In [1]:
import pandas as pd
import numpy as np
from datetime import datetime
from pathlib import Path
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

### 1. Load Data

In [2]:
DATA_PATH = Path.cwd().parent / "data"
CREDIT_REPORT_DATA_PATH = DATA_PATH / "raw_data/credit_reports.parquet"
MAIN_DATASET_PATH = DATA_PATH / "raw_data/main_dataset.parquet"
df = pd.read_parquet(CREDIT_REPORT_DATA_PATH)
main_df = pd.read_parquet(MAIN_DATASET_PATH)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 287356 entries, 0 to 287355
Data columns (total 29 columns):
 #   Column                   Non-Null Count   Dtype              
---  ------                   --------------   -----              
 0   customer_id              287356 non-null  int64              
 1   INQUIRY_TIME             287356 non-null  datetime64[us, UTC]
 2   CDC_INQUIRY_ID           287267 non-null  object             
 3   INQUIRY_DATE             287267 non-null  datetime64[us]     
 4   PREVENTION_KEY           287267 non-null  object             
 5   CURRENCY                 287267 non-null  object             
 6   MAX_CREDIT               287174 non-null  float64            
 7   CREDIT_LIMIT             278999 non-null  float64            
 8   PAYMENT_AMOUNT           287267 non-null  float64            
 9   UPDATE_DATE              287267 non-null  datetime64[us]     
 10  LOAN_OPENING_DATE        287267 non-null  datetime64[us]     
 11  LOAN_CLOSING_

In [3]:
df = pd.merge(df, main_df[["LOAN_ORIGINATION_DATETIME", "customer_id", "APPLICATION_DATETIME"]], how="left", on="customer_id")
df[df["LOAN_ORIGINATION_DATETIME"]<df["INQUIRY_DATE"]]

Unnamed: 0,customer_id,INQUIRY_TIME,CDC_INQUIRY_ID,INQUIRY_DATE,PREVENTION_KEY,CURRENCY,MAX_CREDIT,CREDIT_LIMIT,PAYMENT_AMOUNT,UPDATE_DATE,LOAN_OPENING_DATE,LOAN_CLOSING_DATE,WORST_DELAY_DATE,REPORT_DATE,LAST_PURCHASE_DATE,LAST_PAYMENT_DATE,PAYMENT_FREQUENCY,BUSINESS_TYPE,CREDIT_TYPE,ACCOUNT_TYPE,RESPONSABILITY_TYPE,TOTAL_PAYMENTS,DELAYED_PAYMENTS,CURRENT_PAYMENT,WORST_DELAY,TOTAL_REPORTED_PAYMENTS,CURRENT_BALANCE,BALANCE_DUE,BALANCE_DUE_WORST_DELAY,LOAN_ORIGINATION_DATETIME,APPLICATION_DATETIME
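
The filter above returns no rows, which suggests that no credit report was pulled after the corresponding loan was originated. This is presumably a guard against using information that was not available at decision time. A minimal sketch of the same check on toy data follows; the frames `reports` and `loans` and their values are hypothetical stand-ins for the two datasets.

```python
import pandas as pd

# Hypothetical toy frames standing in for the credit reports and the main dataset.
reports = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "INQUIRY_DATE": pd.to_datetime(["2022-01-10", "2022-01-10", "2022-03-05"]),
})
loans = pd.DataFrame({
    "customer_id": [1, 2],
    "LOAN_ORIGINATION_DATETIME": pd.to_datetime(["2022-01-15", "2022-03-01"]),
})

# validate= raises if a customer appears more than once on the right side,
# which would otherwise silently duplicate credit report rows.
merged = reports.merge(loans, how="left", on="customer_id", validate="many_to_one")

# Rows whose inquiry happened after origination would carry post-decision data.
late = merged[merged["LOAN_ORIGINATION_DATETIME"] < merged["INQUIRY_DATE"]]
print(late["customer_id"].tolist())  # [2]
```

Note that comparisons involving a missing date evaluate to False, so rows with a null `INQUIRY_DATE` never appear in such a filter and need a separate check.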


In [4]:
df.RESPONSABILITY_TYPE.unique()

array(['INDIVIDUAL (TITULAR)', 'TITULAR CON AVAL',
       'OBLIGATORIO SOLIDARIO', None, 'MANCOMUNADO', 'AVAL'], dtype=object)

In [5]:
df.PAYMENT_FREQUENCY.unique()

array(['MENSUAL', 'PAGO MINIMO PARA CUENTAS REVOLVENTES', 'SEMANAL',
       'CATORCENAL', 'QUINCENAL', 'UNA SOLA EXHIBICION', 'BIMESTRAL',
       None, 'ANUAL', 'TRIMESTRAL', 'DEDUCCION DEL SALARIO', 'SEMESTRAL'],
      dtype=object)

In [6]:
df.ACCOUNT_TYPE.value_counts()

ACCOUNT_TYPE
PAGOS FIJOS                       224372
REVOLVENTE                         26441
SIN LIMITE PREESTABLECIDO          18375
POR DETERMINAR                     16421
HIPOTECA                            1641
CREDITO DE HABILITACION O AVIO        17
Name: count, dtype: int64
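
These account types become column suffixes in the per-account-type features built in the next cell. The sketch below illustrates that group-then-spread pattern on toy data (the values are illustrative). It uses `unstack` where the notebook uses `pivot_table(..., aggfunc='first')`; since each (customer, account type) pair is unique after the `groupby`, both approaches should fill the same cells, although the column order may differ.

```python
import pandas as pd

# Toy data: illustrative values only.
toy = pd.DataFrame({
    "customer_id": [1, 1, 1, 2],
    "account_type": ["PAGOS FIJOS", "PAGOS FIJOS", "REVOLVENTE", "REVOLVENTE"],
    "max_credit": [100.0, 300.0, 50.0, 80.0],
})
# Replace spaces so the labels are usable inside column names.
toy["account_type"] = toy["account_type"].str.replace(" ", "_")

# One row per (customer, account type), with flattened metric names.
agg = toy.groupby(["customer_id", "account_type"]).agg({"max_credit": ["sum", "median"]})
agg.columns = ["_".join(col) for col in agg.columns]

# Spread account types into columns: one row per customer.
wide = agg.unstack("account_type")
wide.columns = ["credit_reports__" + "_".join(col).lower() for col in wide.columns]
wide = wide.reset_index()
print(wide)
```

Customers without any loan of a given account type get NaN in that type's columns, which is why the wide feature table in this notebook is sparse for rare types such as `CREDITO DE HABILITACION O AVIO`.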

In [7]:
def build_aggregate_credit_report_information_by(df: pd.DataFrame, aggregate_by: str) -> pd.DataFrame:
    
    """
    Aggregates credit report information by customer ID and a specified category, computing various 
    statistical measures for each group. This function creates a wide-format DataFrame where each row 
    represents a unique customer and columns represent aggregated metrics of credit-related activities 
    across different categories specified by 'aggregate_by'.

    Parameters:
    - df: The input DataFrame containing credit report data.
    - aggregate_by: The column name to further group the data (e.g., 'account_type').

    Returns:
    - pd.DataFrame: A pivot table where the index is 'customer_id', columns are created by the values of 
      'aggregate_by', and cells contain aggregated credit report metrics such as sums, medians, and 
      standard deviations of financial metrics. Each feature is prefixed with 'credit_reports__' to 
      denote its origin from credit report data.

    Examples of aggregated metrics include:
    - Count of inquiries
    - Sum, median, and standard deviation of maximum credit
    - Number of unique credit types
    - Maximum, median, and standard deviation of delayed payment severity
    """
    
    df_aggregates = df.groupby(["customer_id", aggregate_by]).agg({
        "cdc_inquiry_id": ["count"],
        "max_credit": ['sum', 'median', 'std'],
        "credit_limit": ['sum', 'median', 'std'],
        "current_balance": ['sum', 'median', 'std'],
        "balance_due_worst_delay": ['max', 'median', 'std'],
        "balance_due": ['sum', 'median', 'std'],
        "debt_ratio": ['max', 'median', 'std'],
        "credit_type": ["nunique"], 
        "business_type": ["nunique"],
        "age": ['max', 'median', 'std'],      
        "severity_delayed_payments":  ['max', 'median', 'std'],
        "balance_due_ratio":  ['max', 'median', 'std'],
        "balance_due_worst_delay_ratio":  ['max', 'median', 'std'],
        "has_delayed_payments":  ['sum'],
        "is_individual_responsibility":  ['sum'],
        "payment_amount": ['sum']
    })
    df_aggregates.columns = ["_".join(i) for i in df_aggregates.columns.values]
    df_aggregates = df_aggregates.reset_index()

    values = df_aggregates.columns.to_list()
    values.remove("customer_id")
    values.remove(aggregate_by)
    
    df_pivot = df_aggregates.pivot_table(
        index='customer_id',
        columns=aggregate_by,
        values=values,
        aggfunc='first'
    )
    
    features = ["credit_reports__" + "_".join(col).lower() for col in df_pivot.columns.values]
    df_pivot.columns = features

    return df_pivot.reset_index()


def build_aggregate_credit_report_information(df: pd.DataFrame, aggregate_column_names: str = "credit_reports__") -> pd.DataFrame:
    """
    Aggregates key financial indicators from a credit report dataset at the customer level. This function
    computes various statistical metrics such as count, sum, max, min, mean, median, and standard deviation
    for different financial variables to comprehensively summarize each customer's credit activities.

    Parameters:
    - df: The DataFrame containing credit report data with multiple entries per customer.
    - aggregate_column_names: A prefix for the column names in the aggregated DataFrame,
      helping to identify the source of the features. Defaults to 'credit_reports__'.

    Returns:
    - pd.DataFrame: A DataFrame where each row corresponds to a unique customer_id and columns represent
      aggregated metrics for various credit-related features. Column names are prefixed with the value 
      provided in `aggregate_column_names`, followed by the specific aggregation type (e.g., 'sum', 'max').

    Aggregates the following metrics for each customer:
    - Count of credit inquiries
    - Sum of maximum credit extended
    - Sum of credit limits across all accounts
    - Sum of current balances across accounts
    - Maximum balance due at the worst delay, and sum of balances due
    - Maximum, median, mean, and standard deviation of the debt ratio
    - Number of unique credit types and business types utilized by the customer
    - Maximum and minimum age of accounts
    - Maximum, median, mean, and standard deviation of severity of delayed payments
    - Aggregated metrics related to balance due ratios
    - Sum of instances where payments were delayed
    - Sum of instances denoting individual responsibility for the credit
    - Sum of payment amounts
    """
    
    df_aggregates = df.groupby(["customer_id"]).agg({
        "cdc_inquiry_id": ["count"],
        "max_credit": ["sum"],
        "credit_limit": ["sum"],
        "current_balance": ["sum"],
        "balance_due_worst_delay": ['max'],
        "balance_due": ['sum'],
        "debt_ratio": ['max', 'median', 'mean', 'std'],
        "credit_type": ["nunique"],  
        "business_type": ["nunique"], 
        "age": ['max', 'min'],     
        "severity_delayed_payments": ['max', 'median', 'mean', 'std'],
        "balance_due_ratio":  ['max', 'median', 'mean', 'std'],
        "balance_due_worst_delay_ratio":  ['max', 'median', 'mean', 'std'],
        "has_delayed_payments":  ['sum'],
        "is_individual_responsibility": ['sum'],
        "payment_amount": ['sum']
    })
    df_aggregates.columns = [aggregate_column_names + "_".join(i) for i in df_aggregates.columns.values]
    df_aggregates = df_aggregates.reset_index()

    return df_aggregates


def build_credit_report_features(df_aux: pd.DataFrame) -> pd.DataFrame:
    """
    Processes and enriches a DataFrame containing credit report data by adding derived features,
    aggregating data, and preparing the dataset for further analysis and modeling.

    This function handles:
    - Standardizing column names and data types.
    - Calculating various financial ratios and flags based on credit data.
    - Aggregating credit data at the customer level to provide a holistic view of their credit status.
    - Merging different aggregations to form a comprehensive feature set per customer.

    Parameters:
    - df_aux: The input DataFrame with raw credit report data.

    Returns:
    - pd.DataFrame: A DataFrame with one row per 'customer_id' (kept as a regular column) and new features
      derived from credit report data, including ratios of credit use, payment behaviors, and aggregate
      metrics of credit activities.
    """
    
    df = df_aux.copy()
    # Standardize column names and account-type labels so they can be reused in feature names.
    df.columns = [i.lower() for i in df.columns]
    df["account_type"] = df["account_type"].str.replace(" ", "_")
    df = df.astype({"delayed_payments": "float"})

    # Loan-level derived features; infinite ratios (division by zero) are mapped to NaN.
    df = df.assign(
        # Days from opening to closing, or to the application date when the loan is still open.
        age = np.where(
            df["loan_opening_date"].isnull(), np.nan, np.where(
                df["loan_closing_date"].isnull(), (df["application_datetime"] - df["loan_opening_date"]).dt.days, (df["loan_closing_date"] - df["loan_opening_date"]).dt.days)),
        # 1 when the loan has no closing date (still open), 0 otherwise.
        is_opening = np.where(df["loan_closing_date"].isnull(), 1.0, 0.0),
        debt_ratio = (df["current_balance"] / df["max_credit"]).replace([np.inf, -np.inf], np.nan),
        # Share of payments that were delayed.
        severity_delayed_payments = (df["delayed_payments"] / df["total_payments"]).replace([np.inf, -np.inf], np.nan),
        balance_due_ratio = (df["balance_due"] / df["max_credit"]).replace([np.inf, -np.inf], np.nan),
        balance_due_worst_delay_ratio = (df["balance_due_worst_delay"] / df["max_credit"]).replace([np.inf, -np.inf], np.nan),
        # Flags stay NaN when the source value is missing.
        has_delayed_payments = np.where(df["delayed_payments"]>0, 1, np.where(df["delayed_payments"]==0, 0, np.nan)),
        is_individual_responsibility = np.where(df["responsability_type"]=="INDIVIDUAL (TITULAR)", 1, np.where(~df["responsability_type"].isnull(), 0, np.nan))
    )

    
    # Customer-level aggregates over all loans.
    agg_df = build_aggregate_credit_report_information(df).rename(columns={
        "credit_reports__cdc_inquiry_id_count": "credit_reports__loans_count",
        "credit_reports__has_delayed_payments_sum": "credit_reports__loans_with_at_least_one_delayed_count",
    })

    # Customer-level aggregates restricted to loans that are still open.
    df_open = df[df["is_opening"]==1]
    agg_df_open_loans = build_aggregate_credit_report_information(df_open, aggregate_column_names="credit_reports__open_loans_").rename(columns={
        "credit_reports__open_loans_cdc_inquiry_id_count": "credit_reports__open_loans_count",
        "credit_reports__open_loans_has_delayed_payments_sum": "credit_reports__open_loans_with_at_least_one_delayed_count",
    })
    
    agg_df_by_credit_type = build_aggregate_credit_report_information_by(df, aggregate_by="account_type")
    
    df_pivot = df[["customer_id"]].drop_duplicates()
    df_pivot = pd.merge(df_pivot, agg_df, how="left", on="customer_id")
    df_pivot = pd.merge(df_pivot, agg_df_open_loans, how="left", on="customer_id")
    df_pivot = pd.merge(df_pivot, agg_df_by_credit_type, how="left", on="customer_id")

    # Customer-level ratios built from the aggregates above.
    df_pivot = df_pivot.assign(
        credit_reports__opening_loans_ratio = df_pivot["credit_reports__open_loans_count"]/df_pivot["credit_reports__loans_count"],
        credit_reports__loans_with_at_least_one_delayed_ratio = df_pivot[ "credit_reports__loans_with_at_least_one_delayed_count"]/df_pivot["credit_reports__loans_count"],
        credit_reports__debt_ratio = df_pivot["credit_reports__balance_due_sum"]/df_pivot["credit_reports__max_credit_sum"],
        # NOTE: numerator and denominator are the same column, so this is always 1.0 (or NaN when the sum is 0 or missing).
        # A different denominator was probably intended; left unchanged so the output below still matches.
        credit_reports__debt_due_ratio = df_pivot["credit_reports__balance_due_sum"]/df_pivot["credit_reports__balance_due_sum"]
    )

    return df_pivot
    

df_features = build_credit_report_features(df)
df_features.head(10)

Unnamed: 0,customer_id,credit_reports__loans_count,credit_reports__max_credit_sum,credit_reports__credit_limit_sum,credit_reports__current_balance_sum,credit_reports__balance_due_worst_delay_max,credit_reports__balance_due_sum,credit_reports__debt_ratio_max,credit_reports__debt_ratio_median,credit_reports__debt_ratio_mean,credit_reports__debt_ratio_std,credit_reports__credit_type_nunique,credit_reports__business_type_nunique,credit_reports__age_max,credit_reports__age_min,credit_reports__severity_delayed_payments_max,credit_reports__severity_delayed_payments_median,credit_reports__severity_delayed_payments_mean,credit_reports__severity_delayed_payments_std,credit_reports__balance_due_ratio_max,credit_reports__balance_due_ratio_median,credit_reports__balance_due_ratio_mean,credit_reports__balance_due_ratio_std,credit_reports__balance_due_worst_delay_ratio_max,credit_reports__balance_due_worst_delay_ratio_median,credit_reports__balance_due_worst_delay_ratio_mean,credit_reports__balance_due_worst_delay_ratio_std,credit_reports__loans_with_at_least_one_delayed_count,credit_reports__is_individual_responsibility_sum,credit_reports__payment_amount_sum,credit_reports__open_loans_count,credit_reports__open_loans_max_credit_sum,credit_reports__open_loans_credit_limit_sum,credit_reports__open_loans_current_balance_sum,credit_reports__open_loans_balance_due_worst_delay_max,credit_reports__open_loans_balance_due_sum,credit_reports__open_loans_debt_ratio_max,credit_reports__open_loans_debt_ratio_median,credit_reports__open_loans_debt_ratio_mean,credit_reports__open_loans_debt_ratio_std,credit_reports__open_loans_credit_type_nunique,credit_reports__open_loans_business_type_nunique,credit_reports__open_loans_age_max,credit_reports__open_loans_age_min,credit_reports__open_loans_severity_delayed_payments_max,credit_reports__open_loans_severity_delayed_payments_median,credit_reports__open_loans_severity_delayed_payments_mean,credit_reports__open_loans_severity_delayed_payments_std,cre
dit_reports__open_loans_balance_due_ratio_max,credit_reports__open_loans_balance_due_ratio_median,credit_reports__open_loans_balance_due_ratio_mean,credit_reports__open_loans_balance_due_ratio_std,credit_reports__open_loans_balance_due_worst_delay_ratio_max,credit_reports__open_loans_balance_due_worst_delay_ratio_median,credit_reports__open_loans_balance_due_worst_delay_ratio_mean,credit_reports__open_loans_balance_due_worst_delay_ratio_std,credit_reports__open_loans_with_at_least_one_delayed_count,credit_reports__open_loans_is_individual_responsibility_sum,credit_reports__open_loans_payment_amount_sum,credit_reports__age_max_credito_de_habilitacion_o_avio,credit_reports__age_max_hipoteca,credit_reports__age_max_pagos_fijos,credit_reports__age_max_por_determinar,credit_reports__age_max_revolvente,credit_reports__age_max_sin_limite_preestablecido,credit_reports__age_median_credito_de_habilitacion_o_avio,credit_reports__age_median_hipoteca,credit_reports__age_median_pagos_fijos,credit_reports__age_median_por_determinar,credit_reports__age_median_revolvente,credit_reports__age_median_sin_limite_preestablecido,credit_reports__age_std_credito_de_habilitacion_o_avio,credit_reports__age_std_hipoteca,credit_reports__age_std_pagos_fijos,credit_reports__age_std_por_determinar,credit_reports__age_std_revolvente,credit_reports__age_std_sin_limite_preestablecido,credit_reports__balance_due_median_credito_de_habilitacion_o_avio,credit_reports__balance_due_median_hipoteca,credit_reports__balance_due_median_pagos_fijos,credit_reports__balance_due_median_por_determinar,credit_reports__balance_due_median_revolvente,credit_reports__balance_due_median_sin_limite_preestablecido,credit_reports__balance_due_ratio_max_credito_de_habilitacion_o_avio,credit_reports__balance_due_ratio_max_hipoteca,credit_reports__balance_due_ratio_max_pagos_fijos,credit_reports__balance_due_ratio_max_por_determinar,credit_reports__balance_due_ratio_max_revolvente,credit_reports__balance_due_ratio_max_sin_limi
te_preestablecido,credit_reports__balance_due_ratio_median_credito_de_habilitacion_o_avio,credit_reports__balance_due_ratio_median_hipoteca,credit_reports__balance_due_ratio_median_pagos_fijos,credit_reports__balance_due_ratio_median_por_determinar,credit_reports__balance_due_ratio_median_revolvente,credit_reports__balance_due_ratio_median_sin_limite_preestablecido,credit_reports__balance_due_ratio_std_credito_de_habilitacion_o_avio,credit_reports__balance_due_ratio_std_hipoteca,credit_reports__balance_due_ratio_std_pagos_fijos,credit_reports__balance_due_ratio_std_por_determinar,credit_reports__balance_due_ratio_std_revolvente,credit_reports__balance_due_ratio_std_sin_limite_preestablecido,credit_reports__balance_due_std_credito_de_habilitacion_o_avio,credit_reports__balance_due_std_hipoteca,credit_reports__balance_due_std_pagos_fijos,credit_reports__balance_due_std_por_determinar,credit_reports__balance_due_std_revolvente,credit_reports__balance_due_std_sin_limite_preestablecido,credit_reports__balance_due_sum_credito_de_habilitacion_o_avio,credit_reports__balance_due_sum_hipoteca,credit_reports__balance_due_sum_pagos_fijos,credit_reports__balance_due_sum_por_determinar,credit_reports__balance_due_sum_revolvente,credit_reports__balance_due_sum_sin_limite_preestablecido,credit_reports__balance_due_worst_delay_max_credito_de_habilitacion_o_avio,credit_reports__balance_due_worst_delay_max_hipoteca,credit_reports__balance_due_worst_delay_max_pagos_fijos,credit_reports__balance_due_worst_delay_max_por_determinar,credit_reports__balance_due_worst_delay_max_revolvente,credit_reports__balance_due_worst_delay_max_sin_limite_preestablecido,credit_reports__balance_due_worst_delay_median_credito_de_habilitacion_o_avio,credit_reports__balance_due_worst_delay_median_hipoteca,credit_reports__balance_due_worst_delay_median_pagos_fijos,credit_reports__balance_due_worst_delay_median_por_determinar,credit_reports__balance_due_worst_delay_median_revolvente,credit_reports__balance_due
_worst_delay_median_sin_limite_preestablecido,credit_reports__balance_due_worst_delay_ratio_max_credito_de_habilitacion_o_avio,credit_reports__balance_due_worst_delay_ratio_max_hipoteca,credit_reports__balance_due_worst_delay_ratio_max_pagos_fijos,credit_reports__balance_due_worst_delay_ratio_max_por_determinar,credit_reports__balance_due_worst_delay_ratio_max_revolvente,credit_reports__balance_due_worst_delay_ratio_max_sin_limite_preestablecido,credit_reports__balance_due_worst_delay_ratio_median_credito_de_habilitacion_o_avio,credit_reports__balance_due_worst_delay_ratio_median_hipoteca,credit_reports__balance_due_worst_delay_ratio_median_pagos_fijos,credit_reports__balance_due_worst_delay_ratio_median_por_determinar,credit_reports__balance_due_worst_delay_ratio_median_revolvente,credit_reports__balance_due_worst_delay_ratio_median_sin_limite_preestablecido,credit_reports__balance_due_worst_delay_ratio_std_credito_de_habilitacion_o_avio,credit_reports__balance_due_worst_delay_ratio_std_hipoteca,credit_reports__balance_due_worst_delay_ratio_std_pagos_fijos,credit_reports__balance_due_worst_delay_ratio_std_por_determinar,credit_reports__balance_due_worst_delay_ratio_std_revolvente,credit_reports__balance_due_worst_delay_ratio_std_sin_limite_preestablecido,credit_reports__balance_due_worst_delay_std_credito_de_habilitacion_o_avio,credit_reports__balance_due_worst_delay_std_hipoteca,credit_reports__balance_due_worst_delay_std_pagos_fijos,credit_reports__balance_due_worst_delay_std_por_determinar,credit_reports__balance_due_worst_delay_std_revolvente,credit_reports__balance_due_worst_delay_std_sin_limite_preestablecido,credit_reports__business_type_nunique_credito_de_habilitacion_o_avio,credit_reports__business_type_nunique_hipoteca,credit_reports__business_type_nunique_pagos_fijos,credit_reports__business_type_nunique_por_determinar,credit_reports__business_type_nunique_revolvente,credit_reports__business_type_nunique_sin_limite_preestablecido,credit_reports__cdc_inqu
iry_id_count_credito_de_habilitacion_o_avio,credit_reports__cdc_inquiry_id_count_hipoteca,credit_reports__cdc_inquiry_id_count_pagos_fijos,credit_reports__cdc_inquiry_id_count_por_determinar,credit_reports__cdc_inquiry_id_count_revolvente,credit_reports__cdc_inquiry_id_count_sin_limite_preestablecido,credit_reports__credit_limit_median_credito_de_habilitacion_o_avio,credit_reports__credit_limit_median_hipoteca,credit_reports__credit_limit_median_pagos_fijos,credit_reports__credit_limit_median_por_determinar,credit_reports__credit_limit_median_revolvente,credit_reports__credit_limit_median_sin_limite_preestablecido,credit_reports__credit_limit_std_credito_de_habilitacion_o_avio,credit_reports__credit_limit_std_hipoteca,credit_reports__credit_limit_std_pagos_fijos,credit_reports__credit_limit_std_por_determinar,credit_reports__credit_limit_std_revolvente,credit_reports__credit_limit_std_sin_limite_preestablecido,credit_reports__credit_limit_sum_credito_de_habilitacion_o_avio,credit_reports__credit_limit_sum_hipoteca,credit_reports__credit_limit_sum_pagos_fijos,credit_reports__credit_limit_sum_por_determinar,credit_reports__credit_limit_sum_revolvente,credit_reports__credit_limit_sum_sin_limite_preestablecido,credit_reports__credit_type_nunique_credito_de_habilitacion_o_avio,credit_reports__credit_type_nunique_hipoteca,credit_reports__credit_type_nunique_pagos_fijos,credit_reports__credit_type_nunique_por_determinar,credit_reports__credit_type_nunique_revolvente,credit_reports__credit_type_nunique_sin_limite_preestablecido,credit_reports__current_balance_median_credito_de_habilitacion_o_avio,credit_reports__current_balance_median_hipoteca,credit_reports__current_balance_median_pagos_fijos,credit_reports__current_balance_median_por_determinar,credit_reports__current_balance_median_revolvente,credit_reports__current_balance_median_sin_limite_preestablecido,credit_reports__current_balance_std_credito_de_habilitacion_o_avio,credit_reports__current_balance_std_hipoteca,cred
it_reports__current_balance_std_pagos_fijos,credit_reports__current_balance_std_por_determinar,credit_reports__current_balance_std_revolvente,credit_reports__current_balance_std_sin_limite_preestablecido,credit_reports__current_balance_sum_credito_de_habilitacion_o_avio,credit_reports__current_balance_sum_hipoteca,credit_reports__current_balance_sum_pagos_fijos,credit_reports__current_balance_sum_por_determinar,credit_reports__current_balance_sum_revolvente,credit_reports__current_balance_sum_sin_limite_preestablecido,credit_reports__debt_ratio_max_credito_de_habilitacion_o_avio,credit_reports__debt_ratio_max_hipoteca,credit_reports__debt_ratio_max_pagos_fijos,credit_reports__debt_ratio_max_por_determinar,credit_reports__debt_ratio_max_revolvente,credit_reports__debt_ratio_max_sin_limite_preestablecido,credit_reports__debt_ratio_median_credito_de_habilitacion_o_avio,credit_reports__debt_ratio_median_hipoteca,credit_reports__debt_ratio_median_pagos_fijos,credit_reports__debt_ratio_median_por_determinar,credit_reports__debt_ratio_median_revolvente,credit_reports__debt_ratio_median_sin_limite_preestablecido,credit_reports__debt_ratio_std_credito_de_habilitacion_o_avio,credit_reports__debt_ratio_std_hipoteca,credit_reports__debt_ratio_std_pagos_fijos,credit_reports__debt_ratio_std_por_determinar,credit_reports__debt_ratio_std_revolvente,credit_reports__debt_ratio_std_sin_limite_preestablecido,credit_reports__has_delayed_payments_sum_credito_de_habilitacion_o_avio,credit_reports__has_delayed_payments_sum_hipoteca,credit_reports__has_delayed_payments_sum_pagos_fijos,credit_reports__has_delayed_payments_sum_por_determinar,credit_reports__has_delayed_payments_sum_revolvente,credit_reports__has_delayed_payments_sum_sin_limite_preestablecido,credit_reports__is_individual_responsibility_sum_credito_de_habilitacion_o_avio,credit_reports__is_individual_responsibility_sum_hipoteca,credit_reports__is_individual_responsibility_sum_pagos_fijos,credit_reports__is_individual_responsib
ility_sum_por_determinar,credit_reports__is_individual_responsibility_sum_revolvente,credit_reports__is_individual_responsibility_sum_sin_limite_preestablecido,credit_reports__max_credit_median_credito_de_habilitacion_o_avio,credit_reports__max_credit_median_hipoteca,credit_reports__max_credit_median_pagos_fijos,credit_reports__max_credit_median_por_determinar,credit_reports__max_credit_median_revolvente,credit_reports__max_credit_median_sin_limite_preestablecido,credit_reports__max_credit_std_credito_de_habilitacion_o_avio,credit_reports__max_credit_std_hipoteca,credit_reports__max_credit_std_pagos_fijos,credit_reports__max_credit_std_por_determinar,credit_reports__max_credit_std_revolvente,credit_reports__max_credit_std_sin_limite_preestablecido,credit_reports__max_credit_sum_credito_de_habilitacion_o_avio,credit_reports__max_credit_sum_hipoteca,credit_reports__max_credit_sum_pagos_fijos,credit_reports__max_credit_sum_por_determinar,credit_reports__max_credit_sum_revolvente,credit_reports__max_credit_sum_sin_limite_preestablecido,credit_reports__payment_amount_sum_credito_de_habilitacion_o_avio,credit_reports__payment_amount_sum_hipoteca,credit_reports__payment_amount_sum_pagos_fijos,credit_reports__payment_amount_sum_por_determinar,credit_reports__payment_amount_sum_revolvente,credit_reports__payment_amount_sum_sin_limite_preestablecido,credit_reports__severity_delayed_payments_max_credito_de_habilitacion_o_avio,credit_reports__severity_delayed_payments_max_hipoteca,credit_reports__severity_delayed_payments_max_pagos_fijos,credit_reports__severity_delayed_payments_max_por_determinar,credit_reports__severity_delayed_payments_max_revolvente,credit_reports__severity_delayed_payments_max_sin_limite_preestablecido,credit_reports__severity_delayed_payments_median_credito_de_habilitacion_o_avio,credit_reports__severity_delayed_payments_median_hipoteca,credit_reports__severity_delayed_payments_median_pagos_fijos,credit_reports__severity_delayed_payments_median_por_determ
inar,credit_reports__severity_delayed_payments_median_revolvente,credit_reports__severity_delayed_payments_median_sin_limite_preestablecido,credit_reports__severity_delayed_payments_std_credito_de_habilitacion_o_avio,credit_reports__severity_delayed_payments_std_hipoteca,credit_reports__severity_delayed_payments_std_pagos_fijos,credit_reports__severity_delayed_payments_std_por_determinar,credit_reports__severity_delayed_payments_std_revolvente,credit_reports__severity_delayed_payments_std_sin_limite_preestablecido,credit_reports__opening_loans_ratio,credit_reports__loans_with_at_least_one_delayed_ratio,credit_reports__debt_ratio,credit_reports__debt_due_ratio
0,4223,3,9312.0,19800.0,3909.0,1722.0,2966.0,1.0,1.0,0.716199,0.491557,3,2,997.0,157.0,0.583333,0.291667,0.291667,0.412479,1.0,1.0,0.666667,0.57735,1.0,1.0,0.666667,0.57735,2.0,3.0,3448.0,3.0,9312.0,19800.0,3909.0,1722.0,2966.0,1.0,1.0,0.716199,0.491557,3.0,2.0,997.0,157.0,0.583333,0.291667,0.291667,0.412479,1.0,1.0,0.666667,0.57735,1.0,1.0,0.666667,0.57735,2.0,3.0,3448.0,,,997.0,,762.0,,,,577.0,,762.0,,,,593.969696,,,,,,622.0,,1722.0,,,,1.0,,1.0,,,,0.5,,1.0,,,,0.707107,,,,,,879.640836,,,,,,1244.0,,1722.0,,,,1244.0,,1722.0,,,,622.0,,1722.0,,,,1.0,,1.0,,,,0.5,,1.0,,,,0.707107,,,,,,879.640836,,,,,,2.0,,1.0,,,,2.0,,1.0,,,,4950.0,,9900.0,,,,7000.357134,,,,,,9900.0,,9900.0,,,,2.0,,1.0,,,,1093.5,,1722.0,,,,212.839141,,,,,,2187.0,,1722.0,,,,1.0,,1.0,,,,0.574299,,1.0,,,,0.602032,,,,,,1.0,,1.0,,,,2.0,,1.0,,,,3795.0,,1722.0,,,,3607.658798,,,,,,7590.0,,1722.0,,,,1726.0,,1722.0,,,,0.583333,,,,,,0.291667,,,,,,0.412479,,,,1.0,0.666667,0.318514,1.0
1,3490,1,11600.0,0.0,6185.0,116.0,116.0,0.53319,0.53319,0.53319,,1,1,478.0,478.0,0.04,0.04,0.04,,0.01,0.01,0.01,,0.01,0.01,0.01,,1.0,1.0,232.0,1.0,11600.0,0.0,6185.0,116.0,116.0,0.53319,0.53319,0.53319,,1.0,1.0,478.0,478.0,0.04,0.04,0.04,,0.01,0.01,0.01,,0.01,0.01,0.01,,1.0,1.0,232.0,,,478.0,,,,,,478.0,,,,,,,,,,,,116.0,,,,,,0.01,,,,,,0.01,,,,,,,,,,,,,,,,,,116.0,,,,,,116.0,,,,,,116.0,,,,,,0.01,,,,,,0.01,,,,,,,,,,,,,,,,,,1.0,,,,,,1.0,,,,,,0.0,,,,,,,,,,,,0.0,,,,,,1.0,,,,,,6185.0,,,,,,,,,,,,6185.0,,,,,,0.53319,,,,,,0.53319,,,,,,,,,,,,1.0,,,,,,1.0,,,,,,11600.0,,,,,,,,,,,,11600.0,,,,,,232.0,,,,,,0.04,,,,,,0.04,,,,,,,,,,1.0,1.0,0.01,1.0
2,6486,2,2452.0,16800.0,2452.0,2452.0,2452.0,1.0,1.0,1.0,,2,2,1220.0,208.0,,,,,1.0,1.0,1.0,,1.0,1.0,1.0,,1.0,2.0,2452.0,2.0,2452.0,16800.0,2452.0,2452.0,2452.0,1.0,1.0,1.0,,2.0,2.0,1220.0,208.0,,,,,1.0,1.0,1.0,,1.0,1.0,1.0,,1.0,2.0,2452.0,,,,,1220.0,208.0,,,,,1220.0,208.0,,,,,,,,,,,2452.0,0.0,,,,,1.0,,,,,,1.0,,,,,,,,,,,,,,,,,,2452.0,0.0,,,,,2452.0,0.0,,,,,2452.0,0.0,,,,,1.0,,,,,,1.0,,,,,,,,,,,,,,,,,,1.0,1.0,,,,,1.0,1.0,,,,,16800.0,0.0,,,,,,,,,,,16800.0,0.0,,,,,1.0,1.0,,,,,2452.0,0.0,,,,,,,,,,,2452.0,0.0,,,,,1.0,,,,,,1.0,,,,,,,,,,,,1.0,0.0,,,,,1.0,1.0,,,,,2452.0,0.0,,,,,,,,,,,2452.0,0.0,,,,,2452.0,0.0,,,,,,,,,,,,,,,,,,,1.0,0.5,1.0,1.0
3,4075,52,317915.0,65044.0,35140.0,7880.0,35140.0,3.875391,0.0,0.120348,0.58368,8,6,2656.0,32.0,0.470588,0.0,0.069672,0.147488,3.875391,0.0,0.120348,0.58368,1.313333,0.012753,0.084605,0.235967,16.0,16.0,8180.0,4.0,17542.0,13000.0,35140.0,7880.0,35140.0,3.875391,1.021667,1.965686,1.653889,4.0,4.0,2087.0,501.0,0.470588,0.235294,0.235294,0.332756,3.875391,1.021667,1.965686,1.653889,1.313333,0.0,0.437778,0.758253,3.0,4.0,8180.0,,,1629.0,,1130.0,2656.0,,,314.0,,170.0,2371.5,,,305.670051,,309.81353,402.343758,,,0.0,,0.0,0.0,,,3.875391,,1.0,,,,0.0,,0.0,,,,0.663246,,0.27735,,,,4026.312665,,1461.11721,0.0,,,29673.0,,5467.0,0.0,,,7880.0,,1895.0,1240.0,,,38.0,,0.0,620.0,,,1.313333,,0.153664,,,,0.019087,,0.0,,,,0.271034,,0.043285,,,,1427.409169,,502.415506,876.812409,,,4.0,,4.0,2.0,,,36.0,,14.0,2.0,,,0.0,,2518.5,0.0,,,2119.536972,,2427.734222,0.0,,,18000.0,,47044.0,0.0,,,5.0,,2.0,2.0,,,0.0,,0.0,0.0,,,4026.312665,,1461.11721,0.0,,,29673.0,,5467.0,0.0,,,3.875391,,1.0,,,,0.0,,0.0,,,,0.663246,,0.27735,,,,12.0,,3.0,1.0,,,10.0,,4.0,2.0,,,4092.5,,2006.0,0.0,,,9141.093183,,2600.941678,0.0,,,275608.0,,42307.0,0.0,,,7633.0,,547.0,0.0,,,0.470588,,0.0,0.0,,,0.0,,0.0,0.0,,,0.158509,,0.0,,0.076923,0.307692,0.110533,1.0
4,437,48,775025.0,499063.0,13226.0,12666.0,20264.0,1.518111,0.0,0.070266,0.294789,6,6,1256.0,6.0,10.0,0.0,0.254076,1.4764,1.518111,0.0,0.079055,0.301211,167.0,0.0,3.534877,24.097096,7.0,47.0,9144.0,3.0,10501.0,5772.0,6647.0,0.0,6203.0,1.518111,0.076923,0.531678,0.855141,2.0,2.0,1256.0,486.0,0.0,0.0,0.0,,1.518111,0.0,0.506037,0.876482,0.0,0.0,0.0,0.0,2.0,3.0,6425.0,,,486.0,,,1256.0,,,112.0,,,1222.0,,,93.142032,,,48.083261,,,0.0,,,3101.5,,,1.352857,,,1.518111,,,0.0,,,0.759055,,,0.218422,,,1.073466,,,1368.685307,,,4386.183364,,,14061.0,,,6203.0,,,12666.0,,,0.0,,,0.0,,,0.0,,,167.0,,,0.0,,,0.0,,,0.0,,,24.614998,,,0.0,,,2062.728702,,,0.0,,,5.0,,,1.0,,,46.0,,,2.0,,,1656.5,,,,,,13641.20631,,,,,,499063.0,,,0.0,,,5.0,,,1.0,,,0.0,,,3101.5,,,839.856338,,,4386.183364,,,7023.0,,,6203.0,,,1.352857,,,1.518111,,,0.0,,,0.759055,,,0.207732,,,1.073466,,,5.0,,,2.0,,,45.0,,,2.0,,,15000.0,,,2364.5,,,11752.589951,,,2434.568648,,,770296.0,,,4729.0,,,2941.0,,,6203.0,,,10.0,,,,,,0.0,,,,,,1.4764,,,,0.0625,0.145833,0.026146,1.0
5,5316,26,63440.0,52908.0,29067.0,6681.0,8340.0,1.620066,0.0,0.349449,0.551514,6,7,1912.0,1.0,11.0,0.0,0.44,2.2,1.276154,0.0,0.086678,0.305114,1.276154,0.0,0.092704,0.304818,1.0,26.0,7503.0,15.0,30700.0,26532.0,27408.0,0.0,0.0,1.620066,0.0,0.497338,0.598458,4.0,4.0,179.0,37.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,15.0,5995.0,,,912.0,,179.0,1912.0,,,70.0,,122.0,1912.0,,,262.284235,,80.610173,,,,0.0,,0.0,0.0,,,1.276154,,0.0,,,,0.0,,0.0,,,,0.317514,,0.0,,,,1420.049383,,0.0,,,,8340.0,,0.0,0.0,,,6681.0,,0.0,0.0,,,0.0,,0.0,0.0,,,1.276154,,0.0,,,,0.0,,0.0,,,,0.317036,,0.0,,,,1419.49532,,0.0,,,,6.0,,2.0,1.0,,,23.0,,2.0,1.0,,,500.0,,6587.0,,,,2671.9615,,7655.138013,,,,39734.0,,13174.0,0.0,,,4.0,,2.0,1.0,,,0.0,,6579.0,0.0,,,1635.993455,,7643.824305,,,,15909.0,,13158.0,0.0,,,1.620066,,1.0,,,,0.0,,0.973976,,,,0.541495,,0.036804,,,,1.0,,0.0,0.0,,,23.0,,2.0,1.0,,,600.0,,6908.0,0.0,,,3310.401223,,8109.100567,,,,49624.0,,13816.0,0.0,,,6032.0,,1471.0,0.0,,,11.0,,0.0,0.0,,,0.0,,0.0,0.0,,,2.293659,,,,0.576923,0.038462,0.131463,1.0
6,5032,18,32049.0,30967.0,15642.0,2783.0,3225.0,2.002933,0.0,0.415647,0.615063,5,7,3197.0,11.0,0.794872,0.0,0.106838,0.274357,0.232588,0.0,0.030558,0.080653,0.360684,0.0,0.05338,0.121654,5.0,17.0,16064.0,8.0,15651.0,24241.0,15642.0,2533.0,3225.0,2.002933,1.0,1.039118,0.530862,3.0,4.0,1790.0,64.0,0.0,0.0,0.0,0.0,0.232588,0.0,0.076396,0.118371,0.0,0.0,0.0,0.0,2.0,8.0,16064.0,,,510.0,,1790.0,3197.0,,,63.5,,1464.0,1652.5,,,150.223772,,461.033621,1077.066193,,,0.0,,0.0,546.0,,,0.0,,0.0,0.232588,,,0.0,,0.0,0.229187,,,0.0,,,0.00481,,,0.0,,0.0,1023.39252,,,0.0,,0.0,3225.0,,,2783.0,,929.0,2533.0,,,0.0,,464.5,0.0,,,0.360684,,0.0,0.0,,,0.0,,0.0,0.0,,,0.133659,,,0.0,,,798.532684,,656.9022,1266.5,,,4.0,,1.0,3.0,,,12.0,,2.0,4.0,,,448.0,,999.0,537.5,,,1085.767415,,0.0,9833.899172,,,7894.0,,1998.0,21075.0,,,3.0,,1.0,3.0,,,0.0,,341.5,2347.5,,,128.069909,,482.953932,4519.948119,,,817.0,,683.0,14142.0,,,1.056537,,2.002933,1.0,,,0.0,,2.002933,1.0,,,0.364426,,,0.0,,,3.0,,0.0,2.0,,,11.0,,2.0,4.0,,,500.0,,170.5,2347.5,,,2451.34367,,241.123412,4519.948119,,,17566.0,,341.0,14142.0,,,652.0,,1270.0,14142.0,,,0.794872,,0.0,0.0,,,0.0,,0.0,0.0,,,0.303165,,0.0,,0.444444,0.277778,0.100627,1.0
7,694,21,162589.0,92680.0,136844.0,19102.0,161939.0,4.65246,0.631952,0.707719,1.103913,6,10,7869.0,31.0,25.0,0.59375,4.553352,8.663281,4.65246,0.737964,0.871663,1.111005,1.953286,0.0,0.440995,0.592119,14.0,21.0,108098.0,11.0,102098.0,74580.0,131914.0,19102.0,126105.0,4.65246,0.888371,1.382962,1.286391,5.0,9.0,7869.0,372.0,25.0,0.520833,5.108593,9.202379,4.65246,0.739116,1.25899,1.372274,1.953286,0.737964,0.671833,0.634782,9.0,11.0,103168.0,,,2305.0,45.0,2176.0,7869.0,,,1733.0,31.0,456.0,1578.0,,,898.492225,8.082904,1018.157814,3133.162093,,,8783.0,0.0,9032.0,0.0,,,1.898264,0.0,1.953286,4.65246,,,0.73854,0.0,1.16212,1.0,,,0.554643,0.0,0.982496,2.449008,,,5837.111687,0.0,6953.018937,21037.961137,,,86272.0,0.0,22705.0,52962.0,,,17724.0,0.0,13673.0,19102.0,,,6115.0,0.0,9032.0,0.0,,,1.1703,0.0,1.953286,0.0,,,0.677046,0.0,1.16212,0.0,,,0.479542,0.0,0.982496,0.0,,,6837.45437,0.0,6953.018937,8542.674101,,,6.0,1.0,2.0,3.0,,,10.0,3.0,3.0,5.0,,,0.0,0.0,7000.0,0.0,,,10097.957769,0.0,2122.105872,0.0,,,73580.0,0.0,19100.0,0.0,,,4.0,1.0,1.0,2.0,,,7040.0,0.0,3358.0,0.0,,,6640.423839,0.0,7125.380902,21037.961137,,,66851.0,0.0,17031.0,52962.0,,,1.0,0.0,1.953286,4.65246,,,0.684958,0.0,0.843507,1.0,,,0.444978,0.0,0.979663,2.449008,,,8.0,0.0,2.0,4.0,,,10.0,3.0,3.0,5.0,,,10852.5,2529.0,7000.0,4930.0,,,5212.412323,0.0,2003.414835,4485.083411,,,114070.0,7587.0,18753.0,22179.0,,,40969.0,0.0,14167.0,52962.0,,,8.0,,,25.0,,,0.520833,,,25.0,,,2.815967,,,,0.52381,0.666667,0.996002,1.0
8,6194,61,216576.0,69904.0,11352.0,5700.0,11352.0,0.476815,0.0,0.007817,0.06105,2,5,2840.0,1.0,0.4375,0.0,0.036706,0.117095,0.476815,0.0,0.007817,0.06105,0.76,0.0,0.040649,0.159879,4.0,61.0,11352.0,1.0,23808.0,11352.0,11352.0,0.0,11352.0,0.476815,0.476815,0.476815,,1.0,1.0,2840.0,2840.0,,,,,0.476815,0.476815,0.476815,,0.0,0.0,0.0,,1.0,1.0,11352.0,,,836.0,210.0,2840.0,,,,131.5,31.0,2809.0,,,,254.004462,64.459583,777.304048,,,,0.0,0.0,0.0,,,,0.0,0.0,0.476815,,,,0.0,0.0,0.0,,,,0.0,0.0,0.238407,,,,0.0,0.0,5676.0,,,,0.0,0.0,11352.0,,,,5700.0,0.0,0.0,,,,0.0,0.0,0.0,,,,0.76,0.0,0.0,,,,0.0,0.0,0.0,,,,0.288699,0.0,0.0,,,,1751.53444,0.0,0.0,,,,4.0,1.0,2.0,,,,16.0,41.0,4.0,,,,0.0,0.0,13146.0,,,,1918.697006,0.0,3820.770079,,,,12260.0,0.0,57644.0,,,,1.0,1.0,2.0,,,,0.0,0.0,0.0,,,,0.0,0.0,5676.0,,,,0.0,0.0,11352.0,,,,0.0,0.0,0.476815,,,,0.0,0.0,0.0,,,,0.0,0.0,0.238407,,,,2.0,0.0,2.0,,,,16.0,41.0,4.0,,,,4700.0,1324.0,18477.0,,,,4410.249584,695.079057,6155.70857,,,,87393.0,55275.0,73908.0,,,,0.0,0.0,11352.0,,,,0.4375,0.0,,,,,0.0,0.0,,,,,0.137036,0.0,,,0.016393,0.065574,0.052416,1.0
9,1501,0,0.0,0.0,0.0,,0.0,,,,,0,0,,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,0.0,,,,,0.0,0.0,,,,,,,,,,,,,,,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [8]:
# Inspect the raw credit report rows for customer 1501, whose feature row is empty.
df[df["customer_id"] == 1501]

Unnamed: 0,customer_id,INQUIRY_TIME,CDC_INQUIRY_ID,INQUIRY_DATE,PREVENTION_KEY,CURRENCY,MAX_CREDIT,CREDIT_LIMIT,PAYMENT_AMOUNT,UPDATE_DATE,LOAN_OPENING_DATE,LOAN_CLOSING_DATE,WORST_DELAY_DATE,REPORT_DATE,LAST_PURCHASE_DATE,LAST_PAYMENT_DATE,PAYMENT_FREQUENCY,BUSINESS_TYPE,CREDIT_TYPE,ACCOUNT_TYPE,RESPONSABILITY_TYPE,TOTAL_PAYMENTS,DELAYED_PAYMENTS,CURRENT_PAYMENT,WORST_DELAY,TOTAL_REPORTED_PAYMENTS,CURRENT_BALANCE,BALANCE_DUE,BALANCE_DUE_WORST_DELAY,LOAN_ORIGINATION_DATETIME,APPLICATION_DATETIME
232,1501,2021-09-10 19:23:15.585000+00:00,,NaT,,,,,,NaT,NaT,NaT,NaT,NaT,NaT,NaT,,,,,,,,,,,,,,2022-07-08 17:50:39,2022-06-28 07:00:00
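
The lookup above returns a single row for customer 1501 with every loan-level field missing, which would explain the empty feature row for that customer. The sketch below shows one way such customers could be flagged in bulk. It uses a toy frame that borrows a few of the same column names. The helper name `customers_without_loans` and the choice of columns to check are assumptions for illustration, not part of the notebook.

```python
import pandas as pd

# Toy stand-in for the credit reports frame (values are made up for illustration).
toy = pd.DataFrame(
    {
        "customer_id": [1501, 3490, 3490],
        "CDC_INQUIRY_ID": [None, "a1", "a1"],
        "LOAN_OPENING_DATE": pd.to_datetime([None, "2020-01-01", "2021-05-01"]),
        "MAX_CREDIT": [None, 11600.0, 500.0],
    }
)

def customers_without_loans(frame, loan_cols):
    """Return ids of customers whose rows have no value in any of loan_cols (hypothetical helper)."""
    # True for each row where every loan-level column is missing.
    row_is_empty = frame[loan_cols].isna().all(axis=1)
    # A customer is flagged only if all of their rows are empty.
    all_empty = row_is_empty.groupby(frame["customer_id"]).all()
    return all_empty[all_empty].index.tolist()

empty_ids = customers_without_loans(
    toy, ["CDC_INQUIRY_ID", "LOAN_OPENING_DATE", "MAX_CREDIT"]
)
print(empty_ids)
```

Applied to the real `df`, the same check would list every customer that has an inquiry record but no reported loans, so their missing features can be handled deliberately downstream.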


In [9]:
df_features.shape

(9249, 279)

### 2. Build features

In [10]:
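# Tag the output file with the current year and month (YYYYMM).
formatted_date = datetime.now().strftime("%Y%m")

# Persist the engineered features for downstream notebooks.
df_features.to_pickle(DATA_PATH / f"intermedian/{formatted_date}_credit_reports_features.pickle")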
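
The write above assumes the `intermedian` folder already exists under `DATA_PATH`. To my knowledge, pandas does not create missing parent directories when pickling. The sketch below is a more defensive save with a round-trip check. It uses a temporary directory and a toy frame in place of `DATA_PATH` and `df_features`, and the helper name `save_features` is hypothetical.

```python
import tempfile
from datetime import datetime
from pathlib import Path

import pandas as pd

def save_features(features, data_path):
    """Pickle features under <data_path>/intermedian/ with a YYYYMM prefix (hypothetical helper)."""
    out_dir = Path(data_path) / "intermedian"
    # Create the output folder if it is missing (no error if it already exists).
    out_dir.mkdir(parents=True, exist_ok=True)
    formatted_date = datetime.now().strftime("%Y%m")
    out_path = out_dir / f"{formatted_date}_credit_reports_features.pickle"
    features.to_pickle(out_path)
    return out_path

# Toy stand-in for df_features (values are made up for illustration).
toy_features = pd.DataFrame(
    {"customer_id": [1501, 3490], "credit_reports__loans_count": [0, 1]}
)

with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_features(toy_features, tmp)
    # Read the file back and confirm nothing changed on the way to disk.
    restored = pd.read_pickle(saved_path)
    round_trip_ok = restored.equals(toy_features)

print(round_trip_ok)
```

Because the file name only carries year and month, re-running the notebook within the same month overwrites the previous file.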