# 3.1.0. EDA: Identify Features with High Correlation

### Methodology

This step of exploratory data analysis (EDA) identifies and excludes features that are highly correlated with other features, specifically those whose absolute Pearson correlation coefficient exceeds a chosen threshold (0.90 in the run below; the helper function defaults to 0.80). Removing them reduces multicollinearity, which can make model parameters unstable and difficult to interpret and can degrade the performance of many machine learning models.
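To make the instability concrete, the sketch below compares the condition number of a standardized two-column design matrix when the second column is nearly a copy of the first versus when it is only weakly related. The condition number is a standard diagnostic for how sensitive a least-squares fit is to small changes in the data. The toy arrays and the helper name `standardized_condition_number` are illustrative assumptions and are not part of the notebook.

```python
import numpy as np


def standardized_condition_number(columns: list) -> float:
    """Condition number of the column-standardized matrix built from `columns`."""
    matrix = np.column_stack(columns).astype(float)
    # Standardize so the result reflects correlation, not scale.
    matrix = (matrix - matrix.mean(axis=0)) / matrix.std(axis=0)
    return float(np.linalg.cond(matrix))


x1 = np.arange(10.0)
# Nearly a copy of x1: a tiny alternating perturbation is added.
x2_collinear = x1 + 0.01 * np.array([1, -1] * 5)
# Only weakly related to x1 (arbitrary toy values).
x2_independent = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float)

cond_collinear = standardized_condition_number([x1, x2_collinear])
cond_independent = standardized_condition_number([x1, x2_independent])

print(f"collinear pair:   {cond_collinear:.1f}")
print(f"independent pair: {cond_independent:.1f}")
```

A much larger condition number for the near-duplicate pair indicates that coefficient estimates for those two columns can swing widely under small perturbations, which is the motivation for dropping one of them.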

Approach:
- Calculate the Pearson correlation matrix for all numeric features in the dataset. 
- Identify Highly Correlated Pairs: Using the function identify_high_correlation(data, threshold=0.9), identify all pairs of features whose absolute correlation exceeds the threshold (0.90 here; the function's default is 0.80). This step pinpoints features that are potentially redundant or carry similar information.
- Feature Selection: Review the list of highly correlated feature pairs. Decide which feature from each pair to retain based on:
    - Importance in Modeling: Prefer features that are known to have a strong impact or relevance based on domain knowledge or previous modeling experiences.
    - Data Quality: Retain features with fewer missing values or less noise.
    - Business Relevance: Consider business understanding or input on which features are more interpretable or valuable for decision-making processes.
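The data-quality criterion above can be sketched as a small helper. This is a minimal illustration, not the notebook's actual selection procedure: the function name `suggest_drops_by_missingness` and the toy columns are hypothetical, and the rule used here (drop whichever feature of a pair has the higher share of missing values, with ties going to `Feature 2`) is just one assumed way of encoding "fewer missing values". The modeling and business criteria still call for manual review.

```python
import numpy as np
import pandas as pd


def suggest_drops_by_missingness(data: pd.DataFrame, pairs: pd.DataFrame) -> list:
    """For each correlated pair, suggest dropping the feature with more missing values.

    `pairs` is assumed to have 'Feature 1' and 'Feature 2' columns,
    in the layout returned by identify_high_correlation.
    """
    missing_rate = data.isna().mean()  # share of NaN values per column
    to_drop = set()
    for feat_1, feat_2 in zip(pairs["Feature 1"], pairs["Feature 2"]):
        # Skip pairs in which one member has already been marked for removal.
        if feat_1 in to_drop or feat_2 in to_drop:
            continue
        # Ties go to dropping 'Feature 2' (an arbitrary, assumed convention).
        if missing_rate[feat_1] > missing_rate[feat_2]:
            to_drop.add(feat_1)
        else:
            to_drop.add(feat_2)
    return sorted(to_drop)


# Toy data: 'b' duplicates 'a' but has missing values; 'c' is unrelated.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [1.0, np.nan, 3.0, np.nan, 5.0],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
})
toy_pairs = pd.DataFrame({"Feature 1": ["a"], "Feature 2": ["b"]})
suggested = suggest_drops_by_missingness(toy, toy_pairs)
print(suggested)
```

The output is only a suggestion list; a feature flagged here may still be kept if it is more interpretable or more relevant to the business question.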

### Conclusion
- Initial Feature Count: The analysis begins with an initial set of 262 features.
- Feature Exclusion: Applying the correlation threshold flags 109 features as highly correlated with others. From each correlated pair or group, the feature that is less significant from a business or data-quality perspective is dropped, leaving 153 features for further analysis.

In [1]:
import yaml
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from pathlib import Path
pd.set_option("display.max_rows", None)
pd.set_option('display.max_colwidth', None)

In [2]:
def identify_high_correlation(data: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """
    Identify pairs of columns whose absolute Pearson correlation exceeds a threshold.

    Parameters:
    - data (pd.DataFrame): The input DataFrame containing numeric columns.
    - threshold (float): The absolute correlation coefficient above which a pair is considered highly correlated.

    Returns:
    - pd.DataFrame: One row per highly correlated pair, with columns 'Feature 1', 'Feature 2', and 'Correlation' (the signed coefficient).
    """
    
    corr_matrix = data.corr()
    
    correlated_features = []
    
    for i in range(len(corr_matrix.columns)):
        for j in range(i+1, len(corr_matrix.columns)):
            if abs(corr_matrix.iloc[i, j]) > threshold:  # Only consider absolute coefficient value
                correlated_features.append((corr_matrix.columns[i], corr_matrix.columns[j], corr_matrix.iloc[i, j]))

    correlated_df = pd.DataFrame(correlated_features, columns=['Feature 1', 'Feature 2', 'Correlation'])
    
    return correlated_df
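With a few hundred columns the nested Python loop above visits tens of thousands of cells one at a time. The sketch below is a vectorized variant that should return the same pairs in the same order by indexing the upper triangle of the correlation matrix with NumPy. The name `high_correlation_pairs_vectorized` and the toy DataFrame are assumptions made for illustration; this is not the function used in the notebook.

```python
import numpy as np
import pandas as pd


def high_correlation_pairs_vectorized(data: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """Return column pairs whose absolute Pearson correlation exceeds `threshold`."""
    corr = data.corr()
    names = corr.columns.to_numpy()
    values = corr.to_numpy()
    # Indices of the strict upper triangle: each unordered pair appears once.
    row_idx, col_idx = np.triu_indices(len(names), k=1)
    pair_values = values[row_idx, col_idx]
    # NaN correlations (e.g. from constant columns) compare as False and are skipped.
    mask = np.abs(pair_values) > threshold
    return pd.DataFrame({
        "Feature 1": names[row_idx[mask]],
        "Feature 2": names[col_idx[mask]],
        "Correlation": pair_values[mask],
    })


# Toy data: y = 2x, z = -x + 6, and w is uncorrelated with x.
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0],
    "z": [5.0, 4.0, 3.0, 2.0, 1.0],
    "w": [1.0, -1.0, 1.0, -1.0, 1.0],
})
toy_result = high_correlation_pairs_vectorized(toy, threshold=0.9)
print(toy_result)
```

The toy case also shows why the absolute value matters: `x` and `z` are perfectly negatively correlated and are still reported as a redundant pair.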

### 1. Load Data 

In [3]:
with open("config.yaml", "r") as f:
    config = yaml.safe_load(f)
    
numeric_features = config["raw_features"]["numerical"]
features = numeric_features
target = config["main"]["target"]
data_train_path = Path.cwd().parent / config["main"]["data_train_path"]

In [4]:
features = features + [
    'credit_reports__age_max_credito_de_habilitacion_o_avio',
    'credit_reports__age_max_hipoteca',
    'credit_reports__age_max_pagos_fijos',
    'credit_reports__age_max_por_determinar',
    'credit_reports__age_max_revolvente',
    'credit_reports__age_max_sin_limite_preestablecido',
    'credit_reports__age_median_credito_de_habilitacion_o_avio',
    'credit_reports__age_median_hipoteca',
    'credit_reports__age_median_pagos_fijos',
    'credit_reports__age_median_por_determinar',
    'credit_reports__age_median_revolvente',
    'credit_reports__age_median_sin_limite_preestablecido',
    'credit_reports__age_std_credito_de_habilitacion_o_avio',
    'credit_reports__age_std_hipoteca',
    'credit_reports__age_std_pagos_fijos',
    'credit_reports__age_std_por_determinar',
    'credit_reports__age_std_revolvente',
    'credit_reports__age_std_sin_limite_preestablecido',
    'credit_reports__balance_due_median_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_median_hipoteca',
    'credit_reports__balance_due_median_pagos_fijos',
    'credit_reports__balance_due_median_por_determinar',
    'credit_reports__balance_due_median_revolvente',
    'credit_reports__balance_due_median_sin_limite_preestablecido',
    'credit_reports__balance_due_ratio_max_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_ratio_max_hipoteca',
    'credit_reports__balance_due_ratio_max_pagos_fijos',
    'credit_reports__balance_due_ratio_max_por_determinar',
    'credit_reports__balance_due_ratio_max_revolvente',
    'credit_reports__balance_due_ratio_max_sin_limite_preestablecido',
    'credit_reports__balance_due_ratio_median_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_ratio_median_hipoteca',
    'credit_reports__balance_due_ratio_median_pagos_fijos',
    'credit_reports__balance_due_ratio_median_por_determinar',
    'credit_reports__balance_due_ratio_median_revolvente',
    'credit_reports__balance_due_ratio_median_sin_limite_preestablecido',
    'credit_reports__balance_due_ratio_std_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_ratio_std_hipoteca',
    'credit_reports__balance_due_ratio_std_pagos_fijos',
    'credit_reports__balance_due_ratio_std_por_determinar',
    'credit_reports__balance_due_ratio_std_revolvente',
    'credit_reports__balance_due_ratio_std_sin_limite_preestablecido',
    'credit_reports__balance_due_std_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_std_hipoteca',
    'credit_reports__balance_due_std_pagos_fijos',
    'credit_reports__balance_due_std_por_determinar',
    'credit_reports__balance_due_std_revolvente',
    'credit_reports__balance_due_std_sin_limite_preestablecido',
    'credit_reports__balance_due_sum_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_sum_hipoteca',
    'credit_reports__balance_due_sum_pagos_fijos',
    'credit_reports__balance_due_sum_por_determinar',
    'credit_reports__balance_due_sum_revolvente',
    'credit_reports__balance_due_sum_sin_limite_preestablecido',
    'credit_reports__balance_due_worst_delay_max_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_worst_delay_max_hipoteca',
    'credit_reports__balance_due_worst_delay_max_pagos_fijos',
    'credit_reports__balance_due_worst_delay_max_por_determinar',
    'credit_reports__balance_due_worst_delay_max_revolvente',
    'credit_reports__balance_due_worst_delay_max_sin_limite_preestablecido',
    'credit_reports__balance_due_worst_delay_median_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_worst_delay_median_hipoteca',
    'credit_reports__balance_due_worst_delay_median_pagos_fijos',
    'credit_reports__balance_due_worst_delay_median_por_determinar',
    'credit_reports__balance_due_worst_delay_median_revolvente',
    'credit_reports__balance_due_worst_delay_median_sin_limite_preestablecido',
    'credit_reports__balance_due_worst_delay_ratio_max_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_worst_delay_ratio_max_hipoteca',
    'credit_reports__balance_due_worst_delay_ratio_max_pagos_fijos',
    'credit_reports__balance_due_worst_delay_ratio_max_por_determinar',
    'credit_reports__balance_due_worst_delay_ratio_max_revolvente',
    'credit_reports__balance_due_worst_delay_ratio_max_sin_limite_preestablecido',
    'credit_reports__balance_due_worst_delay_ratio_median_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_worst_delay_ratio_median_hipoteca',
    'credit_reports__balance_due_worst_delay_ratio_median_pagos_fijos',
    'credit_reports__balance_due_worst_delay_ratio_median_por_determinar',
    'credit_reports__balance_due_worst_delay_ratio_median_revolvente',
    'credit_reports__balance_due_worst_delay_ratio_median_sin_limite_preestablecido',
    'credit_reports__balance_due_worst_delay_ratio_std_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_worst_delay_ratio_std_hipoteca',
    'credit_reports__balance_due_worst_delay_ratio_std_pagos_fijos',
    'credit_reports__balance_due_worst_delay_ratio_std_por_determinar',
    'credit_reports__balance_due_worst_delay_ratio_std_revolvente',
    'credit_reports__balance_due_worst_delay_ratio_std_sin_limite_preestablecido',
    'credit_reports__balance_due_worst_delay_std_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_worst_delay_std_hipoteca',
    'credit_reports__balance_due_worst_delay_std_pagos_fijos',
    'credit_reports__balance_due_worst_delay_std_por_determinar',
    'credit_reports__balance_due_worst_delay_std_revolvente',
    'credit_reports__balance_due_worst_delay_std_sin_limite_preestablecido',
    'credit_reports__business_type_nunique_credito_de_habilitacion_o_avio',
    'credit_reports__business_type_nunique_hipoteca',
    'credit_reports__business_type_nunique_pagos_fijos',
    'credit_reports__business_type_nunique_por_determinar',
    'credit_reports__business_type_nunique_revolvente',
    'credit_reports__business_type_nunique_sin_limite_preestablecido',
    'credit_reports__cdc_inquiry_id_count_credito_de_habilitacion_o_avio',
    'credit_reports__cdc_inquiry_id_count_hipoteca',
    'credit_reports__cdc_inquiry_id_count_pagos_fijos',
    'credit_reports__cdc_inquiry_id_count_por_determinar',
    'credit_reports__cdc_inquiry_id_count_revolvente',
    'credit_reports__cdc_inquiry_id_count_sin_limite_preestablecido',
    'credit_reports__credit_limit_median_credito_de_habilitacion_o_avio',
    'credit_reports__credit_limit_median_hipoteca',
    'credit_reports__credit_limit_median_pagos_fijos',
    'credit_reports__credit_limit_median_por_determinar',
    'credit_reports__credit_limit_median_revolvente',
    'credit_reports__credit_limit_median_sin_limite_preestablecido',
    'credit_reports__credit_limit_std_credito_de_habilitacion_o_avio',
    'credit_reports__credit_limit_std_hipoteca',
    'credit_reports__credit_limit_std_pagos_fijos',
    'credit_reports__credit_limit_std_por_determinar',
    'credit_reports__credit_limit_std_revolvente',
    'credit_reports__credit_limit_std_sin_limite_preestablecido',
    'credit_reports__credit_limit_sum_credito_de_habilitacion_o_avio',
    'credit_reports__credit_limit_sum_hipoteca',
    'credit_reports__credit_limit_sum_pagos_fijos',
    'credit_reports__credit_limit_sum_por_determinar',
    'credit_reports__credit_limit_sum_revolvente',
    'credit_reports__credit_limit_sum_sin_limite_preestablecido',
    'credit_reports__credit_type_nunique_credito_de_habilitacion_o_avio',
    'credit_reports__credit_type_nunique_hipoteca',
    'credit_reports__credit_type_nunique_pagos_fijos',
    'credit_reports__credit_type_nunique_por_determinar',
    'credit_reports__credit_type_nunique_revolvente',
    'credit_reports__credit_type_nunique_sin_limite_preestablecido',
    'credit_reports__current_balance_median_credito_de_habilitacion_o_avio',
    'credit_reports__current_balance_median_hipoteca',
    'credit_reports__current_balance_median_pagos_fijos',
    'credit_reports__current_balance_median_por_determinar',
    'credit_reports__current_balance_median_revolvente',
    'credit_reports__current_balance_median_sin_limite_preestablecido',
    'credit_reports__current_balance_std_credito_de_habilitacion_o_avio',
    'credit_reports__current_balance_std_hipoteca',
    'credit_reports__current_balance_std_pagos_fijos',
    'credit_reports__current_balance_std_por_determinar',
    'credit_reports__current_balance_std_revolvente',
    'credit_reports__current_balance_std_sin_limite_preestablecido',
    'credit_reports__current_balance_sum_credito_de_habilitacion_o_avio',
    'credit_reports__current_balance_sum_hipoteca',
    'credit_reports__current_balance_sum_pagos_fijos',
    'credit_reports__current_balance_sum_por_determinar',
    'credit_reports__current_balance_sum_revolvente',
    'credit_reports__current_balance_sum_sin_limite_preestablecido',
    'credit_reports__debt_ratio_max_credito_de_habilitacion_o_avio',
    'credit_reports__debt_ratio_max_hipoteca',
    'credit_reports__debt_ratio_max_pagos_fijos',
    'credit_reports__debt_ratio_max_por_determinar',
    'credit_reports__debt_ratio_max_revolvente',
    'credit_reports__debt_ratio_max_sin_limite_preestablecido',
    'credit_reports__debt_ratio_median_credito_de_habilitacion_o_avio',
    'credit_reports__debt_ratio_median_hipoteca',
    'credit_reports__debt_ratio_median_pagos_fijos',
    'credit_reports__debt_ratio_median_por_determinar',
    'credit_reports__debt_ratio_median_revolvente',
    'credit_reports__debt_ratio_median_sin_limite_preestablecido',
    'credit_reports__debt_ratio_std_credito_de_habilitacion_o_avio',
    'credit_reports__debt_ratio_std_hipoteca',
    'credit_reports__debt_ratio_std_pagos_fijos',
    'credit_reports__debt_ratio_std_por_determinar',
    'credit_reports__debt_ratio_std_revolvente',
    'credit_reports__debt_ratio_std_sin_limite_preestablecido',
    'credit_reports__has_delayed_payments_sum_credito_de_habilitacion_o_avio',
    'credit_reports__has_delayed_payments_sum_hipoteca',
    'credit_reports__has_delayed_payments_sum_pagos_fijos',
    'credit_reports__has_delayed_payments_sum_por_determinar',
    'credit_reports__has_delayed_payments_sum_revolvente',
    'credit_reports__has_delayed_payments_sum_sin_limite_preestablecido',
    'credit_reports__is_individual_responsibility_sum_credito_de_habilitacion_o_avio',
    'credit_reports__is_individual_responsibility_sum_hipoteca',
    'credit_reports__is_individual_responsibility_sum_pagos_fijos',
    'credit_reports__is_individual_responsibility_sum_por_determinar',
    'credit_reports__is_individual_responsibility_sum_revolvente',
    'credit_reports__is_individual_responsibility_sum_sin_limite_preestablecido',
    'credit_reports__max_credit_median_credito_de_habilitacion_o_avio',
    'credit_reports__max_credit_median_hipoteca',
    'credit_reports__max_credit_median_pagos_fijos',
    'credit_reports__max_credit_median_por_determinar',
    'credit_reports__max_credit_median_revolvente',
    'credit_reports__max_credit_median_sin_limite_preestablecido',
    'credit_reports__max_credit_std_credito_de_habilitacion_o_avio',
    'credit_reports__max_credit_std_hipoteca',
    'credit_reports__max_credit_std_pagos_fijos',
    'credit_reports__max_credit_std_por_determinar',
    'credit_reports__max_credit_std_revolvente',
    'credit_reports__max_credit_std_sin_limite_preestablecido',
    'credit_reports__max_credit_sum_credito_de_habilitacion_o_avio',
    'credit_reports__max_credit_sum_hipoteca',
    'credit_reports__max_credit_sum_pagos_fijos',
    'credit_reports__max_credit_sum_por_determinar',
    'credit_reports__max_credit_sum_revolvente',
    'credit_reports__max_credit_sum_sin_limite_preestablecido',
    'credit_reports__payment_amount_sum_credito_de_habilitacion_o_avio',
    'credit_reports__payment_amount_sum_hipoteca',
    'credit_reports__payment_amount_sum_pagos_fijos',
    'credit_reports__payment_amount_sum_por_determinar',
    'credit_reports__payment_amount_sum_revolvente',
    'credit_reports__payment_amount_sum_sin_limite_preestablecido',
    'credit_reports__severity_delayed_payments_max_credito_de_habilitacion_o_avio',
    'credit_reports__severity_delayed_payments_max_hipoteca',
    'credit_reports__severity_delayed_payments_max_pagos_fijos',
    'credit_reports__severity_delayed_payments_max_por_determinar',
    'credit_reports__severity_delayed_payments_max_revolvente',
    'credit_reports__severity_delayed_payments_max_sin_limite_preestablecido',
    'credit_reports__severity_delayed_payments_median_credito_de_habilitacion_o_avio',
    'credit_reports__severity_delayed_payments_median_hipoteca',
    'credit_reports__severity_delayed_payments_median_pagos_fijos',
    'credit_reports__severity_delayed_payments_median_por_determinar',
    'credit_reports__severity_delayed_payments_median_revolvente',
    'credit_reports__severity_delayed_payments_median_sin_limite_preestablecido',
    'credit_reports__severity_delayed_payments_std_credito_de_habilitacion_o_avio',
    'credit_reports__severity_delayed_payments_std_hipoteca',
    'credit_reports__severity_delayed_payments_std_pagos_fijos',
    'credit_reports__severity_delayed_payments_std_por_determinar',
    'credit_reports__severity_delayed_payments_std_revolvente',
    'credit_reports__severity_delayed_payments_std_sin_limite_preestablecido',
]
train_df = pd.read_pickle(data_train_path)[features]
train_df.shape

(9479, 259)

In [5]:
high_corr_features = identify_high_correlation(train_df[features], threshold=0.90)
high_corr_features.sort_values("Correlation", ascending=False)


| | Feature 1 | Feature 2 | Correlation |
|---|---|---|---|
| 228 | credit_reports__balance_due_worst_delay_ratio_max_credito_de_habilitacion_o_avio | credit_reports__severity_delayed_payments_median_credito_de_habilitacion_o_avio | 1.0 |
| 246 | credit_reports__balance_due_worst_delay_ratio_median_credito_de_habilitacion_o_avio | credit_reports__severity_delayed_payments_median_credito_de_habilitacion_o_avio | 1.0 |
| 245 | credit_reports__balance_due_worst_delay_ratio_median_credito_de_habilitacion_o_avio | credit_reports__severity_delayed_payments_max_credito_de_habilitacion_o_avio | 1.0 |
| 227 | credit_reports__balance_due_worst_delay_ratio_max_credito_de_habilitacion_o_avio | credit_reports__severity_delayed_payments_max_credito_de_habilitacion_o_avio | 1.0 |
| 283 | credit_reports__credit_limit_sum_credito_de_habilitacion_o_avio | credit_reports__debt_ratio_median_credito_de_habilitacion_o_avio | 1.0 |
| 103 | credit_reports__balance_due_ratio_max_credito_de_habilitacion_o_avio | credit_reports__credit_limit_median_credito_de_habilitacion_o_avio | 1.0 |
| 270 | credit_reports__credit_limit_median_credito_de_habilitacion_o_avio | credit_reports__debt_ratio_max_credito_de_habilitacion_o_avio | 1.0 |
| 271 | credit_reports__credit_limit_median_credito_de_habilitacion_o_avio | credit_reports__debt_ratio_median_credito_de_habilitacion_o_avio | 1.0 |
| 282 | credit_reports__credit_limit_sum_credito_de_habilitacion_o_avio | credit_reports__debt_ratio_max_credito_de_habilitacion_o_avio | 1.0 |
| 104 | credit_reports__balance_due_ratio_max_credito_de_habilitacion_o_avio | credit_reports__credit_limit_sum_credito_de_habilitacion_o_avio | 1.0 |
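Several rows in this output share features (for example, the `credit_limit_*` and `debt_ratio_*` columns for `credito_de_habilitacion_o_avio` each appear in more than one pair), so the pairs form groups rather than isolated couples. One way to make those groups explicit before deciding what to keep is to merge pairs into connected components. The sketch below does this with a small union-find; the function name `group_correlated_features` and the toy pairs are hypothetical, and the notebook itself does not include this step.

```python
import pandas as pd


def group_correlated_features(pairs: pd.DataFrame) -> list:
    """Merge correlated pairs into groups of mutually linked features."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for feat_1, feat_2 in zip(pairs["Feature 1"], pairs["Feature 2"]):
        root_1, root_2 = find(feat_1), find(feat_2)
        if root_1 != root_2:
            parent[root_2] = root_1  # union the two components

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in groups.values())


# Toy pairs: a-b and b-c chain into one group; d-e form a second group.
toy_pairs = pd.DataFrame({
    "Feature 1": ["a", "b", "d"],
    "Feature 2": ["b", "c", "e"],
})
toy_groups = group_correlated_features(toy_pairs)
print(toy_groups)
```

Note that membership in a group is transitive: two features can land in the same group through a chain of links even if their own correlation is below the threshold, so each group still deserves a manual look before all but one member is dropped.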


In [6]:
exclude_features = [
    "credit_reports__balance_due_ratio_max",
    "credit_reports__open_loans_current_balance_sum",
    "credit_reports__is_individual_responsibility_sum",
    "credit_reports__balance_due_ratio_std",
    "credit_reports__balance_due_worst_delay_ratio_std",
    "credit_reports__open_loans_payment_amount_sum",
    "credit_reports__open_loans_balance_due_sum",
    "credit_reports__balance_due_worst_delay_ratio_max",
    "credit_reports__balance_due_worst_delay_ratio_std",
    "credit_reports__balance_due_worst_delay_ratio_max",
    "credit_reports__business_type_nunique",

    'credit_reports__age_median_credito_de_habilitacion_o_avio',
    'credit_reports__age_median_hipoteca',
    'credit_reports__age_median_por_determinar',
    'credit_reports__balance_due_ratio_max',
    'credit_reports__balance_due_ratio_max_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_ratio_max_por_determinar',
    'credit_reports__balance_due_ratio_max_revolvente',
    'credit_reports__balance_due_ratio_median_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_ratio_median_hipoteca',
    'credit_reports__balance_due_ratio_median_por_determinar',
    'credit_reports__balance_due_ratio_std_pagos_fijos',
    'credit_reports__balance_due_ratio_std_revolvente',
    'credit_reports__balance_due_sum_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_sum_hipoteca',
    'credit_reports__balance_due_sum_por_determinar',
    'credit_reports__balance_due_worst_delay_max_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_worst_delay_max_por_determinar',
    'credit_reports__balance_due_worst_delay_max_sin_limite_preestablecido',
    'credit_reports__balance_due_worst_delay_median_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_worst_delay_median_hipoteca',
    'credit_reports__balance_due_worst_delay_median_por_determinar',
    'credit_reports__balance_due_worst_delay_median_sin_limite_preestablecido',
    'credit_reports__balance_due_worst_delay_ratio_max_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_worst_delay_ratio_max_hipoteca',
    'credit_reports__balance_due_worst_delay_ratio_max_por_determinar',
    'credit_reports__balance_due_worst_delay_ratio_max_revolvente',
    'credit_reports__balance_due_worst_delay_ratio_median_credito_de_habilitacion_o_avio',
    'credit_reports__balance_due_worst_delay_ratio_median_hipoteca',
    'credit_reports__balance_due_worst_delay_ratio_median_por_determinar',
    'credit_reports__balance_due_worst_delay_ratio_median_revolvente',
    'credit_reports__balance_due_worst_delay_ratio_std_pagos_fijos',
    'credit_reports__balance_due_worst_delay_ratio_std_por_determinar',
    'credit_reports__balance_due_worst_delay_ratio_std_revolvente',
    'credit_reports__balance_due_worst_delay_ratio_std_sin_limite_preestablecido',
    'credit_reports__balance_due_worst_delay_std_hipoteca',
    'credit_reports__balance_due_worst_delay_std_por_determinar',
    'credit_reports__balance_due_worst_delay_std_revolvente',
    'credit_reports__balance_due_worst_delay_std_sin_limite_preestablecido',
    'credit_reports__business_type_nunique_pagos_fijos',
    'credit_reports__cdc_inquiry_id_count_credito_de_habilitacion_o_avio',
    'credit_reports__cdc_inquiry_id_count_hipoteca',
    'credit_reports__cdc_inquiry_id_count_pagos_fijos',
    'credit_reports__credit_limit_median_credito_de_habilitacion_o_avio',
    'credit_reports__credit_limit_sum_credito_de_habilitacion_o_avio',
    'credit_reports__credit_limit_sum_hipoteca',
    'credit_reports__credit_limit_sum_sin_limite_preestablecido',
    'credit_reports__credit_type_nunique_credito_de_habilitacion_o_avio',
    'credit_reports__credit_type_nunique_hipoteca',
    'credit_reports__credit_type_nunique_pagos_fijos',
    'credit_reports__credit_type_nunique_por_determinar',
    'credit_reports__credit_type_nunique_sin_limite_preestablecido',
    'credit_reports__current_balance_median_credito_de_habilitacion_o_avio',
    'credit_reports__current_balance_median_por_determinar',
    'credit_reports__current_balance_std_por_determinar',
    'credit_reports__current_balance_std_sin_limite_preestablecido',
    'credit_reports__current_balance_sum_credito_de_habilitacion_o_avio',
    'credit_reports__current_balance_sum_hipoteca',
    'credit_reports__current_balance_sum_por_determinar',
    'credit_reports__current_balance_sum_sin_limite_preestablecido',
    'credit_reports__debt_ratio_max_credito_de_habilitacion_o_avio',
    'credit_reports__debt_ratio_max_revolvente',
    'credit_reports__debt_ratio_max_sin_limite_preestablecido',
    'credit_reports__debt_ratio_median_credito_de_habilitacion_o_avio',
    'credit_reports__debt_ratio_median_hipoteca',
    'credit_reports__debt_ratio_median_revolvente',
    'credit_reports__debt_ratio_median_sin_limite_preestablecido',
    'credit_reports__debt_ratio_std_pagos_fijos',
    'credit_reports__debt_ratio_std_revolvente',
    'credit_reports__debt_ratio_std_sin_limite_preestablecido',
    'credit_reports__has_delayed_payments_sum_credito_de_habilitacion_o_avio',
    'credit_reports__has_delayed_payments_sum_pagos_fijos',
    'credit_reports__is_individual_responsibility_sum_hipoteca',
    'credit_reports__is_individual_responsibility_sum_pagos_fijos',
    'credit_reports__is_individual_responsibility_sum_por_determinar',
    'credit_reports__is_individual_responsibility_sum_revolvente',
    'credit_reports__is_individual_responsibility_sum_sin_limite_preestablecido',
    'credit_reports__max_credit_median_credito_de_habilitacion_o_avio',
    'credit_reports__max_credit_median_hipoteca',
    'credit_reports__max_credit_median_por_determinar',
    'credit_reports__max_credit_std_credito_de_habilitacion_o_avio',
    'credit_reports__max_credit_std_pagos_fijos',
    'credit_reports__max_credit_std_sin_limite_preestablecido',
    'credit_reports__max_credit_sum_credito_de_habilitacion_o_avio',
    'credit_reports__max_credit_sum_hipoteca',
    'credit_reports__max_credit_sum_pagos_fijos',
    'credit_reports__max_credit_sum_revolvente',
    'credit_reports__max_credit_sum_sin_limite_preestablecido',
    'credit_reports__payment_amount_sum_credito_de_habilitacion_o_avio',
    'credit_reports__payment_amount_sum_por_determinar',
    'credit_reports__payment_amount_sum_sin_limite_preestablecido',
    'credit_reports__severity_delayed_payments_max_credito_de_habilitacion_o_avio',
    'credit_reports__severity_delayed_payments_max_pagos_fijos',
    'credit_reports__severity_delayed_payments_max_por_determinar',
    'credit_reports__severity_delayed_payments_median_credito_de_habilitacion_o_avio',
    'credit_reports__severity_delayed_payments_median_hipoteca',
    'credit_reports__severity_delayed_payments_median_por_determinar',
    'credit_reports__severity_delayed_payments_median_sin_limite_preestablecido',
    'credit_reports__severity_delayed_payments_std_pagos_fijos',
]
len(exclude_features)

109
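Because `len(exclude_features)` counts repeated entries, and the list above does repeat some names (`credit_reports__balance_due_ratio_max`, for instance, is listed twice), it can be worth auditing the list before relying on the count. The sketch below reports duplicates, names that are not present in `features`, and the number of features actually retained. The helper name `audit_exclusions` and the toy lists are illustrative assumptions.

```python
from collections import Counter


def audit_exclusions(features: list, exclude_features: list) -> dict:
    """Summarize an exclusion list relative to the full feature list."""
    counts = Counter(exclude_features)
    return {
        "n_listed": len(exclude_features),  # raw length, duplicates included
        "n_unique": len(counts),            # distinct names in the list
        "duplicates": sorted(name for name, n in counts.items() if n > 1),
        # Names that would have no effect because they are not in `features`.
        "not_in_features": sorted(set(exclude_features) - set(features)),
        "n_retained": len(set(features) - set(exclude_features)),
    }


# Toy lists: 'f2' is listed twice and 'f9' is not a known feature.
toy_features = ["f1", "f2", "f3", "f4"]
toy_exclude = ["f2", "f2", "f9"]
report = audit_exclusions(toy_features, toy_exclude)
print(report)
```

Running the same function on the notebook's `features` and `exclude_features` would show how the raw count relates to the number of columns actually removed.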

In [7]:
# Features that remain after removing the excluded ones, in alphabetical order
result_list = sorted(set(features) - set(exclude_features))
result_list

['credit_reports__age_max',
 'credit_reports__age_max_credito_de_habilitacion_o_avio',
 'credit_reports__age_max_hipoteca',
 'credit_reports__age_max_pagos_fijos',
 'credit_reports__age_max_por_determinar',
 'credit_reports__age_max_revolvente',
 'credit_reports__age_max_sin_limite_preestablecido',
 'credit_reports__age_median_pagos_fijos',
 'credit_reports__age_median_revolvente',
 'credit_reports__age_median_sin_limite_preestablecido',
 'credit_reports__age_min',
 'credit_reports__age_std_credito_de_habilitacion_o_avio',
 'credit_reports__age_std_hipoteca',
 'credit_reports__age_std_pagos_fijos',
 'credit_reports__age_std_por_determinar',
 'credit_reports__age_std_revolvente',
 'credit_reports__age_std_sin_limite_preestablecido',
 'credit_reports__balance_due_median_credito_de_habilitacion_o_avio',
 'credit_reports__balance_due_median_hipoteca',
 'credit_reports__balance_due_median_pagos_fijos',
 'credit_reports__balance_due_median_por_determinar',
 'credit_reports__balance_due_media