# Home Credit Default Risk Dataset

Customer data from Home Credit, organized into seven different files according to the information provided: data on the credit application and on credit history (internal, or from external organizations reported to credit bureaus).

Home Credit is an international financial company that operates in multiple European and Asian countries.

*Notebook objective*: Perform an exploratory data analysis (EDA).

## Data exploration

### Loading libraries

In [1]:
import pandas as pd
import dtale as dt
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import os
warnings.filterwarnings('ignore')

### Loading the files

In [2]:
path = "../../data/raw/"
csv_files = {
    "application_train": "DS03_application_train.csv",
    "application_test": "DS03_application_test.csv",
    "bureau": "DS03_bureau.csv",
    "bureau_balance": "DS03_bureau_balance.csv",
    "previous_application": "DS03_previous_application.csv",
    "POS_CASH_balance": "DS03_POS_CASH_balance.csv",
    "installments_payments": "DS03_installments_payments.csv",
    "credit_card_balance": "DS03_credit_card_balance.csv"
}

# Load all files into a dictionary
dict_data = {}
for key, file in csv_files.items():
    dict_data[key] = pd.read_csv(os.path.join(path,file))

### Dataset description

In [3]:
# Function to print the shape and the number of columns per data type for one dataset
def datasets_description(datasets, dataset_name):
    # `datasets` is the dictionary of DataFrames; `dataset_name` is the key to describe
    print(f"\nDataset description: {dataset_name}")

    # Print dataset shape
    print(f"\nShape: {datasets[dataset_name].shape}")
    # Print the number of columns of each data type
    print("Unique data types:")
    print(datasets[dataset_name].dtypes.value_counts())

In [4]:
for key in dict_data.keys():
    datasets_description(dict_data, key)


Dataset description: application_train

Shape: (307511, 122)
Unique data types:
float64    65
int64      41
object     16
Name: count, dtype: int64

Dataset description: application_test

Shape: (48744, 121)
Unique data types:
float64    65
int64      40
object     16
Name: count, dtype: int64

Dataset description: bureau

Shape: (1716428, 17)
Unique data types:
float64    8
int64      6
object     3
Name: count, dtype: int64

Dataset description: bureau_balance

Shape: (27299925, 3)
Unique data types:
int64     2
object    1
Name: count, dtype: int64

Dataset description: previous_application

Shape: (1670214, 37)
Unique data types:
object     16
float64    15
int64       6
Name: count, dtype: int64

Dataset description: POS_CASH_balance

Shape: (10001358, 8)
Unique data types:
int64      5
float64    2
object     1
Name: count, dtype: int64

Dataset description: installments_payments

Shape: (13605401, 8)
Unique data types:
float64    5
int64      3
Name: count, dtype: int64


### Quality verification

Analysis:
1. Completeness dimension:
    - 1.1. Evaluation of null values by column
    - 1.2. Evaluation of null values by row
    - 1.3. Evaluation of null values across the dataset
2. Accuracy dimension
3. Consistency dimension
---


2. Evaluation of valid format
3. Values within expected ranges
4. Unique keys
5. Referential integrity
6. Compliance with value rules

#### 1. Completeness dimension: evaluation of null values

##### 1.1. Attribute level: percentage of missing values for each attribute

In [57]:
# Function to check the percentage of missing values
def completeness_attributes(df):
    percentage_nulls = df.isnull().sum() / len(df) * 100
    cols_nulls = percentage_nulls[percentage_nulls > 0].sort_values(ascending=False)
    return cols_nulls


In [58]:
# Check missing percentage for each dataset
for key in dict_data.keys():
    print(f"\n{key} - Percentage of null values for each attribute:")
    missing_percentage = completeness_attributes(dict_data[key])
    print(missing_percentage)


application_train - Percentage of null values for each attribute:
COMMONAREA_MEDI             69.872297
COMMONAREA_AVG              69.872297
COMMONAREA_MODE             69.872297
NONLIVINGAPARTMENTS_MEDI    69.432963
NONLIVINGAPARTMENTS_MODE    69.432963
                              ...    
EXT_SOURCE_2                 0.214626
AMT_GOODS_PRICE              0.090403
AMT_ANNUITY                  0.003902
CNT_FAM_MEMBERS              0.000650
DAYS_LAST_PHONE_CHANGE       0.000325
Length: 67, dtype: float64

application_test - Percentage of null values for each attribute:
COMMONAREA_MODE             68.716150
COMMONAREA_MEDI             68.716150
COMMONAREA_AVG              68.716150
NONLIVINGAPARTMENTS_MEDI    68.412523
NONLIVINGAPARTMENTS_AVG     68.412523
                              ...    
OBS_60_CNT_SOCIAL_CIRCLE     0.059495
DEF_30_CNT_SOCIAL_CIRCLE     0.059495
OBS_30_CNT_SOCIAL_CIRCLE     0.059495
AMT_ANNUITY                  0.049237
EXT_SOURCE_2                 0.016412


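Several columns across the datasets have more than 70% null values, for example the RATE_INTEREST_PRIMARY attribute of the previous_application dataset. In application_train and application_test, the most incomplete columns shown above (COMMONAREA_*) reach about 69.9% and 68.7% nulls, respectively.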
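To list the affected columns explicitly, the null-percentage check can be narrowed to a cut-off. This is a minimal sketch, assuming a hypothetical helper `columns_above_threshold` and a 70% threshold; the toy DataFrame stands in for any entry of `dict_data`.

```python
import numpy as np
import pandas as pd

def columns_above_threshold(df, threshold=70.0):
    """Null percentage of the columns whose null share exceeds `threshold` (hypothetical helper)."""
    percentage_nulls = df.isnull().mean() * 100
    return percentage_nulls[percentage_nulls > threshold].sort_values(ascending=False)

# Toy data standing in for one of the datasets in dict_data
toy = pd.DataFrame({
    "A": [1, 2, 3, 4],                    # no nulls
    "B": [np.nan, np.nan, np.nan, 4.0],   # 3 of 4 values are null
    "C": [np.nan, 2.0, 3.0, 4.0],         # 1 of 4 values is null
})
high_null_columns = columns_above_threshold(toy)
print(high_null_columns)
```

On the real data, the same call could be applied to each `dict_data[key]` to see which datasets contribute columns above the cut-off.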

##### 1.2. Row level: percentage of null attributes in each row

In [69]:
def completeness_rows(df, acceptance_boundary):
    # Fraction of null attributes in each row, computed without adding a helper column to df
    cantidad_columnas = df.shape[1]
    completitud_fila = df.isnull().sum(axis=1) / cantidad_columnas
    # Rows whose null fraction reaches or exceeds the boundary are counted as invalid
    completitud_f = (completitud_fila >= acceptance_boundary).sum()
    return round((completitud_f / df.shape[0]) * 100, 2)

acceptance_boundary = 0.2  # Rows with 20% or more null attributes are treated as invalid

for key in dict_data.keys():
    print(f"\nDataset description: {key}")
    perc_rows_above_boundary = completeness_rows(dict_data[key], acceptance_boundary)
    print(f"Rows that violate the null threshold across columns [completitud_f]: {perc_rows_above_boundary}%")


Dataset description: application_train
Rows that violate the null threshold across columns [completitud_f]: 55.82%

Dataset description: application_test
Rows that violate the null threshold across columns [completitud_f]: 53.86%

Dataset description: bureau
Rows that violate the null threshold across columns [completitud_f]: 4.41%

Dataset description: bureau_balance
Rows that violate the null threshold across columns [completitud_f]: 0.0%

Dataset description: previous_application
Rows that violate the null threshold across columns [completitud_f]: 34.79%

Dataset description: POS_CASH_balance
Rows that violate the null threshold across columns [completitud_f]: 0.26%

Dataset description: installments_payments
Rows that violate the null threshold across columns [completitud_f]: 0.02%

Dataset description: credit_card_balance
Rows that violate the null threshold across columns [completitud_f]: 19.52%


A high share of rows have 20% or more of their attributes null: 55.82% in application_train, 53.86% in application_test, and 34.79% in previous_application.

##### 1.3. Dataset level: percentage of rows that contain at least one null

In [73]:
def completeness_df(df):
    completitud_dc = df.isnull().any(axis=1).sum()
    return round((completitud_dc  / df.shape[0]) * 100, 2)

for key in dict_data.keys():
    print(f"\nDataset description: {key}")
    perc_completeness = completeness_df(dict_data[key])
    print(f"Rows with at least one null in the dataset [completitud_d]: {perc_completeness}%")


Dataset description: application_train
Rows with at least one null in the dataset [completitud_d]: 97.2%

Dataset description: application_test
Rows with at least one null in the dataset [completitud_d]: 96.43%

Dataset description: bureau
Rows with at least one null in the dataset [completitud_d]: 82.34%

Dataset description: bureau_balance
Rows with at least one null in the dataset [completitud_d]: 0.0%

Dataset description: previous_application
Rows with at least one null in the dataset [completitud_d]: 74.22%

Dataset description: POS_CASH_balance
Rows with at least one null in the dataset [completitud_d]: 0.26%

Dataset description: installments_payments
Rows with at least one null in the dataset [completitud_d]: 0.02%

Dataset description: credit_card_balance
Rows with at least one null in the dataset [completitud_d]: 21.51%


At the dataset level, five of the eight datasets (application_train, application_test, bureau, previous_application, and credit_card_balance) have more than 20% of rows with at least one missing value.
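The three completeness checks (by column, by row, and across the dataset) can also be gathered into one summary per DataFrame. The sketch below assumes a hypothetical helper `completeness_summary` and the same 0.2 boundary; it runs on toy data rather than on the real files.

```python
import numpy as np
import pandas as pd

def completeness_summary(df, acceptance_boundary=0.2):
    """Summarise the three completeness checks for one DataFrame (hypothetical helper)."""
    null_mask = df.isnull()
    # Fraction of null attributes in each row
    row_null_fraction = null_mask.mean(axis=1)
    return {
        # 1.1: number of columns that contain at least one null
        "columns_with_nulls": int(null_mask.any(axis=0).sum()),
        # 1.2: percentage of rows whose null fraction reaches the boundary
        "pct_rows_at_or_above_boundary": round((row_null_fraction >= acceptance_boundary).mean() * 100, 2),
        # 1.3: percentage of rows with at least one null
        "pct_rows_with_any_null": round(null_mask.any(axis=1).mean() * 100, 2),
    }

# Toy data: six columns, four rows
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [1.0, np.nan, np.nan, 4.0],
    "c": [1.0, 2.0, 3.0, 4.0],
    "d": [1.0, 2.0, 3.0, 4.0],
    "e": [1.0, 2.0, 3.0, 4.0],
    "f": [1.0, 2.0, 3.0, 4.0],
})
summary = completeness_summary(toy)
print(summary)
```

Computing everything from a single boolean mask avoids writing helper columns into the DataFrame being measured.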

#### Accuracy dimension

Values within range

##### Analysis of the object-type columns

In [24]:
#Visualization of columns of object type

for key in dict_data.keys():
    print(f"\nDataset description: {key}")
    display(dict_data[key].select_dtypes(include='object').head(5))
    


Dataset description: application_train


Unnamed: 0,NAME_CONTRACT_TYPE,CODE_GENDER,FLAG_OWN_CAR,FLAG_OWN_REALTY,NAME_TYPE_SUITE,NAME_INCOME_TYPE,NAME_EDUCATION_TYPE,NAME_FAMILY_STATUS,NAME_HOUSING_TYPE,OCCUPATION_TYPE,WEEKDAY_APPR_PROCESS_START,ORGANIZATION_TYPE,FONDKAPREMONT_MODE,HOUSETYPE_MODE,WALLSMATERIAL_MODE,EMERGENCYSTATE_MODE
0,Cash loans,M,N,Y,Unaccompanied,Working,Secondary / secondary special,Single / not married,House / apartment,Laborers,WEDNESDAY,Business Entity Type 3,reg oper account,block of flats,"Stone, brick",No
1,Cash loans,F,N,N,Family,State servant,Higher education,Married,House / apartment,Core staff,MONDAY,School,reg oper account,block of flats,Block,No
2,Revolving loans,M,Y,Y,Unaccompanied,Working,Secondary / secondary special,Single / not married,House / apartment,Laborers,MONDAY,Government,,,,
3,Cash loans,F,N,Y,Unaccompanied,Working,Secondary / secondary special,Civil marriage,House / apartment,Laborers,WEDNESDAY,Business Entity Type 3,,,,
4,Cash loans,M,N,Y,Unaccompanied,Working,Secondary / secondary special,Single / not married,House / apartment,Core staff,THURSDAY,Religion,,,,



Dataset description: application_test


Unnamed: 0,NAME_CONTRACT_TYPE,CODE_GENDER,FLAG_OWN_CAR,FLAG_OWN_REALTY,NAME_TYPE_SUITE,NAME_INCOME_TYPE,NAME_EDUCATION_TYPE,NAME_FAMILY_STATUS,NAME_HOUSING_TYPE,OCCUPATION_TYPE,WEEKDAY_APPR_PROCESS_START,ORGANIZATION_TYPE,FONDKAPREMONT_MODE,HOUSETYPE_MODE,WALLSMATERIAL_MODE,EMERGENCYSTATE_MODE
0,Cash loans,F,N,Y,Unaccompanied,Working,Higher education,Married,House / apartment,,TUESDAY,Kindergarten,,block of flats,"Stone, brick",No
1,Cash loans,M,N,Y,Unaccompanied,Working,Secondary / secondary special,Married,House / apartment,Low-skill Laborers,FRIDAY,Self-employed,,,,
2,Cash loans,M,Y,Y,,Working,Higher education,Married,House / apartment,Drivers,MONDAY,Transport: type 3,,,,
3,Cash loans,F,N,Y,Unaccompanied,Working,Secondary / secondary special,Married,House / apartment,Sales staff,WEDNESDAY,Business Entity Type 3,reg oper account,block of flats,Panel,No
4,Cash loans,M,Y,N,Unaccompanied,Working,Secondary / secondary special,Married,House / apartment,,FRIDAY,Business Entity Type 3,,,,



Dataset description: bureau


Unnamed: 0,CREDIT_ACTIVE,CREDIT_CURRENCY,CREDIT_TYPE
0,Closed,currency 1,Consumer credit
1,Active,currency 1,Credit card
2,Active,currency 1,Consumer credit
3,Active,currency 1,Credit card
4,Active,currency 1,Consumer credit



Dataset description: bureau_balance


Unnamed: 0,STATUS
0,C
1,C
2,C
3,C
4,C



Dataset description: previous_application


Unnamed: 0,NAME_CONTRACT_TYPE,WEEKDAY_APPR_PROCESS_START,FLAG_LAST_APPL_PER_CONTRACT,NAME_CASH_LOAN_PURPOSE,NAME_CONTRACT_STATUS,NAME_PAYMENT_TYPE,CODE_REJECT_REASON,NAME_TYPE_SUITE,NAME_CLIENT_TYPE,NAME_GOODS_CATEGORY,NAME_PORTFOLIO,NAME_PRODUCT_TYPE,CHANNEL_TYPE,NAME_SELLER_INDUSTRY,NAME_YIELD_GROUP,PRODUCT_COMBINATION
0,Consumer loans,SATURDAY,Y,XAP,Approved,Cash through the bank,XAP,,Repeater,Mobile,POS,XNA,Country-wide,Connectivity,middle,POS mobile with interest
1,Cash loans,THURSDAY,Y,XNA,Approved,XNA,XAP,Unaccompanied,Repeater,XNA,Cash,x-sell,Contact center,XNA,low_action,Cash X-Sell: low
2,Cash loans,TUESDAY,Y,XNA,Approved,Cash through the bank,XAP,"Spouse, partner",Repeater,XNA,Cash,x-sell,Credit and cash offices,XNA,high,Cash X-Sell: high
3,Cash loans,MONDAY,Y,XNA,Approved,Cash through the bank,XAP,,Repeater,XNA,Cash,x-sell,Credit and cash offices,XNA,middle,Cash X-Sell: middle
4,Cash loans,THURSDAY,Y,Repairs,Refused,Cash through the bank,HC,,Repeater,XNA,Cash,walk-in,Credit and cash offices,XNA,high,Cash Street: high



Dataset description: POS_CASH_balance


Unnamed: 0,NAME_CONTRACT_STATUS
0,Active
1,Active
2,Active
3,Active
4,Active



Dataset description: installments_payments


0
1
2
3
4



Dataset description: credit_card_balance


Unnamed: 0,NAME_CONTRACT_STATUS
0,Active
1,Active
2,Active
3,Active
4,Active


All object-type columns are categorical. We examine the possible values of each one to identify out-of-range values.

In [32]:
# Function to return the unique values of each object column
def unique_values(df):
    values_per_column = {}
    cols = list(df.select_dtypes(include='object'))
    if len(cols) != 0:
        for column in cols:
            # Wrap the array of unique values in a list so it fits in a single cell
            values_per_column[column] = [df[column].unique()]
        df_unique = pd.DataFrame(values_per_column).T
        df_unique.columns = ['UNIQUE_VALUES']
        df_unique.index.name = 'ATTRIBUTE'
        return df_unique
    else:
        return "No columns of object type"


In [34]:
pd.set_option('display.max_colwidth', None)
for key in dict_data.keys():
    print(f"\nDataset description: {key}")
    display(unique_values(dict_data[key]))


Dataset description: application_train


ATTRIBUTE,UNIQUE_VALUES
NAME_CONTRACT_TYPE,"[Cash loans, Revolving loans]"
CODE_GENDER,"[M, F, XNA]"
FLAG_OWN_CAR,"[N, Y]"
FLAG_OWN_REALTY,"[Y, N]"
NAME_TYPE_SUITE,"[Unaccompanied, Family, Spouse, partner, Children, Other_A, nan, Other_B, Group of people]"
NAME_INCOME_TYPE,"[Working, State servant, Commercial associate, Pensioner, Unemployed, Student, Businessman, Maternity leave]"
NAME_EDUCATION_TYPE,"[Secondary / secondary special, Higher education, Incomplete higher, Lower secondary, Academic degree]"
NAME_FAMILY_STATUS,"[Single / not married, Married, Civil marriage, Widow, Separated, Unknown]"
NAME_HOUSING_TYPE,"[House / apartment, Rented apartment, With parents, Municipal apartment, Office apartment, Co-op apartment]"
OCCUPATION_TYPE,"[Laborers, Core staff, Accountants, Managers, nan, Drivers, Sales staff, Cleaning staff, Cooking staff, Private service staff, Medicine staff, Security staff, High skill tech staff, Waiters/barmen staff, Low-skill Laborers, Realty agents, Secretaries, IT staff, HR staff]"



Dataset description: application_test


ATTRIBUTE,UNIQUE_VALUES
NAME_CONTRACT_TYPE,"[Cash loans, Revolving loans]"
CODE_GENDER,"[F, M]"
FLAG_OWN_CAR,"[N, Y]"
FLAG_OWN_REALTY,"[Y, N]"
NAME_TYPE_SUITE,"[Unaccompanied, nan, Family, Spouse, partner, Group of people, Other_B, Children, Other_A]"
NAME_INCOME_TYPE,"[Working, State servant, Pensioner, Commercial associate, Businessman, Student, Unemployed]"
NAME_EDUCATION_TYPE,"[Higher education, Secondary / secondary special, Incomplete higher, Lower secondary, Academic degree]"
NAME_FAMILY_STATUS,"[Married, Single / not married, Civil marriage, Widow, Separated]"
NAME_HOUSING_TYPE,"[House / apartment, With parents, Rented apartment, Municipal apartment, Office apartment, Co-op apartment]"
OCCUPATION_TYPE,"[nan, Low-skill Laborers, Drivers, Sales staff, High skill tech staff, Core staff, Laborers, Managers, Accountants, Medicine staff, Security staff, Private service staff, Secretaries, Cleaning staff, Cooking staff, HR staff, Waiters/barmen staff, Realty agents, IT staff]"



Dataset description: bureau


ATTRIBUTE,UNIQUE_VALUES
CREDIT_ACTIVE,"[Closed, Active, Sold, Bad debt]"
CREDIT_CURRENCY,"[currency 1, currency 2, currency 4, currency 3]"
CREDIT_TYPE,"[Consumer credit, Credit card, Mortgage, Car loan, Microloan, Loan for working capital replenishment, Loan for business development, Real estate loan, Unknown type of loan, Another type of loan, Cash loan (non-earmarked), Loan for the purchase of equipment, Mobile operator loan, Interbank credit, Loan for purchase of shares (margin lending)]"



Dataset description: bureau_balance


ATTRIBUTE,UNIQUE_VALUES
STATUS,"[C, 0, X, 1, 2, 3, 5, 4]"



Dataset description: previous_application


ATTRIBUTE,UNIQUE_VALUES
NAME_CONTRACT_TYPE,"[Consumer loans, Cash loans, Revolving loans, XNA]"
WEEKDAY_APPR_PROCESS_START,"[SATURDAY, THURSDAY, TUESDAY, MONDAY, FRIDAY, SUNDAY, WEDNESDAY]"
FLAG_LAST_APPL_PER_CONTRACT,"[Y, N]"
NAME_CASH_LOAN_PURPOSE,"[XAP, XNA, Repairs, Everyday expenses, Car repairs, Building a house or an annex, Other, Journey, Purchase of electronic equipment, Medicine, Payments on other loans, Urgent needs, Buying a used car, Buying a new car, Buying a holiday home / land, Education, Buying a home, Furniture, Buying a garage, Business development, Wedding / gift / holiday, Hobby, Gasification / water supply, Refusal to name the goal, Money for a third person]"
NAME_CONTRACT_STATUS,"[Approved, Refused, Canceled, Unused offer]"
NAME_PAYMENT_TYPE,"[Cash through the bank, XNA, Non-cash from your account, Cashless from the account of the employer]"
CODE_REJECT_REASON,"[XAP, HC, LIMIT, CLIENT, SCOFR, SCO, XNA, VERIF, SYSTEM]"
NAME_TYPE_SUITE,"[nan, Unaccompanied, Spouse, partner, Family, Children, Other_B, Other_A, Group of people]"
NAME_CLIENT_TYPE,"[Repeater, New, Refreshed, XNA]"
NAME_GOODS_CATEGORY,"[Mobile, XNA, Consumer Electronics, Construction Materials, Auto Accessories, Photo / Cinema Equipment, Computers, Audio/Video, Medicine, Clothing and Accessories, Furniture, Sport and Leisure, Homewares, Gardening, Jewelry, Vehicles, Education, Medical Supplies, Other, Direct Sales, Office Appliances, Fitness, Tourism, Insurance, Additional Service, Weapon, Animals, House Construction]"



Dataset description: POS_CASH_balance


ATTRIBUTE,UNIQUE_VALUES
NAME_CONTRACT_STATUS,"[Active, Completed, Signed, Approved, Returned to the store, Demand, Canceled, XNA, Amortized debt]"



Dataset description: installments_payments


'No columns of object type'


Dataset description: credit_card_balance


ATTRIBUTE,UNIQUE_VALUES
NAME_CONTRACT_STATUS,"[Active, Completed, Demand, Signed, Sent proposal, Refused, Approved]"


The following out-of-range values are observed:
- application_train
    - CODE_GENDER: XNA
    - ORGANIZATION_TYPE: XNA

- bureau_balance
    - STATUS: contains a mix of numbers and letters. However, according to the column descriptions these are valid values.
- previous_application
    - NAME_CONTRACT_TYPE: XNA
    - NAME_CASH_LOAN_PURPOSE: XAP?

Among others. The main anomalous value observed is XNA, which will be replaced with NA. We collect every attribute that contains XNA in a list.

In [35]:
def attributes_containing_XNA(df):
    columns_with_val = []

    # Iterate over the object columns and check whether any row holds the value 'XNA'
    for column in df.select_dtypes(include='object').columns:
        if (df[column] == 'XNA').any():
            columns_with_val.append(column)

    return columns_with_val

In [36]:
for key in dict_data.keys():
    print(f"\nDataset description: {key}")
    display(attributes_containing_XNA(dict_data[key]))


Dataset description: application_train


['CODE_GENDER', 'ORGANIZATION_TYPE']


Dataset description: application_test


['ORGANIZATION_TYPE']


Dataset description: bureau


[]


Dataset description: bureau_balance


[]


Dataset description: previous_application


['NAME_CONTRACT_TYPE',
 'NAME_CASH_LOAN_PURPOSE',
 'NAME_PAYMENT_TYPE',
 'CODE_REJECT_REASON',
 'NAME_CLIENT_TYPE',
 'NAME_GOODS_CATEGORY',
 'NAME_PORTFOLIO',
 'NAME_PRODUCT_TYPE',
 'NAME_SELLER_INDUSTRY',
 'NAME_YIELD_GROUP']


Dataset description: POS_CASH_balance


['NAME_CONTRACT_STATUS']


Dataset description: installments_payments


[]


Dataset description: credit_card_balance


[]
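The replacement of XNA by a missing value, announced above, could look like the following. This is only a sketch under assumptions: `replace_xna` is a hypothetical helper, `np.nan` is used as the NA marker, and the toy DataFrame borrows two column names from application_train.

```python
import numpy as np
import pandas as pd

def replace_xna(df, placeholder="XNA"):
    """Return a copy of df in which the placeholder is replaced by NaN (hypothetical helper)."""
    # DataFrame.replace returns a new object, so the input is left untouched
    return df.replace(placeholder, np.nan)

# Toy data; the real call would be replace_xna(dict_data[key])
toy = pd.DataFrame({
    "CODE_GENDER": ["M", "F", "XNA"],
    "ORGANIZATION_TYPE": ["School", "XNA", "Government"],
    "AMT_CREDIT": [1000.0, 2000.0, 3000.0],
})
cleaned = replace_xna(toy)
print(cleaned.isna().sum())
```

Once replaced, the former XNA entries are counted by the completeness checks of section 1, so those percentages would need to be recomputed.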

D-Tale for exploring the DataFrames

In [None]:
for key in dict_data.keys():
    dtale_df = dt.show(dict_data[key])
    dtale_df.open_browser()

#### Consistency dimension

Unique keys


In [None]:
def claves_unicas(df, columns):
    total_filas = df.shape[0]
    porcentajes = []
    for column in columns:
        filas_duplicadas = df.duplicated(subset=[column]).sum()
        porcentaje_filas_duplicadas = (filas_duplicadas / total_filas) * 100
        porcentajes.append(porcentaje_filas_duplicadas)
    return porcentajes

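In [None]:
# Candidate key columns; defined before the cell that uses them
c3 = ['SK_ID_CURR', 'DAYS_ID_PUBLISH']

In [None]:
# Check the candidate keys on application_train
porcentajes_claves_unicas = claves_unicas(dict_data["application_train"], c3)
for column, porcentaje in zip(c3, porcentajes_claves_unicas):
    print(f"{column}: {porcentaje:.2f}% of rows repeat an earlier value")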
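The referential-integrity check listed in the quality outline is not implemented in this notebook. One possible sketch is shown below; it assumes a hypothetical helper `orphan_key_percentage` and that child tables such as bureau reference the application tables through `SK_ID_CURR`. The toy frames only mimic that relationship.

```python
import pandas as pd

def orphan_key_percentage(child, parent, key):
    """Percentage of rows in `child` whose `key` value is absent from `parent` (hypothetical helper)."""
    orphan_mask = ~child[key].isin(parent[key])
    return round(orphan_mask.mean() * 100, 2)

# Toy parent table (stands in for an application table)
parent = pd.DataFrame({"SK_ID_CURR": [100001, 100002, 100003]})
# Toy child table (stands in for bureau); 100009 has no matching parent row
child = pd.DataFrame({
    "SK_ID_CURR": [100001, 100001, 100003, 100009],
    "SK_ID_BUREAU": [1, 2, 3, 4],
})
orphan_pct = orphan_key_percentage(child, parent, "SK_ID_CURR")
print(f"Rows without a matching parent key: {orphan_pct}%")
```

On the real data, a child row may legitimately point to either application_train or application_test, so the parent keys would probably need to be the union of both before running the check.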