# Credit Card Approval Prediction System Using Machine Learning

## 1. Introduction

## 2. Problem Description

### 2.1 Dataset Analysis

In [29]:
from os.path import join

import pandas as pd

In [14]:
# Load both CSV files from the local data folder
data_folder = 'data'
application_df = pd.read_csv(join(data_folder, 'application_record.csv'))
credit_record_df = pd.read_csv(join(data_folder, 'credit_record.csv'))

In [15]:
application_df.head()

|   | ID | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | CNT_CHILDREN | AMT_INCOME_TOTAL | NAME_INCOME_TYPE | NAME_EDUCATION_TYPE | NAME_FAMILY_STATUS | NAME_HOUSING_TYPE | DAYS_BIRTH | DAYS_EMPLOYED | FLAG_MOBIL | FLAG_WORK_PHONE | FLAG_PHONE | FLAG_EMAIL | OCCUPATION_TYPE | CNT_FAM_MEMBERS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2.0 |
| 1 | 5008805 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2.0 |
| 2 | 5008806 | M | Y | Y | 0 | 112500.0 | Working | Secondary / secondary special | Married | House / apartment | -21474 | -1134 | 1 | 0 | 0 | 0 | Security staff | 2.0 |
| 3 | 5008808 | F | N | Y | 0 | 270000.0 | Commercial associate | Secondary / secondary special | Single / not married | House / apartment | -19110 | -3051 | 1 | 0 | 1 | 1 | Sales staff | 1.0 |
| 4 | 5008809 | F | N | Y | 0 | 270000.0 | Commercial associate | Secondary / secondary special | Single / not married | House / apartment | -19110 | -3051 | 1 | 0 | 1 | 1 | Sales staff | 1.0 |


In [16]:
credit_record_df.head()

|   | ID | MONTHS_BALANCE | STATUS |
|---|---|---|---|
| 0 | 5001711 | 0 | X |
| 1 | 5001711 | -1 | 0 |
| 2 | 5001711 | -2 | 0 |
| 3 | 5001711 | -3 | 0 |
| 4 | 5001712 | 0 | C |


#### 2.1.1 Dataset Size

In [17]:
application_df.shape, credit_record_df.shape

((438557, 18), (1048575, 3))

In [30]:
# TODO: Find duplicate values and decide how to handle them
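
One way to approach this TODO is sketched below. It is a minimal example, not the notebook's own method: it runs on a toy frame that copies a few columns from the first five applications shown earlier, and `count_duplicates` is a hypothetical helper. It separates three cases: fully repeated rows, repeated `ID` values, and rows that differ only in `ID` (the same applicant profile registered more than once).

```python
import pandas as pd

# Toy rows copied from the application_df.head() output (subset of columns).
toy_applications = pd.DataFrame({
    "ID": [5008804, 5008805, 5008806, 5008808, 5008809],
    "CODE_GENDER": ["M", "M", "M", "F", "F"],
    "AMT_INCOME_TOTAL": [427500.0, 427500.0, 112500.0, 270000.0, 270000.0],
    "DAYS_BIRTH": [-12005, -12005, -21474, -19110, -19110],
    "DAYS_EMPLOYED": [-4542, -4542, -1134, -3051, -3051],
})


def count_duplicates(df: pd.DataFrame, id_col: str = "ID") -> dict:
    """Count three kinds of duplicates (hypothetical helper)."""
    feature_cols = [c for c in df.columns if c != id_col]
    return {
        # Rows identical in every column, ID included.
        "duplicated_rows": int(df.duplicated().sum()),
        # The same ID appearing more than once.
        "duplicated_ids": int(df[id_col].duplicated().sum()),
        # Rows identical in everything except the ID.
        "duplicated_profiles": int(df.duplicated(subset=feature_cols).sum()),
    }


duplicate_counts = count_duplicates(toy_applications)
print(duplicate_counts)
```

On the real data the same call would be `count_duplicates(application_df)`. Whether duplicated profiles should be dropped (for example with `drop_duplicates(subset=feature_cols)`) depends on how the IDs link to `credit_record_df`, so that decision is left open here.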

#### 2.1.2 Dataset Variables

In [12]:
application_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 438557 entries, 0 to 438556
Data columns (total 18 columns):
 #   Column               Non-Null Count   Dtype  
---  ------               --------------   -----  
 0   ID                   438557 non-null  int64  
 1   CODE_GENDER          438557 non-null  object 
 2   FLAG_OWN_CAR         438557 non-null  object 
 3   FLAG_OWN_REALTY      438557 non-null  object 
 4   CNT_CHILDREN         438557 non-null  int64  
 5   AMT_INCOME_TOTAL     438557 non-null  float64
 6   NAME_INCOME_TYPE     438557 non-null  object 
 7   NAME_EDUCATION_TYPE  438557 non-null  object 
 8   NAME_FAMILY_STATUS   438557 non-null  object 
 9   NAME_HOUSING_TYPE    438557 non-null  object 
 10  DAYS_BIRTH           438557 non-null  int64  
 11  DAYS_EMPLOYED        438557 non-null  int64  
 12  FLAG_MOBIL           438557 non-null  int64  
 13  FLAG_WORK_PHONE      438557 non-null  int64  
 14  FLAG_PHONE           438557 non-null  int64  
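 15  FLAG_EMAIL           438557 non-null  int64  
 16  OCCUPATION_TYPE      304354 non-null  object 
 17  CNT_FAM_MEMBERS      438557 non-null  float64
dtypes: float64(2), int64(8), object(8)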

In [22]:
application_df.describe()

|   | ID | CNT_CHILDREN | AMT_INCOME_TOTAL | DAYS_BIRTH | DAYS_EMPLOYED | FLAG_MOBIL | FLAG_WORK_PHONE | FLAG_PHONE | FLAG_EMAIL | CNT_FAM_MEMBERS |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 438557.0 | 438557.0 | 438557.0 | 438557.0 | 438557.0 | 438557.0 | 438557.0 | 438557.0 | 438557.0 | 438557.0 |
| mean | 6022176.0 | 0.42739 | 187524.3 | -15997.904649 | 60563.675328 | 1.0 | 0.206133 | 0.287771 | 0.108207 | 2.194465 |
| std | 571637.0 | 0.724882 | 110086.9 | 4185.030007 | 138767.799647 | 0.0 | 0.404527 | 0.452724 | 0.310642 | 0.897207 |
| min | 5008804.0 | 0.0 | 26100.0 | -25201.0 | -17531.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 25% | 5609375.0 | 0.0 | 121500.0 | -19483.0 | -3103.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 |
| 50% | 6047745.0 | 0.0 | 160780.5 | -15630.0 | -1467.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 |
| 75% | 6456971.0 | 1.0 | 225000.0 | -12514.0 | -371.0 | 1.0 | 0.0 | 1.0 | 0.0 | 3.0 |
| max | 7999952.0 | 19.0 | 6750000.0 | -7489.0 | 365243.0 | 1.0 | 1.0 | 1.0 | 1.0 | 20.0 |
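
Two things stand out in this summary: `DAYS_BIRTH` is always negative (days counted backwards from the application date), and `DAYS_EMPLOYED` has a maximum of 365243, roughly 1,000 years, which cannot be a real employment length. The sketch below converts both columns to years under the assumption, not confirmed in this notebook, that a positive `DAYS_EMPLOYED` marks applicants who are not currently employed. The names `AGE_YEARS`, `IS_EMPLOYED`, `YEARS_EMPLOYED` and `DAYS_PER_YEAR` are illustrative.

```python
import pandas as pd

# Toy values: two rows taken from the head() output plus the extreme
# DAYS_EMPLOYED value reported by describe().
toy_days = pd.DataFrame({
    "DAYS_BIRTH": [-12005, -21474, -19110],
    "DAYS_EMPLOYED": [-4542, -1134, 365243],
})

DAYS_PER_YEAR = 365.25  # approximation that accounts for leap years

features = pd.DataFrame({
    # Flip the sign so that age is a positive number of years.
    "AGE_YEARS": -toy_days["DAYS_BIRTH"] / DAYS_PER_YEAR,
    # Assumption: negative values are real employment spells.
    "IS_EMPLOYED": toy_days["DAYS_EMPLOYED"] < 0,
})

# Keep the employment length only where the value is plausible, else 0.
features["YEARS_EMPLOYED"] = (
    -toy_days["DAYS_EMPLOYED"] / DAYS_PER_YEAR
).where(features["IS_EMPLOYED"], 0.0)

print(features.round(1))
```

Keeping `IS_EMPLOYED` as its own column preserves the information carried by the placeholder value instead of letting it distort the mean and standard deviation.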


In [31]:
# Find the possible values used in the categorical columns
for col in application_df.select_dtypes(include=['object']).columns:
    print(f'{col}: {application_df[col].unique()}')

CODE_GENDER: ['M' 'F']
FLAG_OWN_CAR: ['Y' 'N']
FLAG_OWN_REALTY: ['Y' 'N']
NAME_INCOME_TYPE: ['Working' 'Commercial associate' 'Pensioner' 'State servant' 'Student']
NAME_EDUCATION_TYPE: ['Higher education' 'Secondary / secondary special' 'Incomplete higher'
 'Lower secondary' 'Academic degree']
NAME_FAMILY_STATUS: ['Civil marriage' 'Married' 'Single / not married' 'Separated' 'Widow']
NAME_HOUSING_TYPE: ['Rented apartment' 'House / apartment' 'Municipal apartment'
 'With parents' 'Co-op apartment' 'Office apartment']
OCCUPATION_TYPE: [nan 'Security staff' 'Sales staff' 'Accountants' 'Laborers' 'Managers'
 'Drivers' 'Core staff' 'High skill tech staff' 'Cleaning staff'
 'Private service staff' 'Cooking staff' 'Low-skill Laborers'
 'Medicine staff' 'Secretaries' 'Waiters/barmen staff' 'HR staff'
 'Realty agents' 'IT staff']


- **CODE_GENDER**
  - M: Male
  - F: Female

- **FLAG_OWN_CAR**
  - Y: Owns a car
  - N: Does not own a car

- **FLAG_OWN_REALTY**
  - Y: Owns real estate
  - N: Does not own real estate

- **NAME_INCOME_TYPE**
  - Working: Employed worker
  - Commercial associate: Business associate
  - State servant: Public servant
  - Pensioner: Retiree
  - Student

- **NAME_EDUCATION_TYPE**
  - Higher education
  - Secondary / secondary special: Secondary or specialized secondary education
  - Incomplete higher: Higher education not completed
  - Lower secondary: Lower secondary education
  - Academic degree

- **NAME_FAMILY_STATUS**
  - Married
  - Single / not married: Single
  - Civil marriage
  - Widow: Widowed
  - Separated

- **NAME_HOUSING_TYPE**
  - House / apartment: House or apartment
  - With parents: Lives with parents
  - Co-op apartment: Cooperative apartment
  - Rented apartment
  - Municipal apartment
  - Office apartment

- **OCCUPATION_TYPE**
  - NaN: Occupation not recorded (missing value)
  - Laborers
  - Sales staff
  - Managers
  - Drivers
  - High skill tech staff: Highly skilled technical staff
  - Accountants
  - Medicine staff: Medical staff
  - Security staff
  - Cleaning staff
  - Cooking staff: Kitchen staff
  - Waiters/barmen staff: Waiting and bar staff
  - Low-skill Laborers: Unskilled laborers
  - Core staff
  - Private service staff
  - Secretaries
  - HR staff: Human resources staff
  - Realty agents: Real estate agents
  - IT staff: Information technology staff
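
Most models need these categories as numbers. The sketch below shows one possible encoding on a toy frame: the two-valued columns are mapped to 0/1 and a multi-valued column is one-hot encoded with `pd.get_dummies`. The mapping in `BINARY_MAPS` (including which gender gets 0 or 1) is an arbitrary assumption for illustration, not a choice made in the notebook.

```python
import pandas as pd

# Toy rows using category values listed above.
toy_categories = pd.DataFrame({
    "CODE_GENDER": ["M", "F", "F"],
    "FLAG_OWN_CAR": ["Y", "N", "N"],
    "FLAG_OWN_REALTY": ["Y", "Y", "N"],
    "NAME_HOUSING_TYPE": ["Rented apartment", "House / apartment", "With parents"],
})

# Hypothetical 0/1 mappings for the two-valued columns.
BINARY_MAPS = {
    "CODE_GENDER": {"M": 0, "F": 1},
    "FLAG_OWN_CAR": {"N": 0, "Y": 1},
    "FLAG_OWN_REALTY": {"N": 0, "Y": 1},
}

encoded = toy_categories.copy()
for col, mapping in BINARY_MAPS.items():
    encoded[col] = encoded[col].map(mapping).astype("int8")

# One indicator column per housing type.
encoded = pd.get_dummies(encoded, columns=["NAME_HOUSING_TYPE"], dtype="int8")

print(encoded)
```

One-hot encoding avoids implying an order between categories such as housing types; for `NAME_EDUCATION_TYPE`, where an order arguably exists, an ordinal mapping could be considered instead.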


In [26]:
credit_record_df.describe()

|   | ID | MONTHS_BALANCE |
|---|---|---|
| count | 1048575.0 | 1048575.0 |
| mean | 5068286.0 | -19.137 |
| std | 46150.58 | 14.0235 |
| min | 5001711.0 | -60.0 |
| 25% | 5023644.0 | -29.0 |
| 50% | 5062104.0 | -17.0 |
| 75% | 5113856.0 | -7.0 |
| max | 5150487.0 | 0.0 |


In [33]:
# Find the possible values used in the categorical columns
for col in credit_record_df.select_dtypes(include=['object']).columns:
    print(f'{col}: {credit_record_df[col].unique()}')

STATUS: ['X' '0' 'C' '1' '2' '3' '4' '5']


- **STATUS**
  - 0: 1-29 days past due
  - 1: 30-59 days past due
  - 2: 60-89 days past due
  - 3: 90-119 days past due
  - 4: 120-149 days past due
  - 5: Overdue or bad debts, write-offs for more than 150 days
  - C: Paid off that month
  - X: No loan for the month

### 2.2 Missing Data Imputation

In [27]:
application_df.isna().sum()

ID                          0
CODE_GENDER                 0
FLAG_OWN_CAR                0
FLAG_OWN_REALTY             0
CNT_CHILDREN                0
AMT_INCOME_TOTAL            0
NAME_INCOME_TYPE            0
NAME_EDUCATION_TYPE         0
NAME_FAMILY_STATUS          0
NAME_HOUSING_TYPE           0
DAYS_BIRTH                  0
DAYS_EMPLOYED               0
FLAG_MOBIL                  0
FLAG_WORK_PHONE             0
FLAG_PHONE                  0
FLAG_EMAIL                  0
OCCUPATION_TYPE        134203
CNT_FAM_MEMBERS             0
dtype: int64
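
`OCCUPATION_TYPE` is the only column with missing values: 134203 of 438557 rows, about 30.6%. With that share, dropping the rows would discard a large part of the data. One simple option, sketched below on a toy frame, is to keep a missing-value indicator and fill the gaps with an explicit category. The label `"Unknown"` and the column `OCCUPATION_MISSING` are illustrative assumptions, not the notebook's final choice.

```python
import numpy as np
import pandas as pd

# Toy rows based on the head() output: the first two have no occupation.
toy_occupation = pd.DataFrame({
    "ID": [5008804, 5008805, 5008806, 5008808],
    "NAME_INCOME_TYPE": ["Working", "Working", "Working", "Commercial associate"],
    "OCCUPATION_TYPE": [np.nan, np.nan, "Security staff", "Sales staff"],
})

# Share of missing values before imputation.
missing_share = toy_occupation["OCCUPATION_TYPE"].isna().mean()

imputed = toy_occupation.copy()
# Record where the value was missing before overwriting it.
imputed["OCCUPATION_MISSING"] = imputed["OCCUPATION_TYPE"].isna().astype("int8")
# Treat "not recorded" as its own category.
imputed["OCCUPATION_TYPE"] = imputed["OCCUPATION_TYPE"].fillna("Unknown")

print(f"Missing share: {missing_share:.1%}")
print(imputed)
```

An alternative would be to fill with the most frequent occupation within each `NAME_INCOME_TYPE` group, but that invents occupations for applicants such as pensioners who may simply not have one.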