# ML Models

This section loads the data that has already been transformed and cleaned, in order to carry out feature engineering, hyperparameter optimization, and dimensionality reduction.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

## Loading the data

In [4]:
df = pd.read_csv("dataClean_new.csv", sep = ",")

In [5]:
df.head()

Unnamed: 0,fecha_nacimiento,genero,ult_actual,ind_mora_vigente,cupo_total_tc,tenencia_tc,tiene_ctas_activas,ingreso_final,saldo_no_rot_mdo,cant_oblig_tot_sf,...,ocupacion_jubilado,ocupacion_otro,ocupacion_pensionado,ocupacion_profesional independiente,ocupacion_rentista de capital,ocupacion_socio empleado - socio,tipo_vivienda_familiar,tipo_vivienda_no informa,tipo_vivienda_propia,cuotas_a_pagar
0,19840630,1,20180526,0,0.0,0,1,1391032.0,0.0,0.0,...,0,0,0,0,0,0,0,0,0,0.0
1,19860727,1,20181120,0,0.0,0,1,2327500.0,0.0,0.0,...,0,0,0,0,0,0,1,0,0,0.0
2,19910108,1,20190802,0,0.0,1,1,6519750.0,0.0,0.0,...,0,0,0,0,0,0,0,1,0,0.0
3,19900903,1,20190906,0,0.0,0,1,1484205.0,2555000.0,0.0,...,0,0,0,0,0,0,0,1,0,0.0
4,19790623,0,20191211,0,0.0,0,1,4353334.0,211000.0,4.0,...,0,0,0,0,0,0,0,1,0,0.0


## Model selection

This analysis compares the performance of four models:

1. Gradient Boosting tree
2. Random Forest
3. Support vector machine
4. Neural networks
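
The comparison across these four model families could be organized as in the sketch below. It is only an illustration under stated assumptions: the scikit-learn classes chosen for each family, the `candidates` dictionary, the `compare_models` helper, the cross-validated R² scoring, and the synthetic data are all hypothetical choices, not something the notebook specifies.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical candidate set: one scikit-learn regressor per model family.
# SVR and the MLP are sensitive to feature scale, so both get a scaler.
candidates = {
    "gradient_boosting": GradientBoostingRegressor(random_state=7),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=7),
    "svm": make_pipeline(StandardScaler(), SVR()),
    "neural_network": make_pipeline(
        StandardScaler(), MLPRegressor(max_iter=500, random_state=7)
    ),
}


def compare_models(models, X, y, cv=3):
    """Return the mean cross-validated R^2 for each model."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
        for name, model in models.items()
    }


# Synthetic stand-in data so the sketch runs on its own.
rng = np.random.default_rng(7)
X_demo = rng.normal(size=(200, 5))
weights = np.array([3.0, -2.0, 0.0, 1.0, 0.5])
y_demo = X_demo @ weights + rng.normal(scale=0.1, size=200)

scores = compare_models(candidates, X_demo, y_demo)
for name, r2 in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: R^2 = {r2:.3f}")
```

Keeping the models in a single dictionary means every candidate is evaluated with the same folds and the same metric, which makes the scores directly comparable.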

In [6]:
X = df.drop(["gasto_familiar"], axis = 1)
Y = df["gasto_familiar"]

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2,random_state=7)

## Simple model for feature importance analysis

In [8]:
from sklearn.linear_model import ElasticNetCV

ENReg = ElasticNetCV(random_state = 0, verbose = 2)
ENReg.fit(X_train, y_train)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Path: 000 out of 100
Path: 001 out of 100
...
Path: 047 out

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.7s remaining:    0.0s


Path: 000 out of 100
Path: 001 out of 100
...
Path: 047 out

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    3.5s finished


ElasticNetCV(random_state=0, verbose=2)

In [9]:
ENReg.score(X_test, y_test)

0.08002537139443688
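
An R² of about 0.08 means the linear model explains little of the variance in the target. The coefficients can still serve as a rough first look at feature importance, as the heading suggests. The sketch below shows one way to do that, under assumptions: `rank_coefficients` is a hypothetical helper, the data is synthetic, and the features are standardized first because raw coefficients are not comparable when columns live on very different scales (for example, date-like integers next to 0/1 dummies).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import StandardScaler


def rank_coefficients(model, feature_names):
    """Return a fitted linear model's coefficients sorted by absolute value."""
    coefs = pd.Series(model.coef_, index=feature_names)
    order = coefs.abs().sort_values(ascending=False).index
    return coefs.reindex(order)


# Synthetic stand-in data; the column names are hypothetical.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(300, 4)), columns=["a", "b", "c", "d"])
y_demo = 5.0 * X_demo["a"] - 2.0 * X_demo["c"] + rng.normal(scale=0.1, size=300)

# Standardize so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X_demo)
demo_model = ElasticNetCV(random_state=0).fit(X_scaled, y_demo)

ranking = rank_coefficients(demo_model, X_demo.columns)
print(ranking)
```

Coefficients that elastic net shrinks to exactly zero end up at the bottom of the ranking, which gives a simple shortlist of features to consider dropping.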

In [10]:
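def mean_absolute_percentage_error2(y_pred, y_true):
    # Note the argument order: predictions first, true values second.
    # Replace zero targets with a tiny constant to avoid division by zero.
    y_true = np.where(y_true == 0, 0.0000000001, y_true)
    # The result is a fraction, not a percentage (1.0 means 100%).
    return np.mean(np.abs((y_true - y_pred) / y_true))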

In [11]:
y_pred = ENReg.predict(X_test)

In [12]:
mape = mean_absolute_percentage_error2(y_pred, y_test)
mape

119.43028575751624
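
The function returns a fraction, so a value of about 119.43 corresponds to an average error of roughly 11,943%. One possible contributor, assuming `y_test` contains zeros or values very close to zero, is that dividing by a tiny denominator lets a few rows dominate the mean. The sketch below illustrates the effect on made-up numbers and compares it with a hypothetical alternative, `mape_ignoring_zeros`, that skips zero targets. Neither the data nor the alternative comes from the notebook.

```python
import numpy as np


def mape_with_epsilon(y_pred, y_true, eps=1e-10):
    """Same approach as above: zero targets are replaced by a tiny constant."""
    y_true = np.where(y_true == 0, eps, y_true)
    return np.mean(np.abs((y_true - y_pred) / y_true))


def mape_ignoring_zeros(y_pred, y_true):
    """Hypothetical alternative: skip rows whose true value is zero."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    mask = y_true != 0
    return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask]))


# Made-up values with a single zero target.
y_true_demo = np.array([100.0, 200.0, 0.0, 400.0])
y_pred_demo = np.array([110.0, 180.0, 50.0, 400.0])

print(mape_with_epsilon(y_pred_demo, y_true_demo))    # dominated by the zero row
print(mape_ignoring_zeros(y_pred_demo, y_true_demo))  # mean of 0.10, 0.10, 0.00
```

Checking how many targets in `y_test` are zero or close to zero would show whether this effect applies here. If it does, a scale-based metric such as mean absolute error may be easier to interpret.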