<a href="https://colab.research.google.com/github/DavidUpegui/Home_Loan_approval_ML/blob/main/Evaluaci%C3%B3n_inteligente_de_cr%C3%A9ditos_hipotecarios.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Be My Mortgage Approval: Intelligent Evaluation of Mortgage Loans

##Initialization

In [40]:
# Import libraries
import pandas as pd
import numpy as np

In [41]:
# Mount Google Drive to load the data from it
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [46]:
# Load the data
path = '/content/drive/MyDrive/Modelos II/Proyecto final/data/'  # Replace with your own path
raw_data = pd.read_csv(path + 'loan_sanction_train.csv')
raw_data.head(5)

Unnamed: 0,Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
0,LP001002,Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
1,LP001003,Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
2,LP001005,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
3,LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
4,LP001008,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y


##Data cleaning


###Removing unnecessary columns

The ```Loan_ID``` column provides no useful information for building the model, so it is removed from the dataset we will work with.

In [47]:
raw_data = raw_data.drop('Loan_ID', axis=1)
raw_data.head(5)

Unnamed: 0,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
0,Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
1,Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
2,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
3,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
4,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y


###Imputation of missing data

Since this exercise prioritizes building the model over data treatment, missing values are imputed with simple methods:
- Categorical columns use the mode.
- Numerical columns use the mean.

In [48]:
from sklearn.impute import SimpleImputer

# Categorical columns: fill missing values with the most frequent value (mode)
categorical_cols = ['Gender','Married','Dependents','Education', 'Self_Employed', 'Loan_Amount_Term', 'Credit_History', 'Property_Area']
cualitative_imputer = SimpleImputer(strategy='most_frequent')
for col in categorical_cols:
  # fit_transform returns an (n, 1) array; ravel() flattens it to one column
  raw_data[col] = cualitative_imputer.fit_transform(raw_data[[col]]).ravel()

# Numerical columns: fill missing values with the column mean
cuantitative_cols = ['ApplicantIncome','CoapplicantIncome', 'LoanAmount']
cuantitative_imputer = SimpleImputer(strategy='mean')
for col in cuantitative_cols:
  raw_data[col] = cuantitative_imputer.fit_transform(raw_data[[col]]).ravel()
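
As a sanity check on the step above, the following minimal sketch applies the same two strategies to a small made-up frame and confirms that no missing values remain. The frame `toy` and its values are hypothetical and are not taken from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data, not part of loan_sanction_train.csv
toy = pd.DataFrame({
    'Gender': pd.Series(['Male', np.nan, 'Female', 'Male'], dtype=object),
    'LoanAmount': [100.0, np.nan, 200.0, 300.0],
})

# Mode for the categorical column, mean for the numerical one
mode_imputer = SimpleImputer(strategy='most_frequent')
mean_imputer = SimpleImputer(strategy='mean')
toy['Gender'] = mode_imputer.fit_transform(toy[['Gender']]).ravel()
toy['LoanAmount'] = mean_imputer.fit_transform(toy[['LoanAmount']]).ravel()

# Count the missing values left in each column
missing_after = toy.isna().sum()
print(missing_after)
```

Running the same `isna().sum()` check on `raw_data` after the imputation cell should likewise report zero for every imputed column.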

###Encoding categorical columns

In [49]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
categorical_cols = ['Gender','Married','Dependents','Education', 'Self_Employed'
  , 'Loan_Amount_Term', 'Credit_History', 'Property_Area', 'Loan_Status']
print('CODIFICATION MADE BY LabelEncoder: \n')
for col in categorical_cols:
  # Refit the encoder on each column and record its label -> code mapping
  raw_data[col] = le.fit_transform(raw_data[col])
  label_mapping = dict(zip(le.classes_, le.transform(le.classes_)))
  print('Codification of column ' + col + ':\n' + str(label_mapping) +'\n')


# LabelEncoder encodes almost every column as intended. The exception is "Education",
# where ideally "Not Graduate" is 0 and "Graduate" is 1, so the two codes are
# swapped manually.
raw_data['Education'] = raw_data['Education'].replace({0: 1, 1: 0})

CODIFICATION MADE BY LabelEncoder: 

Codification of column Gender:
{'Female': 0, 'Male': 1}

Codification of column Married:
{'No': 0, 'Yes': 1}

Codification of column Dependents:
{'0': 0, '1': 1, '2': 2, '3+': 3}

Codification of column Education:
{'Graduate': 0, 'Not Graduate': 1}

Codification of column Self_Employed:
{'No': 0, 'Yes': 1}

Codification of column Loan_Amount_Term:
{12.0: 0, 36.0: 1, 60.0: 2, 84.0: 3, 120.0: 4, 180.0: 5, 240.0: 6, 300.0: 7, 360.0: 8, 480.0: 9}

Codification of column Credit_History:
{0.0: 0, 1.0: 1}

Codification of column Property_Area:
{'Rural': 0, 'Semiurban': 1, 'Urban': 2}

Codification of column Loan_Status:
{'N': 0, 'Y': 1}
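
`LabelEncoder` assigns codes following the sorted order of the labels, which is why `Education` came out inverted and had to be swapped by hand. One possible alternative, sketched here on a hypothetical toy column rather than on the real data, is to state the mapping explicitly with `Series.map` so that the intended codes are visible in the code itself. The names `education` and `education_codes` are illustrative only.

```python
import pandas as pd

# Hypothetical toy column using the dataset's two Education labels
education = pd.Series(['Graduate', 'Not Graduate', 'Graduate'])

# Explicit mapping: the desired codes are stated directly
education_codes = {'Not Graduate': 0, 'Graduate': 1}
encoded = education.map(education_codes)

# Labels absent from the mapping would become NaN, so check for them
assert encoded.notna().all()
print(encoded.tolist())  # [1, 0, 1]
```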



In [50]:
raw_data

Unnamed: 0,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
0,1,0,0,1,0,5849.0,0.0,146.412162,8,1,2,1
1,1,1,1,1,0,4583.0,1508.0,128.000000,8,1,0,0
2,1,1,0,1,1,3000.0,0.0,66.000000,8,1,2,1
3,1,1,0,0,0,2583.0,2358.0,120.000000,8,1,2,1
4,1,0,0,1,0,6000.0,0.0,141.000000,8,1,2,1
...,...,...,...,...,...,...,...,...,...,...,...,...
609,0,0,0,1,0,2900.0,0.0,71.000000,8,1,0,1
610,1,1,3,1,0,4106.0,0.0,40.000000,5,1,0,1
611,1,1,1,1,0,8072.0,240.0,253.000000,8,1,2,1
612,1,1,2,1,0,7583.0,0.0,187.000000,8,1,2,1


After the data treatment, we obtain a dataset with the following characteristics:

####Columns:


| **Index** | **Name** | **Data type** | **Variable type** | **Values** |
|-----------|----------|---------------|-------------------|------------|
| 0         | Gender   | int | Discrete | **0**: 'Female', **1**: 'Male' |
| 1         | Married   | int | Discrete | **0**: 'No', **1**: 'Yes' |
| 2         | Dependents   | int | Discrete | **0**: '0', **1**: '1', **2**: '2', **3**: '3+' |
| 3         | Education   | int | Discrete | **0**: 'Not Graduate', **1**: 'Graduate' |
| 4         | Self_Employed   | int | Discrete | **0**: 'No', **1**: 'Yes' |
| 5         | ApplicantIncome   | float | Continuous | ℝ ≥ 0 |
| 6         | CoapplicantIncome   | float | Continuous | ℝ ≥ 0 |
| 7         | LoanAmount   | float | Continuous | ℝ ≥ 0 |
| 8         | Loan_Amount_Term   | int | Discrete | **0**: 12, **1**: 36, **2**: 60, **3**: 84, **4**: 120, **5**: 180, **6**: 240, **7**: 300, **8**: 360, **9**: 480 |
| 9         | Credit_History   | int | Discrete | **0**: 0, **1**: 1 |
| 10        | Property_Area   | int | Discrete | **0**: 'Rural', **1**: 'Semiurban', **2**: 'Urban' |
| 11        | Loan_Status   | int | Discrete | **0**: 'N', **1**: 'Y' |

_**Note**: In the Values column, the values stored in the dataset after encoding are shown in bold; to their right are the corresponding values from the original dataset._
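
One way to confirm that a processed frame matches the table above is to inspect its data types and the distinct codes in each discrete column. The sketch below does this on a hypothetical three-column frame named `sample`; in the notebook, `raw_data` would take its place and `discrete_cols` would list all the discrete columns.

```python
import pandas as pd

# Hypothetical stand-in for the processed raw_data
sample = pd.DataFrame({
    'Gender': [1, 0],
    'ApplicantIncome': [5849.0, 4583.0],
    'Loan_Status': [1, 0],
})

# Data type of each column
print(sample.dtypes)

# Distinct codes found in each discrete column
discrete_cols = ['Gender', 'Loan_Status']
unique_values = {col: sorted(sample[col].unique().tolist()) for col in discrete_cols}
print(unique_values)
```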



