# Analysis: vehicle insurance


A health insurance company wants to offer its existing customers a new vehicle insurance product. It has asked us to build a model that predicts whether a customer would be interested in this new insurance.

The dataset contains the following variables:

| Variable | Definition |
| --- | --- |
| id | Unique identifier |
| Gender | Customer's gender (Male/Female) |
| Age | Customer's age |
| Driving_License | Whether the customer has a driving license (1/0) |
| Region_Code | Code of the customer's region |
| Previously_Insured | Whether the customer already has car insurance (1/0) |
| Vehicle_Age | Age of the vehicle (`< 1 Year`, `1-2 Year`, `> 2 Years`) |
| Vehicle_Damage | Whether the customer's vehicle has been damaged before (Yes/No) |
| Annual_Premium | Amount to be paid for the new insurance |
| Policy_Sales_Channel | Numeric code of the channel through which the customer was reached (e-mail, phone, in person, etc.) |
| Vintage | Number of days the customer has been with the company |
| Response | Whether the customer is interested: yes or no (1/0) |

### Classification problem
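
`Response` is a binary target (1/0), so this is a binary classification problem. A useful first check is how balanced the two classes are. The following is a minimal sketch that uses a hypothetical label series in place of the real `y_train`:

```python
import pandas as pd

# Hypothetical toy labels standing in for y_train (the Response column).
y = pd.Series([0, 0, 0, 1, 0, 0, 1, 0], name='Response')

# Share of each class; a strongly skewed split makes plain accuracy a misleading metric.
proportions = y.value_counts(normalize=True).sort_index()
print(proportions)
```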

## Importing the dataset

We import the required libraries and load the training and test datasets from the `../res` folder.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer

# pandas' 0-based index equals id - 1, so the id column carries no extra information
train = pd.read_csv('../res/train.csv')
x_train = train.drop(columns=['id', 'Response'])  # training features (without id and target)
y_train = train['Response']                        # training target

# Selecting by name: if test.csv has no Response column, these lines raise a KeyError
# instead of silently treating the last feature (Vintage) as the target
test = pd.read_csv('../res/test.csv')
x_test = test.drop(columns=['id', 'Response'])  # test features
y_test = test['Response']                        # test target

display(x_train)
```

|  | Gender | Age | Driving_License | Region_Code | Previously_Insured | Vehicle_Age | Vehicle_Damage | Annual_Premium | Policy_Sales_Channel | Vintage |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Male | 44 | 1 | 28.0 | 0 | > 2 Years | Yes | 40454.0 | 26.0 | 217 |
| 1 | Male | 76 | 1 | 3.0 | 0 | 1-2 Year | No | 33536.0 | 26.0 | 183 |
| 2 | Male | 47 | 1 | 28.0 | 0 | > 2 Years | Yes | 38294.0 | 26.0 | 27 |
| 3 | Male | 21 | 1 | 11.0 | 1 | < 1 Year | No | 28619.0 | 152.0 | 203 |
| 4 | Female | 29 | 1 | 41.0 | 1 | < 1 Year | No | 27496.0 | 152.0 | 39 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 381104 | Male | 74 | 1 | 26.0 | 1 | 1-2 Year | No | 30170.0 | 26.0 | 88 |
| 381105 | Male | 30 | 1 | 37.0 | 1 | < 1 Year | No | 40016.0 | 152.0 | 131 |
| 381106 | Male | 21 | 1 | 30.0 | 1 | < 1 Year | No | 35118.0 | 160.0 | 161 |
| 381107 | Female | 68 | 1 | 14.0 | 0 | > 2 Years | Yes | 44617.0 | 124.0 | 74 |
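
When columns are selected by name, the feature/target split no longer depends on column order. Below is a minimal sketch on an in-memory CSV. The feature values are copied from the first rows shown above, but the `Response` values are hypothetical and were invented for the example:

```python
import io
import pandas as pd

# Hypothetical CSV excerpt: features from the rows above, Response values invented for illustration.
csv_text = """id,Gender,Age,Driving_License,Region_Code,Previously_Insured,Vehicle_Age,Vehicle_Damage,Annual_Premium,Policy_Sales_Channel,Vintage,Response
1,Male,44,1,28.0,0,> 2 Years,Yes,40454.0,26.0,217,1
2,Male,76,1,3.0,0,1-2 Year,No,33536.0,26.0,183,0
3,Male,47,1,28.0,0,> 2 Years,Yes,38294.0,26.0,27,1
"""

train = pd.read_csv(io.StringIO(csv_text))

# Drop the identifier and the target by name; a missing column raises KeyError instead of shifting silently.
x_train = train.drop(columns=['id', 'Response'])
y_train = train['Response']
```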


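## Dataset preprocessing and cleaning

### Categorical values

We convert the categorical columns into 0/1 values. `Gender` and `Vehicle_Age` are one-hot encoded into one indicator column per category. The binary `Vehicle_Damage` column is label-encoded (No → 0, Yes → 1).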

```python
# One-hot encode Gender and Vehicle_Age; dtype=int keeps 0/1 values (pandas >= 2.0 returns True/False by default)
x_train = pd.get_dummies(x_train, columns=['Gender', 'Vehicle_Age'], prefix=['Gender', 'Vehicle_Age'], dtype=int)

# Vehicle_Damage is binary, so a label encoder is enough: No -> 0 | Yes -> 1
le = LabelEncoder()
x_train['Vehicle_Damage'] = le.fit_transform(x_train['Vehicle_Damage'])

# Reorder columns so the numeric columns come first and the categorical/binary ones last
new_cols_order = ['Age', 'Region_Code', 'Annual_Premium', 'Policy_Sales_Channel', 'Vintage', 'Gender_Female', 'Gender_Male', 'Vehicle_Age_1-2 Year', 'Vehicle_Age_< 1 Year', 'Vehicle_Age_> 2 Years', 'Driving_License', 'Previously_Insured', 'Vehicle_Damage']
x_train = x_train[new_cols_order]

display(x_train)
```

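|  | Age | Region_Code | Annual_Premium | Policy_Sales_Channel | Vintage | Gender_Female | Gender_Male | Vehicle_Age_1-2 Year | Vehicle_Age_< 1 Year | Vehicle_Age_> 2 Years | Driving_License | Previously_Insured | Vehicle_Damage |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 44 | 28.0 | 40454.0 | 26.0 | 217 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |
| 1 | 76 | 3.0 | 33536.0 | 26.0 | 183 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
| 2 | 47 | 28.0 | 38294.0 | 26.0 | 27 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |
| 3 | 21 | 11.0 | 28619.0 | 152.0 | 203 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 4 | 29 | 41.0 | 27496.0 | 152.0 | 39 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 381104 | 74 | 26.0 | 30170.0 | 26.0 | 88 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
| 381105 | 30 | 37.0 | 40016.0 | 152.0 | 131 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 381106 | 21 | 30.0 | 35118.0 | 160.0 | 161 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 381107 | 68 | 14.0 | 44617.0 | 124.0 | 74 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |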
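
The encoding above is applied only to `x_train`. Before a model can score `x_test`, the test set presumably needs the same columns in the same order. Below is a hedged sketch of one way to do this with small hypothetical frames. It encodes both sets, reindexes the test columns to match the training columns, and reuses the label encoder fitted on the training data:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frames standing in for x_train / x_test.
x_train = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male'],
    'Vehicle_Age': ['> 2 Years', '< 1 Year', '1-2 Year'],
    'Vehicle_Damage': ['Yes', 'No', 'Yes'],
})
x_test = pd.DataFrame({
    'Gender': ['Female'],
    'Vehicle_Age': ['< 1 Year'],
    'Vehicle_Damage': ['No'],
})

# One-hot encode both sets; dtype=int yields 0/1 instead of True/False.
x_train_enc = pd.get_dummies(x_train, columns=['Gender', 'Vehicle_Age'], dtype=int)
x_test_enc = pd.get_dummies(x_test, columns=['Gender', 'Vehicle_Age'], dtype=int)

# Align test to the training columns; categories missing from test become all-zero columns.
x_test_enc = x_test_enc.reindex(columns=x_train_enc.columns, fill_value=0)

# Fit the label encoder on train only, then reuse it on test (classes sorted: No -> 0, Yes -> 1).
le = LabelEncoder()
x_train_enc['Vehicle_Damage'] = le.fit_transform(x_train_enc['Vehicle_Damage'])
x_test_enc['Vehicle_Damage'] = le.transform(x_test_enc['Vehicle_Damage'])
```

The `reindex` call guards against a category that appears in training but not in test. Without it, the test matrix would be missing a column.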


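### Standardization

We standardize the continuous variables (subtract the mean and divide by the standard deviation) so that features on larger scales, such as `Annual_Premium`, do not dominate later analyses.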

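```python
# Standardize only the five numeric columns placed first in new_cols_order;
# the 0/1 indicator columns (Gender_Female onwards) are left unchanged
continuous_cols = ['Age', 'Region_Code', 'Annual_Premium', 'Policy_Sales_Channel', 'Vintage']
sc = StandardScaler()
x_train[continuous_cols] = sc.fit_transform(x_train[continuous_cols])
display(x_train)
```

|  | Age | Region_Code | Annual_Premium | Policy_Sales_Channel | Vintage | Gender_Female | Gender_Male | Vehicle_Age_1-2 Year | Vehicle_Age_< 1 Year | Vehicle_Age_> 2 Years | Driving_License | Previously_Insured | Vehicle_Damage |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.333777 | 0.121784 | 0.574539 | -1.587234 | 0.748795 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |
| 1 | 2.396751 | -1.767879 | 0.172636 | -1.587234 | 0.342443 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
| 2 | 0.527181 | 0.121784 | 0.449053 | -1.587234 | -1.521998 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |
| 3 | -1.148985 | -1.163187 | -0.113018 | 0.737321 | 0.581474 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 4 | -0.633242 | 1.104409 | -0.178259 | 0.737321 | -1.378580 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 381104 | 2.267815 | -0.029389 | -0.022912 | -1.587234 | -0.792954 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
| 381105 | -0.568774 | 0.802063 | 0.549093 | 0.737321 | -0.279037 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 381106 | -1.148985 | 0.272958 | 0.264543 | 0.884912 | 0.079509 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 381107 | 1.881007 | -0.936427 | 0.816389 | 0.220753 | -0.960275 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |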
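
`StandardScaler` replaces each value with z = (x - mean) / std, computed per column using the population standard deviation. The sketch below uses hypothetical ages and checks this against a manual computation. It also shows that new data, such as the test set, should go through the already fitted scaler rather than a newly fitted one:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical ages standing in for one continuous column.
age = np.array([[44.0], [76.0], [47.0], [21.0], [29.0]])

sc = StandardScaler()
z = sc.fit_transform(age)

# Manual z-score with the population standard deviation (NumPy's default ddof=0).
manual = (age - age.mean()) / age.std()

# New data reuses the mean/std learned from the training column.
z_new = sc.transform(np.array([[30.0]]))
```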


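```python
from sklearn.preprocessing import OneHotEncoder

display(x_train)

# Alternative encoding with ColumnTransformer (its result is not used further in this notebook).
# transformers must be a list of (name, transformer, columns) tuples;
# column 6 of the raw `train` frame is Vehicle_Age
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [6])], remainder='passthrough')
#x_cttransform = np.array(ct.fit_transform(train))

#display(x_cttransform)
```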

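|  | id | Gender | Age | Driving_License | Region_Code | Previously_Insured | Vehicle_Age | Vehicle_Damage | Annual_Premium | Policy_Sales_Channel | Vintage |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Male | 44 | 1 | 28.0 | 0 | > 2 Years | Yes | 40454.0 | 26.0 | 217 |
| 1 | 2 | Male | 76 | 1 | 3.0 | 0 | 1-2 Year | No | 33536.0 | 26.0 | 183 |
| 2 | 3 | Male | 47 | 1 | 28.0 | 0 | > 2 Years | Yes | 38294.0 | 26.0 | 27 |
| 3 | 4 | Male | 21 | 1 | 11.0 | 1 | < 1 Year | No | 28619.0 | 152.0 | 203 |
| 4 | 5 | Female | 29 | 1 | 41.0 | 1 | < 1 Year | No | 27496.0 | 152.0 | 39 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 381104 | 381105 | Male | 74 | 1 | 26.0 | 1 | 1-2 Year | No | 30170.0 | 26.0 | 88 |
| 381105 | 381106 | Male | 30 | 1 | 37.0 | 1 | < 1 Year | No | 40016.0 | 152.0 | 131 |
| 381106 | 381107 | Male | 21 | 1 | 30.0 | 1 | < 1 Year | No | 35118.0 | 160.0 | 161 |
| 381107 | 381108 | Female | 68 | 1 | 14.0 | 0 | > 2 Years | Yes | 44617.0 | 124.0 | 74 |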


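|  | id | Age | Driving_License | Region_Code | Previously_Insured | Vehicle_Damage | Annual_Premium | Policy_Sales_Channel | Vintage | Gender_Female | Gender_Male | Vehicle_Age_1-2 Year | Vehicle_Age_< 1 Year | Vehicle_Age_> 2 Years |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 44 | 1 | 28.0 | 0 | Yes | 40454.0 | 26.0 | 217 | 0 | 1 | 0 | 0 | 1 |
| 1 | 2 | 76 | 1 | 3.0 | 0 | No | 33536.0 | 26.0 | 183 | 0 | 1 | 1 | 0 | 0 |
| 2 | 3 | 47 | 1 | 28.0 | 0 | Yes | 38294.0 | 26.0 | 27 | 0 | 1 | 0 | 0 | 1 |
| 3 | 4 | 21 | 1 | 11.0 | 1 | No | 28619.0 | 152.0 | 203 | 0 | 1 | 0 | 1 | 0 |
| 4 | 5 | 29 | 1 | 41.0 | 1 | No | 27496.0 | 152.0 | 39 | 1 | 0 | 0 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 381104 | 381105 | 74 | 1 | 26.0 | 1 | No | 30170.0 | 26.0 | 88 | 0 | 1 | 1 | 0 | 0 |
| 381105 | 381106 | 30 | 1 | 37.0 | 1 | No | 40016.0 | 152.0 | 131 | 0 | 1 | 0 | 1 | 0 |
| 381106 | 381107 | 21 | 1 | 30.0 | 1 | No | 35118.0 | 160.0 | 161 | 0 | 1 | 0 | 1 | 0 |
| 381107 | 381108 | 68 | 1 | 14.0 | 0 | Yes | 44617.0 | 124.0 | 74 | 1 | 0 | 0 | 0 | 1 |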


## Predictive analysis

### Logistic regression analysis
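
Below is a minimal logistic regression sketch with scikit-learn. It runs on synthetic data from `make_classification` with an assumed 88/12 class imbalance, not on the real dataset. The use of `class_weight='balanced'`, the hold-out split, and ROC AUC as the metric are illustrative choices, not settings taken from this notebook:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data standing in for the encoded x_train / y_train.
X, y = make_classification(n_samples=2000, n_features=8, weights=[0.88, 0.12], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

# class_weight='balanced' up-weights the minority class (an assumption, not the notebook's setting).
clf = LogisticRegression(class_weight='balanced', max_iter=1000)
clf.fit(X_tr, y_tr)

# predict_proba column 1 is P(Response = 1); ROC AUC does not depend on a decision threshold.
proba = clf.predict_proba(X_val)[:, 1]
auc = roc_auc_score(y_val, proba)
print(f'Validation ROC AUC: {auc:.3f}')
```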