<a href="https://colab.research.google.com/github/MBrandao07/Case_Porto_Seguro_Kaggle/blob/main/Case_Porto_Seguro_Data_Prep.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Business Problem Understanding**
   Build a **model that predicts the probability that a driver will file an auto insurance claim in the next year**.

A **claim** (*sinistro*, in Portuguese) is the insurance-industry term for an event in which the policyholder suffers a loss covered by the policy and, as a result, files a request for compensation with the insurer. In auto insurance, a claim can involve situations such as:

1. Traffic accidents (collisions, rollovers, etc.);
2. Robbery or theft of the vehicle;
3. Damage caused by natural phenomena (floods, hail, etc.);
4. Damage caused by third parties (vandalism, for example);
5. Among others.

After a claim event occurs, the policyholder must contact the insurer to report it and start the assessment and, where applicable, the compensation process, as established in the insurance policy.

# **Data Understanding**

For this project we use the data made available by the insurer Porto Seguro on Kaggle:

https://www.kaggle.com/competitions/porto-seguro-safe-driver-prediction/data

In the training and test data:
- Variables that belong to similar groupings are tagged as such in their names (for example, ind, reg, car, calc).

- Variable names include the suffix "bin" for binary variables and "cat" for categorical variables. Variables without these designations are continuous or ordinal.

- Values of -1 indicate a null (missing) value.

- The "target" column indicates whether or not a claim was filed for that policyholder.

The file train.csv contains the training data, where each row corresponds to a policyholder and the "target" column indicates whether a claim was filed.

The file test.csv contains the test data.



In [None]:
import pandas as pd
import numpy as np
import pickle
from sklearn.model_selection import train_test_split
import pod_academy_functions as pod  # pod_academy_functions is a helper library with several functions, created during the PoD Academy course

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# listing all the functions in the pod_academy_functions library
with open('/content/drive/MyDrive/1 - Aulas PoD Academy/Cientista de Dados POD/Aulas/Case Porto Seguro/pod_academy_functions.py', 'r') as f:
    print(f.read())

# Function to compute the normalized Gini

def gini_normalizado(actual, pred, cmpcol = 0, sortcol = 1):
    import numpy as np
    assert( len(actual) == len(pred) )
    all = np.asarray(np.c_[ actual, pred, np.arange(len(actual)) ], dtype=np.float)
    all = all[ np.lexsort((all[:,2], -1*all[:,1])) ]
    totalLosses = all[:,0].sum()
    giniSum = all[:,0].cumsum().sum() / totalLosses

    giniSum -= (len(actual) + 1) / 2.
    return giniSum / len(actual)

def pod_academy_generate_metadata(dataframe):
    import pandas as pd
    """
    Generates a dataframe containing metadata for the columns of the given dataframe.

    :param dataframe: DataFrame for which the metadata will be generated.
    :return: DataFrame containing the metadata.
    """

    # Collecting basic metadata
    metadata = pd.DataFrame({
        'nome_variavel': dataframe.columns,
        'tipo': dataframe.dtypes,
        'qt_nulos': dataframe.isnull().sum(),
        'percent_nulos': round((dataframe.isnull().sum() / len(datafram
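
The `gini_normalizado` function printed above relies on `np.float`, an alias that was removed in NumPy 1.24, and, as printed, it returns the raw Gini without dividing by the Gini of a perfect ranking. The sketch below is one possible variant and not the library's code: the helper names `gini` and `gini_normalized` are hypothetical, and the toy inputs are made up.

```python
import numpy as np

def gini(actual, pred):
    """Raw Gini: same computation as the printed function, using the builtin float dtype."""
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    assert len(actual) == len(pred)
    n = len(actual)
    # Sort by prediction, highest first; ties keep their original order
    order = np.lexsort((np.arange(n), -pred))
    sorted_actual = actual[order]
    gini_sum = sorted_actual.cumsum().sum() / sorted_actual.sum()
    gini_sum -= (n + 1) / 2.0
    return gini_sum / n

def gini_normalized(actual, pred):
    """Raw Gini divided by the Gini obtained when the ranking is perfect."""
    return gini(actual, pred) / gini(actual, actual)

# Toy check (made-up values): a perfect ranking and a fully reversed one
y_true = [0, 0, 1, 1]
perfect = gini_normalized(y_true, [0.1, 0.2, 0.8, 0.9])
reversed_rank = gini_normalized(y_true, [0.9, 0.8, 0.2, 0.1])
print(perfect, reversed_rank)
```

Under this definition a perfect ranking scores 1 and a fully reversed ranking scores -1, which makes the metric comparable across datasets with different claim rates.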

In [None]:
df_train_00 = pd.read_csv('/content/drive/MyDrive/1 - Aulas PoD Academy/Cientista de Dados POD/Aulas/Case Porto Seguro/train.csv')
df_train_00.shape

(595212, 59)

In [None]:
df_test_00 = pd.read_csv('/content/drive/MyDrive/1 - Aulas PoD Academy/Cientista de Dados POD/Aulas/Case Porto Seguro/test.csv')
df_test_00.shape

(892816, 58)

In [None]:
# splitting the data: 70% for training and 30% for validation
train, test = train_test_split(df_train_00, test_size=0.3, random_state=42)
train.shape,test.shape

((416648, 59), (178564, 59))
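
The shapes above follow from how `train_test_split` sizes a fractional `test_size`: the hold-out set receives the ceiling of `test_size * n_samples` and the training set receives the remaining rows. A small arithmetic check, using only the standard library (the variable names are illustrative):

```python
import math

n_samples = 595212          # rows in train.csv, taken from df_train_00.shape
test_size = 0.3             # fraction passed to train_test_split

n_test = math.ceil(test_size * n_samples)   # rows in the hold-out sample
n_train = n_samples - n_test                # remaining rows go to training

print(n_train, n_test)
```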

In [None]:
train.head()

Unnamed: 0,id,target,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,...,ps_calc_11,ps_calc_12,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin
391389,977816,0,7,1,9,1,0,0,0,1,...,5,2,4,10,0,1,1,0,0,1
518243,1294939,0,5,1,3,1,0,0,1,0,...,3,1,6,12,0,1,1,0,1,0
136933,342083,0,0,1,6,1,0,1,0,0,...,4,3,4,10,0,0,1,0,1,0
432345,1080386,0,0,1,4,1,0,1,0,0,...,5,3,4,8,0,0,1,0,1,0
127021,317567,1,1,1,2,0,0,0,0,0,...,8,1,4,5,0,1,1,1,1,0


In [None]:
# creating a copy of the base df
df_train_01 = train.copy()

#### Since "-1" values in the dataset represent nulls, let's convert them to nulls

In [None]:
# replacing "-1" with null wherever it appears
df_train_01.replace(-1, np.nan, inplace=True)
df_train_01.head()

Unnamed: 0,id,target,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,...,ps_calc_11,ps_calc_12,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin
391389,977816,0,7,1.0,9,1.0,0.0,0,0,1,...,5,2,4,10,0,1,1,0,0,1
518243,1294939,0,5,1.0,3,1.0,0.0,0,1,0,...,3,1,6,12,0,1,1,0,1,0
136933,342083,0,0,1.0,6,1.0,0.0,1,0,0,...,4,3,4,10,0,0,1,0,1,0
432345,1080386,0,0,1.0,4,1.0,0.0,1,0,0,...,5,3,4,8,0,0,1,0,1,0
127021,317567,1,1,1.0,2,0.0,0.0,0,0,0,...,8,1,4,5,0,1,1,1,1,0
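
A side effect visible in the output above is that some columns now display values such as `1.0` instead of `1`. A minimal sketch of why, on a made-up toy frame that only borrows two column names from the dataset: integer columns cannot hold `NaN`, so any column that contained `-1` is upcast to `float64`, while columns without `-1` keep their integer type.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values) reusing two column names from the dataset
toy = pd.DataFrame({
    "ps_ind_02_cat": [1, -1, 2],   # contains the -1 sentinel
    "ps_ind_06_bin": [0, 1, 0],    # no missing values
})

toy_nan = toy.replace(-1, np.nan)

print(toy_nan.dtypes)           # dtype of each column after the replacement
print(toy_nan.isnull().sum())   # number of nulls per column
```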


#### Generating the metadata for the training set

In [None]:
# generating the dataframe metadata
metadata_df = pod.pod_academy_generate_metadata(df_train_01,
                                          ids=['id'],
                                          targets=['target'],
                                          orderby = 'PC_NULOS')
metadata_df.head(10)

Unnamed: 0,FEATURE,USO_FEATURE,QT_NULOS,PC_NULOS,CARDINALIDADE,TIPO_FEATURE
0,ps_car_03_cat,Explicativa,287957,69.11,2,float64
1,ps_car_05_cat,Explicativa,186779,44.83,2,float64
2,ps_reg_03,Explicativa,75228,18.06,4965,float64
3,ps_car_14,Explicativa,29746,7.14,831,float64
4,ps_car_07_cat,Explicativa,8018,1.92,2,float64
5,ps_ind_05_cat,Explicativa,4057,0.97,7,float64
6,ps_car_09_cat,Explicativa,393,0.09,5,float64
7,ps_ind_02_cat,Explicativa,139,0.03,4,float64
8,ps_car_01_cat,Explicativa,69,0.02,12,float64
9,ps_ind_04_cat,Explicativa,55,0.01,2,float64
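
The source of `pod_academy_generate_metadata` is only partially shown in this notebook, so the sketch below is not its implementation. It is a guess at how a table with the same columns as the output above could be assembled with plain pandas; the helper name `build_metadata` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def build_metadata(df, ids=(), targets=()):
    """Hypothetical helper: per-column null count, null %, cardinality and dtype."""
    meta = pd.DataFrame({
        "FEATURE": df.columns,
        "QT_NULOS": df.isnull().sum().to_numpy(),
        "PC_NULOS": (df.isnull().mean() * 100).round(2).to_numpy(),
        "CARDINALIDADE": df.nunique().to_numpy(),          # NaN is not counted as a value
        "TIPO_FEATURE": df.dtypes.astype(str).to_numpy(),
    })
    # Tag each column by its role, mirroring the USO_FEATURE column above
    meta["USO_FEATURE"] = "Explicativa"
    meta.loc[meta["FEATURE"].isin(list(ids)), "USO_FEATURE"] = "ID"
    meta.loc[meta["FEATURE"].isin(list(targets)), "USO_FEATURE"] = "Target"
    # Highest null percentage first
    return meta.sort_values("PC_NULOS", ascending=False).reset_index(drop=True)

# Toy frame (made-up values)
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "target": [0, 1, 0, 0],
    "ps_car_03_cat": [1.0, np.nan, np.nan, np.nan],
    "ps_reg_03": [0.5, 0.7, np.nan, 0.7],
})
meta = build_metadata(toy, ids=["id"], targets=["target"])
print(meta)
```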


#### Removing columns with a high percentage of nulls

In [None]:
missing_cutoff = 65

drop_vars_nulos = metadata_df[(metadata_df['PC_NULOS'] >= missing_cutoff)]
lista_drop_vars = list(drop_vars_nulos.FEATURE.values)

print(f'Variables to be dropped for having at least {missing_cutoff}% nulls: ',lista_drop_vars)

# dropping the variables with a high percentage of nulls from the df
df_train_02 = df_train_01.drop(columns=lista_drop_vars)
df_train_02.shape

Variables to be dropped for having at least 65% nulls:  ['ps_car_03_cat']


(416648, 58)

In [None]:
# Saving the list of removed variables to a .pkl file
with open('/content/drive/MyDrive/1 - Aulas PoD Academy/Cientista de Dados POD/Aulas/Case Porto Seguro/prd_drop_nullvars.pkl', 'wb') as f:
    pickle.dump(lista_drop_vars, f)
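
The list is pickled so that the same columns can be dropped from any data scored later. A minimal sketch of that round trip, assuming a temporary directory in place of the Drive folder and a made-up `new_data` frame:

```python
import os
import pickle
import tempfile

import pandas as pd

drop_vars = ["ps_car_03_cat"]   # the list produced by the cutoff step

# A temporary directory stands in for the Drive folder used in the notebook
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "prd_drop_nullvars.pkl")
    with open(path, "wb") as f:
        pickle.dump(drop_vars, f)
    with open(path, "rb") as f:
        loaded_drop_vars = pickle.load(f)

# New data (made-up values) arriving at scoring time
new_data = pd.DataFrame({"ps_ind_01": [2, 5], "ps_car_03_cat": [1.0, None]})

# errors="ignore" keeps the call safe if a listed column is absent
new_data_clean = new_data.drop(columns=loaded_drop_vars, errors="ignore")
print(list(new_data_clean.columns))
```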

In [None]:
df_train_02.head()

Unnamed: 0,id,target,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,...,ps_calc_11,ps_calc_12,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin
391389,977816,0,7,1.0,9,1.0,0.0,0,0,1,...,5,2,4,10,0,1,1,0,0,1
518243,1294939,0,5,1.0,3,1.0,0.0,0,1,0,...,3,1,6,12,0,1,1,0,1,0
136933,342083,0,0,1.0,6,1.0,0.0,1,0,0,...,4,3,4,10,0,0,1,0,1,0
432345,1080386,0,0,1.0,4,1.0,0.0,1,0,0,...,5,3,4,8,0,0,1,0,1,0
127021,317567,1,1,1.0,2,0.0,0.0,0,0,0,...,8,1,4,5,0,1,1,1,1,0


In [None]:
# checking whether there are any null ids
df_train_02['id'].isnull().sum()

np.int64(0)

In [None]:
# removing the id and target columns from the df
df_train_02 = df_train_02.drop(axis=1, columns=['id', 'target'])
df_train_02.head()

Unnamed: 0,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,ps_ind_09_bin,ps_ind_10_bin,...,ps_calc_11,ps_calc_12,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin
391389,7,1.0,9,1.0,0.0,0,0,1,0,0,...,5,2,4,10,0,1,1,0,0,1
518243,5,1.0,3,1.0,0.0,0,1,0,0,0,...,3,1,6,12,0,1,1,0,1,0
136933,0,1.0,6,1.0,0.0,1,0,0,0,0,...,4,3,4,10,0,0,1,0,1,0
432345,0,1.0,4,1.0,0.0,1,0,0,0,0,...,5,3,4,8,0,0,1,0,1,0
127021,1,1.0,2,0.0,0.0,0,0,0,1,0,...,8,1,4,5,0,1,1,1,1,0


Checking the types of all columns to understand which treatment we should apply

In [None]:
df_train_02.dtypes.value_counts()

Unnamed: 0,count
int64,37
float64,19


#### We only have numeric columns, so we will replace the nulls with the mean of each column

In [None]:
# replacing the nulls with the column means and saving the means to a file
df_train_03, means = pod.pod_custom_fillna(df_train_02)

with open('/content/drive/MyDrive/1 - Aulas PoD Academy/Cientista de Dados POD/Aulas/Case Porto Seguro/prd_fillna.pkl', 'wb') as f:
  pickle.dump(means, f)

In [None]:
df_train_03.head()

Unnamed: 0,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,ps_ind_09_bin,ps_ind_10_bin,...,ps_calc_11,ps_calc_12,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin
391389,7,1.0,9,1.0,0.0,0,0,1,0,0,...,5,2,4,10,0,1,1,0,0,1
518243,5,1.0,3,1.0,0.0,0,1,0,0,0,...,3,1,6,12,0,1,1,0,1,0
136933,0,1.0,6,1.0,0.0,1,0,0,0,0,...,4,3,4,10,0,0,1,0,1,0
432345,0,1.0,4,1.0,0.0,1,0,0,0,0,...,5,3,4,8,0,0,1,0,1,0
127021,1,1.0,2,0.0,0.0,0,0,0,1,0,...,8,1,4,5,0,1,1,1,1,0


In [None]:
# checking the mean used for each variable
with open('/content/drive/MyDrive/1 - Aulas PoD Academy/Cientista de Dados POD/Aulas/Case Porto Seguro/prd_fillna.pkl', 'rb') as f:
  loaded_means = pickle.load(f)
loaded_means

{'ps_ind_01': np.float64(1.902154816535781),
 'ps_ind_02_cat': np.float64(1.359569661159783),
 'ps_ind_03': np.float64(4.4241902037211265),
 'ps_ind_04_cat': np.float64(0.4168841051097834),
 'ps_ind_05_cat': np.float64(0.41899605178009214),
 'ps_ind_06_bin': np.float64(0.39382644342466544),
 'ps_ind_07_bin': np.float64(0.2568019047253317),
 'ps_ind_08_bin': np.float64(0.16375453620322192),
 'ps_ind_09_bin': np.float64(0.185617115646781),
 'ps_ind_10_bin': np.float64(0.0003672164512970181),
 'ps_ind_11_bin': np.float64(0.0016608744071734413),
 'ps_ind_12_bin': np.float64(0.009283615905992589),
 'ps_ind_13_bin': np.float64(0.0009048405368560511),
 'ps_ind_14': np.float64(0.0122165473013191),
 'ps_ind_15': np.float64(7.298587776732398),
 'ps_ind_16_bin': np.float64(0.6603967857760028),
 'ps_ind_17_bin': np.float64(0.1215630460244619),
 'ps_ind_18_bin': np.float64(0.15339567212611124),
 'ps_reg_01': np.float64(0.6111741806032911),
 'ps_reg_02': np.float64(0.43951201013805413),
 'ps_reg_03'

#### Replacing the nulls in our test (hold-out) sample

In [None]:
# replacing the nulls in the test (hold-out) sample with the means learned on the training sample
test_prod = pod.pod_custom_fillna_prod(test,loaded_means)
test_prod.shape

(178564, 59)
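
The implementations of `pod_custom_fillna` and `pod_custom_fillna_prod` are not shown in this notebook. The sketch below only illustrates the general pattern they appear to follow: the means are computed on the training sample alone and then reused on the hold-out sample, so no information from the hold-out rows leaks into the imputation. The helper names and toy values are hypothetical.

```python
import numpy as np
import pandas as pd

def fit_fillna_means(df):
    """Hypothetical helper: fill nulls with column means and return those means."""
    means = df.mean().to_dict()
    return df.fillna(value=means), means

def apply_fillna_means(df, means):
    """Hypothetical helper: fill nulls using means learned on the training data."""
    # Only columns present in both `df` and `means` are filled
    usable = {col: m for col, m in means.items() if col in df.columns}
    return df.fillna(value=usable)

# Toy samples (made-up values)
train_toy = pd.DataFrame({"ps_reg_03": [0.5, np.nan, 1.5], "ps_car_14": [0.2, 0.4, np.nan]})
test_toy = pd.DataFrame({"ps_reg_03": [np.nan, 2.0], "ps_car_14": [np.nan, 0.1]})

train_filled, means = fit_fillna_means(train_toy)
test_filled = apply_fillna_means(test_toy, means)
print(means)
print(test_filled)
```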

In [None]:
test_prod.head()

Unnamed: 0,id,target,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,...,ps_calc_11,ps_calc_12,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin
256886,642026,0,4,1.0,5,1.0,0.0,1,0,0,...,7,3,2,3,0,1,0,1,0,0
118785,297043,0,6,2.0,10,1.0,0.0,0,0,0,...,6,3,3,5,0,1,1,0,0,0
56083,140591,0,4,1.0,9,1.0,0.0,0,0,1,...,3,1,0,7,0,0,1,0,0,0
542002,1354540,0,0,1.0,7,1.0,4.0,0,1,0,...,1,1,3,6,1,1,0,0,0,0
349518,873173,0,1,1.0,3,1.0,0.0,1,0,0,...,6,1,5,6,0,1,0,0,0,0


Since we have no non-numeric categorical columns (every `cat` variable is already stored as a number), no additional encoding is needed for them.

If there were any, however, we could use **OneHotEncoder** for low-cardinality variables and **LabelEncoder** for high-cardinality variables.
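
As a rough illustration of the one-hot option, the sketch below encodes a made-up text column named `fuel`, which does not exist in this dataset. The encoder is fit on the training rows only and then reused on new rows. For the high-cardinality case, note that scikit-learn documents `LabelEncoder` as intended for target labels; `OrdinalEncoder` is its counterpart for feature columns.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy text column (made-up values); the real dataset has no such column
train_toy = pd.DataFrame({"fuel": ["gas", "diesel", "gas", "flex"]})
new_toy = pd.DataFrame({"fuel": ["diesel", "electric"]})   # "electric" was never seen

# handle_unknown="ignore" encodes unseen categories as an all-zero row
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train_toy[["fuel"]])

# transform returns a sparse matrix by default; toarray() makes it dense
encoded = encoder.transform(new_toy[["fuel"]]).toarray()
print(encoder.categories_[0])   # categories learned from the training rows
print(encoded)
```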

#### Applying standardization to the table

In [None]:
from sklearn.preprocessing import StandardScaler

# listing the ID and target columns (already removed from df_train_03)
df_id_target = metadata_df[(metadata_df['USO_FEATURE'] == 'ID') | (metadata_df['USO_FEATURE'] == 'Target')]
lista_id_target = list(df_id_target.FEATURE.values)
print('List of IDs and Target: ',lista_id_target)

# instantiating the scaler
scaler = StandardScaler()

# standardizing the training set
df_train_03_scaled = scaler.fit_transform(df_train_03)
df_train_04 = pd.DataFrame(df_train_03_scaled, columns=df_train_03.columns, index=df_train_03.index)

# saving the scaler to a .pkl file
with open('/content/drive/MyDrive/1 - Aulas PoD Academy/Cientista de Dados POD/Aulas/Case Porto Seguro/prd_scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)

List of IDs and Target:  ['target', 'id']
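
For reference, `StandardScaler` rescales each column to `z = (x - mean) / std`, where the mean and the (population) standard deviation are learned from the data passed to `fit`. The sketch below, on made-up arrays, checks that transforming a hold-out row reuses the training statistics rather than recomputing them.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train_toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])   # made-up values
test_toy = np.array([[2.0, 30.0]])

scaler_toy = StandardScaler().fit(train_toy)     # learns mean_ and scale_ from train only
test_scaled = scaler_toy.transform(test_toy)     # reuses the training statistics

# Same computation by hand, using the population std (ddof=0, NumPy's default)
manual = (test_toy - train_toy.mean(axis=0)) / train_toy.std(axis=0)
print(np.allclose(test_scaled, manual))
```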


In [None]:
df_train_04.head()

Unnamed: 0,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,ps_ind_09_bin,ps_ind_10_bin,...,ps_calc_11,ps_calc_12,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin
391389,2.568989,-0.542601,1.695565,1.182765,-0.31183,-0.806035,-0.587823,2.259801,-0.477413,-0.019166,...,-0.189128,0.46425,0.665396,0.896386,-0.373426,0.770766,0.897048,-0.634586,-0.730772,2.350925
518243,1.561116,-0.542601,-0.527733,1.182765,-0.31183,-0.806035,1.701191,-0.442517,-0.477413,-0.019166,...,-1.045842,-0.367316,1.845816,1.624615,-0.373426,0.770766,0.897048,-0.634586,1.368416,-0.425364
136933,-0.958565,-0.542601,0.583916,1.182765,-0.31183,1.240641,-0.587823,-0.442517,-0.477413,-0.019166,...,-0.617485,1.295816,0.665396,0.896386,-0.373426,-1.29741,0.897048,-0.634586,1.368416,-0.425364
432345,-0.958565,-0.542601,-0.157184,1.182765,-0.31183,1.240641,-0.587823,-0.442517,-0.477413,-0.019166,...,-0.189128,1.295816,0.665396,0.168156,-0.373426,-1.29741,0.897048,-0.634586,1.368416,-0.425364
127021,-0.454629,-0.542601,-0.898283,-0.845588,-0.31183,-0.806035,-0.587823,-0.442517,2.09462,-0.019166,...,1.095943,-0.367316,0.665396,-0.924188,-0.373426,0.770766,0.897048,1.575831,1.368416,-0.425364


In [None]:
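# removing the id, target and dropped columns from the test sample
# (test_prod is the hold-out sample with nulls already filled, so the imputation is not lost)
list_columns_drop = ['id','target','ps_car_03_cat']
df_test_aux = test_prod.drop(columns=list_columns_drop)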

In [None]:
# loading the scaler
with open('/content/drive/MyDrive/1 - Aulas PoD Academy/Cientista de Dados POD/Aulas/Case Porto Seguro/prd_scaler.pkl', 'rb') as f:
    loaded_scaler = pickle.load(f)

# applying the scaler to the test set
test_df_scaled = loaded_scaler.transform(df_test_aux)
test_df = pd.DataFrame(test_df_scaled, columns=df_test_aux.columns, index=df_test_aux.index)
test_df.head()

Unnamed: 0,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,ps_ind_09_bin,ps_ind_10_bin,...,ps_calc_11,ps_calc_12,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin
256886,1.05718,-0.542601,0.213366,1.182765,-0.31183,1.240641,-0.587823,-0.442517,-0.477413,-0.019166,...,0.667586,1.295816,-0.515024,-1.652418,-0.373426,0.770766,-1.114768,1.575831,-0.730772,-0.425364
118785,2.065053,0.966428,2.066114,1.182765,-0.31183,-0.806035,-0.587823,-0.442517,2.09462,-0.019166,...,0.239229,1.295816,0.075186,-0.924188,-0.373426,0.770766,0.897048,-0.634586,-0.730772,-0.425364
56083,1.05718,-0.542601,1.695565,1.182765,-0.31183,-0.806035,-0.587823,2.259801,-0.477413,-0.019166,...,-1.045842,-0.367316,-1.695444,-0.195959,-0.373426,-1.29741,0.897048,-0.634586,-0.730772,-0.425364
542002,-0.958565,-0.542601,0.954465,1.182765,2.665092,-0.806035,1.701191,-0.442517,-0.477413,-0.019166,...,-1.902556,-0.367316,0.075186,-0.560074,2.677904,0.770766,-1.114768,-0.634586,-0.730772,-0.425364
349518,-0.454629,-0.542601,-0.527733,1.182765,-0.31183,1.240641,-0.587823,-0.442517,-0.477413,-0.019166,...,0.239229,-0.367316,1.255606,-0.560074,-0.373426,0.770766,-1.114768,-0.634586,-0.730772,-0.425364


In [None]:
# bringing id and target back into the table after the data prep

abt_train = df_train_04.merge(train[['id','target']], left_index=True, right_index=True, how='inner')
abt_test = test_df.merge(test[['id','target']], left_index=True, right_index=True, how='inner')
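
The merge above works because the scaled tables were rebuilt with the original row index, so each feature row lines up with its own `id` and `target`. A small sketch on made-up values showing the same index-based join and a row-count check:

```python
import pandas as pd

# Toy stand-ins (made-up values) for `train` and the scaled feature table
raw = pd.DataFrame(
    {"id": [977816, 1294939, 342083], "target": [0, 0, 1], "ps_ind_01": [7, 5, 0]},
    index=[391389, 518243, 136933],
)
features = pd.DataFrame({"ps_ind_01": [2.57, 1.56, -0.96]}, index=raw.index)

abt = features.merge(raw[["id", "target"]], left_index=True, right_index=True, how="inner")

# With unique, identical indexes the inner join keeps every row exactly once
assert len(abt) == len(features)
print(abt)
```

If the indexes were not unique, or had been reset along the way, the same call could silently duplicate or drop rows, which is why a length check of this kind may be worth keeping.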

In [None]:
abt_train.head()

Unnamed: 0,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,ps_ind_09_bin,ps_ind_10_bin,...,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin,id,target
391389,2.568989,-0.542601,1.695565,1.182765,-0.31183,-0.806035,-0.587823,2.259801,-0.477413,-0.019166,...,0.665396,0.896386,-0.373426,0.770766,0.897048,-0.634586,-0.730772,2.350925,977816,0
518243,1.561116,-0.542601,-0.527733,1.182765,-0.31183,-0.806035,1.701191,-0.442517,-0.477413,-0.019166,...,1.845816,1.624615,-0.373426,0.770766,0.897048,-0.634586,1.368416,-0.425364,1294939,0
136933,-0.958565,-0.542601,0.583916,1.182765,-0.31183,1.240641,-0.587823,-0.442517,-0.477413,-0.019166,...,0.665396,0.896386,-0.373426,-1.29741,0.897048,-0.634586,1.368416,-0.425364,342083,0
432345,-0.958565,-0.542601,-0.157184,1.182765,-0.31183,1.240641,-0.587823,-0.442517,-0.477413,-0.019166,...,0.665396,0.168156,-0.373426,-1.29741,0.897048,-0.634586,1.368416,-0.425364,1080386,0
127021,-0.454629,-0.542601,-0.898283,-0.845588,-0.31183,-0.806035,-0.587823,-0.442517,2.09462,-0.019166,...,0.665396,-0.924188,-0.373426,0.770766,0.897048,1.575831,1.368416,-0.425364,317567,1


In [None]:
abt_test.head()

Unnamed: 0,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,ps_ind_09_bin,ps_ind_10_bin,...,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin,id,target
256886,1.05718,-0.542601,0.213366,1.182765,-0.31183,1.240641,-0.587823,-0.442517,-0.477413,-0.019166,...,-0.515024,-1.652418,-0.373426,0.770766,-1.114768,1.575831,-0.730772,-0.425364,642026,0
118785,2.065053,0.966428,2.066114,1.182765,-0.31183,-0.806035,-0.587823,-0.442517,2.09462,-0.019166,...,0.075186,-0.924188,-0.373426,0.770766,0.897048,-0.634586,-0.730772,-0.425364,297043,0
56083,1.05718,-0.542601,1.695565,1.182765,-0.31183,-0.806035,-0.587823,2.259801,-0.477413,-0.019166,...,-1.695444,-0.195959,-0.373426,-1.29741,0.897048,-0.634586,-0.730772,-0.425364,140591,0
542002,-0.958565,-0.542601,0.954465,1.182765,2.665092,-0.806035,1.701191,-0.442517,-0.477413,-0.019166,...,0.075186,-0.560074,2.677904,0.770766,-1.114768,-0.634586,-0.730772,-0.425364,1354540,0
349518,-0.454629,-0.542601,-0.527733,1.182765,-0.31183,1.240641,-0.587823,-0.442517,-0.477413,-0.019166,...,1.255606,-0.560074,-0.373426,0.770766,-1.114768,-0.634586,-0.730772,-0.425364,873173,0


#### Saving the ABT (analytical base table) files created

In [None]:
abt_train.to_csv('/content/drive/MyDrive/1 - Aulas PoD Academy/Cientista de Dados POD/Aulas/Case Porto Seguro/abt_train.csv')
abt_test.to_csv('/content/drive/MyDrive/1 - Aulas PoD Academy/Cientista de Dados POD/Aulas/Case Porto Seguro/abt_test.csv')
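
Because `to_csv` is called without `index=False`, the row index is written as an unnamed first column. A minimal sketch of the round trip, using an in-memory buffer in place of the Drive file and made-up values: reading the file back with `index_col=0` restores the index, while a plain `read_csv` exposes it as a column named `Unnamed: 0`.

```python
import io

import pandas as pd

abt_toy = pd.DataFrame(
    {"ps_ind_01": [2.57, 1.56], "id": [977816, 1294939], "target": [0, 0]},
    index=[391389, 518243],
)

buffer = io.StringIO()          # in-memory stand-in for the CSV file on Drive
abt_toy.to_csv(buffer)          # the index is written as an unnamed first column

buffer.seek(0)
naive = pd.read_csv(buffer)                  # index comes back as a regular column
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # index restored as the row labels

print(list(naive.columns))
print(list(restored.index))
```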