# **SUPERVISED LEARNING: CLASSIFICATION OF THE BREAST CANCER DATASET FROM THE KAGGLE REPOSITORY**

This project aims to develop a Machine Learning model that predicts whether a breast tumor is benign or malignant. The analysis is based on features extracted from digitized images of fine needle aspiration (FNA) biopsies of breast masses. Using the Breast Cancer Wisconsin (Diagnostic) dataset, we will apply several classification techniques to the measured characteristics of the cell nuclei (including radius, texture, perimeter, area, smoothness, compactness, concavity, and concave points) in order to build a robust predictive model.

To determine the most effective technique, we will systematically compare the following methods, each with its default configuration, and identify the one that performs best:

- Naive Bayes
- Support Vector Machines (SVM)
- Logistic Regression
- Instance-Based Learning (KNN)
- Decision Tree
- Random Forest
- XGBoost
- LightGBM
- CatBoost

Each method will be evaluated on precision, recall, F1-score, and other relevant metrics, allowing us to select the most suitable algorithm for our predictive model.
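
As a reminder of what these metrics measure, here is a minimal sketch that computes precision, recall, and F1-score by hand for a binary problem. The function name `binary_metrics` and the toy labels are hypothetical and only serve as illustration; they are not part of the project's code, and label 1 is assumed to mean malignant.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a diagnostic setting, recall on the malignant class is often watched most closely, since a false negative means a missed malignant tumor.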

The data were obtained from Kaggle:

https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data

# **DATA EXPLORATION, ANALYSIS, AND CLEANING: BREAST CANCER PREDICTION PROJECT**

## **Data Exploration**

In [1]:
import numpy as np
import pandas as pd

In [2]:
dados = pd.read_csv('../database/data_cancer2.csv',
                    sep=',', encoding='utf-8')

In [3]:
dados.head()

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
0,842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
1,842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
2,84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
3,84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
4,84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,


In [4]:
dados.tail()

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
564,926424,M,21.56,22.39,142.0,1479.0,0.111,0.1159,0.2439,0.1389,...,26.4,166.1,2027.0,0.141,0.2113,0.4107,0.2216,0.206,0.07115,
565,926682,M,20.13,28.25,131.2,1261.0,0.0978,0.1034,0.144,0.09791,...,38.25,155.0,1731.0,0.1166,0.1922,0.3215,0.1628,0.2572,0.06637,
566,926954,M,16.6,28.08,108.3,858.1,0.08455,0.1023,0.09251,0.05302,...,34.12,126.7,1124.0,0.1139,0.3094,0.3403,0.1418,0.2218,0.0782,
567,927241,M,20.6,29.33,140.1,1265.0,0.1178,0.277,0.3514,0.152,...,39.42,184.6,1821.0,0.165,0.8681,0.9387,0.265,0.4087,0.124,
568,92751,B,7.76,24.54,47.92,181.0,0.05263,0.04362,0.0,0.0,...,30.37,59.16,268.6,0.08996,0.06444,0.0,0.0,0.2871,0.07039,


In [5]:
dados.shape

(569, 33)

## **Analysis of the Variables (Attributes)**

## **Analysis of attribute types**

In [6]:
dados.dtypes

id                           int64
diagnosis                   object
radius_mean                float64
texture_mean               float64
perimeter_mean             float64
area_mean                  float64
smoothness_mean            float64
compactness_mean           float64
concavity_mean             float64
concave points_mean        float64
symmetry_mean              float64
fractal_dimension_mean     float64
radius_se                  float64
texture_se                 float64
perimeter_se               float64
area_se                    float64
smoothness_se              float64
compactness_se             float64
concavity_se               float64
concave points_se          float64
symmetry_se                float64
fractal_dimension_se       float64
radius_worst               float64
texture_worst              float64
perimeter_worst            float64
area_worst                 float64
smoothness_worst           float64
compactness_worst          float64
concavity_worst     

After the initial inspection of the DataFrame, we identified two columns that can be dropped: ‘id’, which is only a sample identifier and adds nothing to the analysis, and ‘Unnamed: 32’, which appears to be an empty artifact created when the data were imported. Both are removed to streamline the dataset.

In [7]:
dados_relevante = dados.copy()

In [8]:
dados_relevante.drop(dados.columns[0], axis=1, inplace=True)

In [9]:
dados_relevante.drop(dados.columns[-1], axis=1, inplace=True)
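
The two positional drops above depend on the column order of the file. As an alternative sketch, the same columns can be removed by name in a single call. The toy DataFrame below is hypothetical and only mimics the layout of the real file.

```python
import pandas as pd

# Hypothetical two-row frame mimicking the raw layout: an identifier,
# the label, one feature, and an empty trailing column.
toy = pd.DataFrame({
    "id": [842302, 842517],
    "diagnosis": ["M", "M"],
    "radius_mean": [17.99, 20.57],
    "Unnamed: 32": [float("nan"), float("nan")],
})

# drop() returns a new DataFrame, so the original stays untouched
# and no explicit copy is needed.
cleaned = toy.drop(columns=["id", "Unnamed: 32"])

print(cleaned.columns.tolist())
```

Dropping by name fails loudly with a `KeyError` if a column is missing, which makes a change in the input file easier to notice than a silent positional drop.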

## **Missing Values (NaN)**

In [10]:
dados_relevante.isnull().sum()

diagnosis                  0
radius_mean                0
texture_mean               0
perimeter_mean             0
area_mean                  0
smoothness_mean            0
compactness_mean           0
concavity_mean             0
concave points_mean        0
symmetry_mean              0
fractal_dimension_mean     0
radius_se                  0
texture_se                 0
perimeter_se               0
area_se                    0
smoothness_se              0
compactness_se             0
concavity_se               0
concave points_se          0
symmetry_se                0
fractal_dimension_se       0
radius_worst               0
texture_worst              0
perimeter_worst            0
area_worst                 0
smoothness_worst           0
compactness_worst          0
concavity_worst            0
concave points_worst       0
symmetry_worst             0
fractal_dimension_worst    0
dtype: int64

## **Descriptive Statistics**

In [11]:
dados_relevante.describe()

Unnamed: 0,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,fractal_dimension_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
count,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,...,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0
mean,14.127292,19.289649,91.969033,654.889104,0.09636,0.104341,0.088799,0.048919,0.181162,0.062798,...,16.26919,25.677223,107.261213,880.583128,0.132369,0.254265,0.272188,0.114606,0.290076,0.083946
std,3.524049,4.301036,24.298981,351.914129,0.014064,0.052813,0.07972,0.038803,0.027414,0.00706,...,4.833242,6.146258,33.602542,569.356993,0.022832,0.157336,0.208624,0.065732,0.061867,0.018061
min,6.981,9.71,43.79,143.5,0.05263,0.01938,0.0,0.0,0.106,0.04996,...,7.93,12.02,50.41,185.2,0.07117,0.02729,0.0,0.0,0.1565,0.05504
25%,11.7,16.17,75.17,420.3,0.08637,0.06492,0.02956,0.02031,0.1619,0.0577,...,13.01,21.08,84.11,515.3,0.1166,0.1472,0.1145,0.06493,0.2504,0.07146
50%,13.37,18.84,86.24,551.1,0.09587,0.09263,0.06154,0.0335,0.1792,0.06154,...,14.97,25.41,97.66,686.5,0.1313,0.2119,0.2267,0.09993,0.2822,0.08004
75%,15.78,21.8,104.1,782.7,0.1053,0.1304,0.1307,0.074,0.1957,0.06612,...,18.79,29.72,125.4,1084.0,0.146,0.3391,0.3829,0.1614,0.3179,0.09208
max,28.11,39.28,188.5,2501.0,0.1634,0.3454,0.4268,0.2012,0.304,0.09744,...,36.04,49.54,251.2,4254.0,0.2226,1.058,1.252,0.291,0.6638,0.2075


In [12]:
dados_relevante.mode()

Unnamed: 0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
0,B,12.34,14.93,82.61,512.2,0.1007,0.1147,0.0,0.0,0.1601,...,12.36,17.7,101.7,284.4,0.1216,0.1486,0.0,0.0,0.2226,0.07427
1,,,15.7,87.76,,,0.1206,,,0.1714,...,,27.26,105.9,402.8,0.1223,0.3416,,,0.2369,
2,,,16.84,134.7,,,,,,0.1717,...,,,117.7,439.6,0.1234,,,,0.2383,
3,,,16.85,,,,,,,0.1769,...,,,,458.0,0.1256,,,,0.2972,
4,,,17.46,,,,,,,0.1893,...,,,,472.4,0.1275,,,,0.3109,
5,,,18.22,,,,,,,,...,,,,489.5,0.1312,,,,0.3196,
6,,,18.9,,,,,,,,...,,,,546.7,0.1347,,,,,
7,,,19.83,,,,,,,,...,,,,547.4,0.1401,,,,,
8,,,20.52,,,,,,,,...,,,,624.1,0.1415,,,,,
9,,,,,,,,,,,...,,,,698.8,,,,,,


## **Saving (Exporting) the Cleaned DataFrame**

In [13]:
dados_relevante.to_csv('../database/data_cancer2_tratado.csv', sep=',', encoding='utf-8', index=False)

# **PREPROCESSING**

In [14]:
import numpy as np
import pandas as pd

In [15]:
df_original = pd.read_csv('../database/data_cancer2_tratado.csv',
                    sep=',', encoding='utf-8')

In [16]:
df_original.head()

Unnamed: 0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
0,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [17]:
df_original.shape

(569, 31)

In [18]:
df_original.dtypes

diagnosis                   object
radius_mean                float64
texture_mean               float64
perimeter_mean             float64
area_mean                  float64
smoothness_mean            float64
compactness_mean           float64
concavity_mean             float64
concave points_mean        float64
symmetry_mean              float64
fractal_dimension_mean     float64
radius_se                  float64
texture_se                 float64
perimeter_se               float64
area_se                    float64
smoothness_se              float64
compactness_se             float64
concavity_se               float64
concave points_se          float64
symmetry_se                float64
fractal_dimension_se       float64
radius_worst               float64
texture_worst              float64
perimeter_worst            float64
area_worst                 float64
smoothness_worst           float64
compactness_worst          float64
concavity_worst            float64
concave points_worst

## **Transforming nominal categorical variables into ordinal categorical variables**

In [19]:
df_ordinal = df_original.copy()

In [20]:
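# Assign the result back instead of calling replace(..., inplace=True) on a
# column selection, which recent pandas versions warn about and may not apply.
df_ordinal['diagnosis'] = df_ordinal['diagnosis'].map({'M': 1, 'B': 0})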

In [21]:
df_ordinal.head()

Unnamed: 0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
0,1,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,1,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,1,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,1,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [22]:
df_ordinal.dtypes

diagnosis                    int64
radius_mean                float64
texture_mean               float64
perimeter_mean             float64
area_mean                  float64
smoothness_mean            float64
compactness_mean           float64
concavity_mean             float64
concave points_mean        float64
symmetry_mean              float64
fractal_dimension_mean     float64
radius_se                  float64
texture_se                 float64
perimeter_se               float64
area_se                    float64
smoothness_se              float64
compactness_se             float64
concavity_se               float64
concave points_se          float64
symmetry_se                float64
fractal_dimension_se       float64
radius_worst               float64
texture_worst              float64
perimeter_worst            float64
area_worst                 float64
smoothness_worst           float64
compactness_worst          float64
concavity_worst            float64
concave points_worst

In [23]:
df_ordinal.shape

(569, 31)

## **LEGEND**

Attributes:

- **id**: An integer that serves as a unique identifier for each sample.
- **diagnosis**: The diagnosis, stored as a string in the original data and encoded here as 1 (‘M’) for malignant and 0 (‘B’) for benign.
- **radius_mean**: A real number representing the mean radius of the cell nuclei.
- **texture_mean**: A real number representing the mean of the standard deviation of the gray-scale values.
- **perimeter_mean**: A real number representing the mean perimeter of the cell nuclei.
- **area_mean**: A real number representing the mean area of the cell nuclei.
- **smoothness_mean**: A real number representing the mean local variation in the radius lengths of the cell nuclei.
- **compactness_mean**: A real number representing the mean compactness of the cell nuclei, calculated as $$\text{perimeter}^2/\text{area} - 1.0$$.
- **concavity_mean**: A real number representing the mean severity of the concave portions of the contour of the cell nuclei.
- **concave points_mean**: A real number representing the mean number of concave portions of the contour of the cell nuclei.
- **symmetry_mean**: A real number representing the mean symmetry of the cell nuclei.
- **fractal_dimension_mean**: A real number representing the mean “coastline approximation - 1” of the cell nuclei.
- **radius_se**: A real number representing the standard error of the radius of the cell nuclei.
- **texture_se**: A real number representing the standard error of the standard deviation of the gray-scale values.
- **perimeter_se**: A real number representing the standard error of the perimeter of the cell nuclei.
- **area_se**: A real number representing the standard error of the area of the cell nuclei.
- **smoothness_se**: A real number representing the standard error of the local variation in the radius lengths of the cell nuclei.
- **compactness_se**: A real number representing the standard error of the compactness of the cell nuclei.
- **concavity_se**: A real number representing the standard error of the severity of the concave portions of the contour of the cell nuclei.
- **concave points_se**: A real number representing the standard error of the number of concave portions of the contour of the cell nuclei.
- **symmetry_se**: A real number representing the standard error of the symmetry of the cell nuclei.
- **fractal_dimension_se**: A real number representing the standard error of the “coastline approximation - 1” of the cell nuclei.
- **radius_worst**: A real number representing the “worst” (largest) radius of the cell nuclei, computed as the mean of the three largest values.
- **texture_worst**: A real number representing the “worst” (largest) value of the standard deviation of the gray-scale values.
- **perimeter_worst**: A real number representing the “worst” (largest) perimeter of the cell nuclei.
- **area_worst**: A real number representing the “worst” (largest) area of the cell nuclei.
- **smoothness_worst**: A real number representing the “worst” (largest) local variation in the radius lengths of the cell nuclei.
- **compactness_worst**: A real number representing the “worst” (largest) compactness of the cell nuclei.
- **concavity_worst**: A real number representing the “worst” (largest) severity of the concave portions of the contour of the cell nuclei.
- **concave points_worst**: A real number representing the “worst” (largest) number of concave portions of the contour of the cell nuclei.
- **symmetry_worst**: A real number representing the “worst” (largest) symmetry of the cell nuclei.
- **fractal_dimension_worst**: A real number representing the “worst” (largest) “coastline approximation - 1” of the cell nuclei.

## **PREDICTOR AND TARGET ATTRIBUTES**

In [24]:
df_ordinal.head()

Unnamed: 0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
0,1,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,1,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,1,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,1,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [25]:
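# Every column after 'diagnosis' is a predictor; an open-ended slice avoids
# relying on an end index (32) beyond the 31 columns of the DataFrame.
previsores = df_ordinal.iloc[:, 1:].values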


In [26]:
previsores

array([[1.799e+01, 1.038e+01, 1.228e+02, ..., 2.654e-01, 4.601e-01,
        1.189e-01],
       [2.057e+01, 1.777e+01, 1.329e+02, ..., 1.860e-01, 2.750e-01,
        8.902e-02],
       [1.969e+01, 2.125e+01, 1.300e+02, ..., 2.430e-01, 3.613e-01,
        8.758e-02],
       ...,
       [1.660e+01, 2.808e+01, 1.083e+02, ..., 1.418e-01, 2.218e-01,
        7.820e-02],
       [2.060e+01, 2.933e+01, 1.401e+02, ..., 2.650e-01, 4.087e-01,
        1.240e-01],
       [7.760e+00, 2.454e+01, 4.792e+01, ..., 0.000e+00, 2.871e-01,
        7.039e-02]])

In [27]:
previsores.shape

(569, 30)

In [28]:
alvo = df_ordinal.iloc[:, 0].values

In [29]:
alvo

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1,
       0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1,
       0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1,
       1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0,
       0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1,
       0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,

In [30]:
alvo.shape

(569,)
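
The positional slices above assume that `diagnosis` is the first column. As a hedged alternative, predictors and target can be selected by column name, which keeps working if the column order changes. The toy DataFrame and the names `X` and `y` are hypothetical.

```python
import pandas as pd

# Hypothetical three-row frame with the label already encoded as 0/1.
toy = pd.DataFrame({
    "diagnosis": [1, 0, 1],
    "radius_mean": [17.99, 7.76, 20.57],
    "texture_mean": [10.38, 24.54, 17.77],
})

# Predictors: everything except the label column.
X = toy.drop(columns="diagnosis").to_numpy()

# Target: the label column only.
y = toy["diagnosis"].to_numpy()

print(X.shape, y.shape)
```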

## **Analysis of attribute scales (Scaling)**

In [31]:
df_ordinal.describe()

Unnamed: 0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
count,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,...,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0
mean,0.372583,14.127292,19.289649,91.969033,654.889104,0.09636,0.104341,0.088799,0.048919,0.181162,...,16.26919,25.677223,107.261213,880.583128,0.132369,0.254265,0.272188,0.114606,0.290076,0.083946
std,0.483918,3.524049,4.301036,24.298981,351.914129,0.014064,0.052813,0.07972,0.038803,0.027414,...,4.833242,6.146258,33.602542,569.356993,0.022832,0.157336,0.208624,0.065732,0.061867,0.018061
min,0.0,6.981,9.71,43.79,143.5,0.05263,0.01938,0.0,0.0,0.106,...,7.93,12.02,50.41,185.2,0.07117,0.02729,0.0,0.0,0.1565,0.05504
25%,0.0,11.7,16.17,75.17,420.3,0.08637,0.06492,0.02956,0.02031,0.1619,...,13.01,21.08,84.11,515.3,0.1166,0.1472,0.1145,0.06493,0.2504,0.07146
50%,0.0,13.37,18.84,86.24,551.1,0.09587,0.09263,0.06154,0.0335,0.1792,...,14.97,25.41,97.66,686.5,0.1313,0.2119,0.2267,0.09993,0.2822,0.08004
75%,1.0,15.78,21.8,104.1,782.7,0.1053,0.1304,0.1307,0.074,0.1957,...,18.79,29.72,125.4,1084.0,0.146,0.3391,0.3829,0.1614,0.3179,0.09208
max,1.0,28.11,39.28,188.5,2501.0,0.1634,0.3454,0.4268,0.2012,0.304,...,36.04,49.54,251.2,4254.0,0.2226,1.058,1.252,0.291,0.6638,0.2075


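The summary above shows attributes on very different scales (for example, area_mean reaches 2501.0 while smoothness_mean stays below 0.2), so the predictors are rescaled before modeling. Two common options are:

Standardization (uses the mean and the standard deviation as reference): each value becomes (x - mean) / standard deviation, so every attribute ends up with mean 0 and standard deviation 1. This is the option applied below with StandardScaler.

Normalization (uses the maximum and minimum values as reference): each value becomes (x - min) / (max - min), so every attribute falls in the interval [0, 1].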
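
To make the difference concrete, the following sketch applies both transformations with plain NumPy to a small matrix. The matrix is a toy example built from the minimum, median, and maximum of radius_mean and smoothness_mean reported by `describe()` above; the variable names are hypothetical.

```python
import numpy as np

# Toy matrix: column 0 ~ radius_mean, column 1 ~ smoothness_mean
# (min, median and max taken from the describe() output).
X = np.array([
    [6.981, 0.05263],
    [13.37, 0.09587],
    [28.11, 0.16340],
])

# Standardization: subtract the column mean and divide by the column
# standard deviation (population formula, ddof=0).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max normalization: rescale each column to the interval [0, 1].
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(X_std.mean(axis=0), X_std.std(axis=0))
print(X_minmax.min(axis=0), X_minmax.max(axis=0))
```

In scikit-learn, `StandardScaler` and `MinMaxScaler` implement these two transformations column by column.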

In [32]:
from sklearn.preprocessing import StandardScaler

In [33]:
previsores_esc = StandardScaler().fit_transform(previsores)

In [34]:
previsores_esc

array([[ 1.09706398, -2.07333501,  1.26993369, ...,  2.29607613,
         2.75062224,  1.93701461],
       [ 1.82982061, -0.35363241,  1.68595471, ...,  1.0870843 ,
        -0.24388967,  0.28118999],
       [ 1.57988811,  0.45618695,  1.56650313, ...,  1.95500035,
         1.152255  ,  0.20139121],
       ...,
       [ 0.70228425,  2.0455738 ,  0.67267578, ...,  0.41406869,
        -1.10454895, -0.31840916],
       [ 1.83834103,  2.33645719,  1.98252415, ...,  2.28998549,
         1.91908301,  2.21963528],
       [-1.80840125,  1.22179204, -1.81438851, ..., -1.74506282,
        -0.04813821, -0.75120669]])

In [35]:
previsores_esc_df = pd.DataFrame(previsores_esc)
previsores_esc_df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,20,21,22,23,24,25,26,27,28,29
0,1.097064,-2.073335,1.269934,0.984375,1.568466,3.283515,2.652874,2.532475,2.217515,2.255747,...,1.886690,-1.359293,2.303601,2.001237,1.307686,2.616665,2.109526,2.296076,2.750622,1.937015
1,1.829821,-0.353632,1.685955,1.908708,-0.826962,-0.487072,-0.023846,0.548144,0.001392,-0.868652,...,1.805927,-0.369203,1.535126,1.890489,-0.375612,-0.430444,-0.146749,1.087084,-0.243890,0.281190
2,1.579888,0.456187,1.566503,1.558884,0.942210,1.052926,1.363478,2.037231,0.939685,-0.398008,...,1.511870,-0.023974,1.347475,1.456285,0.527407,1.082932,0.854974,1.955000,1.152255,0.201391
3,-0.768909,0.253732,-0.592687,-0.764464,3.283553,3.402909,1.915897,1.451707,2.867383,4.910919,...,-0.281464,0.133984,-0.249939,-0.550021,3.394275,3.893397,1.989588,2.175786,6.046041,4.935010
4,1.750297,-1.151816,1.776573,1.826229,0.280372,0.539340,1.371011,1.428493,-0.009560,-0.562450,...,1.298575,-1.466770,1.338539,1.220724,0.220556,-0.313395,0.613179,0.729259,-0.868353,-0.397100
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
564,2.110995,0.721473,2.060786,2.343856,1.041842,0.219060,1.947285,2.320965,-0.312589,-0.931027,...,1.901185,0.117700,1.752563,2.015301,0.378365,-0.273318,0.664512,1.629151,-1.360158,-0.709091
565,1.704854,2.085134,1.615931,1.723842,0.102458,-0.017833,0.693043,1.263669,-0.217664,-1.058611,...,1.536720,2.047399,1.421940,1.494959,-0.691230,-0.394820,0.236573,0.733827,-0.531855,-0.973978
566,0.702284,2.045574,0.672676,0.577953,-0.840484,-0.038680,0.046588,0.105777,-0.809117,-0.895587,...,0.561361,1.374854,0.579001,0.427906,-0.809587,0.350735,0.326767,0.414069,-1.104549,-0.318409
567,1.838341,2.336457,1.982524,1.735218,1.525767,3.272144,3.296944,2.658866,2.137194,1.043695,...,1.961239,2.237926,2.303601,1.653171,1.430427,3.904848,3.197605,2.289985,1.919083,2.219635


In [36]:
previsores_esc_df.describe()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,20,21,22,23,24,25,26,27,28,29
count,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,...,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0
mean,-1.373633e-16,6.868164e-17,-1.248757e-16,-2.185325e-16,-8.366672e-16,1.873136e-16,4.995028e-17,-4.995028e-17,1.74826e-16,4.745277e-16,...,-8.241796e-16,1.248757e-17,-3.746271e-16,0.0,-2.372638e-16,-3.371644e-16,7.492542e-17,2.247763e-16,2.62239e-16,-5.744282e-16
std,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,...,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088
min,-2.029648,-2.229249,-1.984504,-1.454443,-3.112085,-1.610136,-1.114873,-1.26182,-2.744117,-1.819865,...,-1.726901,-2.223994,-1.693361,-1.222423,-2.682695,-1.443878,-1.305831,-1.745063,-2.16096,-1.601839
25%,-0.6893853,-0.7259631,-0.6919555,-0.6671955,-0.7109628,-0.747086,-0.7437479,-0.7379438,-0.7032397,-0.7226392,...,-0.6749213,-0.7486293,-0.6895783,-0.642136,-0.6912304,-0.6810833,-0.7565142,-0.7563999,-0.6418637,-0.6919118
50%,-0.2150816,-0.1046362,-0.23598,-0.2951869,-0.03489108,-0.2219405,-0.3422399,-0.3977212,-0.0716265,-0.1782793,...,-0.2690395,-0.04351564,-0.2859802,-0.341181,-0.04684277,-0.2695009,-0.2182321,-0.2234689,-0.1274095,-0.2164441
75%,0.4693926,0.5841756,0.4996769,0.3635073,0.636199,0.4938569,0.5260619,0.6469351,0.5307792,0.4709834,...,0.5220158,0.6583411,0.540279,0.357589,0.5975448,0.5396688,0.5311411,0.71251,0.4501382,0.4507624
max,3.971288,4.651889,3.97613,5.250529,4.770911,4.568425,4.243589,3.92793,4.484751,4.910919,...,4.094189,3.885905,4.287337,5.930172,3.955374,5.112877,4.700669,2.685877,6.046041,6.846856
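
The `std` row above reads 1.00088 rather than exactly 1. This is expected: `StandardScaler` divides by the population standard deviation (ddof=0), whereas `describe()` reports the sample standard deviation (ddof=1), so the two differ by a factor of sqrt(n / (n - 1)). The sketch below reproduces the effect with NumPy on randomly generated data of the same length; the random data are an assumption for illustration only.

```python
import numpy as np

n = 569  # number of samples in the dataset
rng = np.random.default_rng(0)
x = rng.normal(loc=14.0, scale=3.5, size=n)  # hypothetical feature values

# Standardize with the population standard deviation, as StandardScaler does.
z = (x - x.mean()) / x.std(ddof=0)

# Measure the spread with the sample standard deviation, as describe() does.
sample_std = z.std(ddof=1)
expected = np.sqrt(n / (n - 1))

print(round(float(sample_std), 5), round(float(expected), 5))
```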


## **Encoding categorical variables**

### **LabelEncoder: transforming categorical variables into numeric ones**


In [37]:
from sklearn.preprocessing import LabelEncoder

In [38]:
df_original.head()

Unnamed: 0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
0,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [39]:
previsores_labelencoder = df_original.iloc[:, 1:].values
previsores_labelencoder

array([[1.799e+01, 1.038e+01, 1.228e+02, ..., 2.654e-01, 4.601e-01,
        1.189e-01],
       [2.057e+01, 1.777e+01, 1.329e+02, ..., 1.860e-01, 2.750e-01,
        8.902e-02],
       [1.969e+01, 2.125e+01, 1.300e+02, ..., 2.430e-01, 3.613e-01,
        8.758e-02],
       ...,
       [1.660e+01, 2.808e+01, 1.083e+02, ..., 1.418e-01, 2.218e-01,
        7.820e-02],
       [2.060e+01, 2.933e+01, 1.401e+02, ..., 2.650e-01, 4.087e-01,
        1.240e-01],
       [7.760e+00, 2.454e+01, 4.792e+01, ..., 0.000e+00, 2.871e-01,
        7.039e-02]])

In [40]:
previsores_labelencoder.shape

(569, 30)

In [41]:
alvo_labelencoder = df_original.iloc[:, 0].values
alvo_labelencoder

array(['M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M',
       'M', 'M', 'M', 'M', 'M', 'M', 'B', 'B', 'B', 'M', 'M', 'M', 'M',
       'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'B', 'M',
       'M', 'M', 'M', 'M', 'M', 'M', 'M', 'B', 'M', 'B', 'B', 'B', 'B',
       'B', 'M', 'M', 'B', 'M', 'M', 'B', 'B', 'B', 'B', 'M', 'B', 'M',
       'M', 'B', 'B', 'B', 'B', 'M', 'B', 'M', 'M', 'B', 'M', 'B', 'M',
       'M', 'B', 'B', 'B', 'M', 'M', 'B', 'M', 'M', 'M', 'B', 'B', 'B',
       'M', 'B', 'B', 'M', 'M', 'B', 'B', 'B', 'M', 'M', 'B', 'B', 'B',
       'B', 'M', 'B', 'B', 'M', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
       'M', 'M', 'M', 'B', 'M', 'M', 'B', 'B', 'B', 'M', 'M', 'B', 'M',
       'B', 'M', 'M', 'B', 'M', 'M', 'B', 'B', 'M', 'B', 'B', 'M', 'B',
       'B', 'B', 'B', 'M', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
       'M', 'B', 'B', 'B', 'B', 'M', 'M', 'B', 'M', 'B', 'B', 'M', 'M',
       'B', 'B', 'M', 'M', 'B', 'B', 'B', 'B', 'M', 'B', 'B', 'M

In [42]:
alvo_labelencoder_ordinal = LabelEncoder().fit_transform(alvo_labelencoder)
alvo_labelencoder_ordinal

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1,
       0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1,
       0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1,
       1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0,
       0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1,
       0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,

In [43]:
alvo_labelencoder_ordinal.shape

(569,)
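
The encoded array matches the manual mapping used earlier (M to 1, B to 0) because `LabelEncoder` sorts the class labels and numbers them in that order. A minimal sketch on hypothetical labels shows how to inspect the learned mapping and how to recover the original strings:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels in the same format as the diagnosis column.
labels = np.array(["M", "B", "B", "M"])

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ lists the labels in sorted order; the position is the code.
print(encoder.classes_)
print(encoded)

# inverse_transform maps the integer codes back to the original labels.
decoded = encoder.inverse_transform(encoded)
print(decoded)
```

Keeping the fitted encoder object around (instead of calling `LabelEncoder().fit_transform(...)` inline) makes it possible to translate model predictions back to ‘M’ and ‘B’ later.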

## **Scaling**

In [44]:
from sklearn.preprocessing import StandardScaler

In [45]:
previsores_labelencoder_esc = StandardScaler().fit_transform(previsores_labelencoder)

In [46]:
previsores_labelencoder_esc

array([[ 1.09706398, -2.07333501,  1.26993369, ...,  2.29607613,
         2.75062224,  1.93701461],
       [ 1.82982061, -0.35363241,  1.68595471, ...,  1.0870843 ,
        -0.24388967,  0.28118999],
       [ 1.57988811,  0.45618695,  1.56650313, ...,  1.95500035,
         1.152255  ,  0.20139121],
       ...,
       [ 0.70228425,  2.0455738 ,  0.67267578, ...,  0.41406869,
        -1.10454895, -0.31840916],
       [ 1.83834103,  2.33645719,  1.98252415, ...,  2.28998549,
         1.91908301,  2.21963528],
       [-1.80840125,  1.22179204, -1.81438851, ..., -1.74506282,
        -0.04813821, -0.75120669]])

In [47]:
previsores_labelencoder_esc_df = pd.DataFrame(previsores_labelencoder_esc)
previsores_labelencoder_esc_df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,20,21,22,23,24,25,26,27,28,29
0,1.097064,-2.073335,1.269934,0.984375,1.568466,3.283515,2.652874,2.532475,2.217515,2.255747,...,1.886690,-1.359293,2.303601,2.001237,1.307686,2.616665,2.109526,2.296076,2.750622,1.937015
1,1.829821,-0.353632,1.685955,1.908708,-0.826962,-0.487072,-0.023846,0.548144,0.001392,-0.868652,...,1.805927,-0.369203,1.535126,1.890489,-0.375612,-0.430444,-0.146749,1.087084,-0.243890,0.281190
2,1.579888,0.456187,1.566503,1.558884,0.942210,1.052926,1.363478,2.037231,0.939685,-0.398008,...,1.511870,-0.023974,1.347475,1.456285,0.527407,1.082932,0.854974,1.955000,1.152255,0.201391
3,-0.768909,0.253732,-0.592687,-0.764464,3.283553,3.402909,1.915897,1.451707,2.867383,4.910919,...,-0.281464,0.133984,-0.249939,-0.550021,3.394275,3.893397,1.989588,2.175786,6.046041,4.935010
4,1.750297,-1.151816,1.776573,1.826229,0.280372,0.539340,1.371011,1.428493,-0.009560,-0.562450,...,1.298575,-1.466770,1.338539,1.220724,0.220556,-0.313395,0.613179,0.729259,-0.868353,-0.397100
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
564,2.110995,0.721473,2.060786,2.343856,1.041842,0.219060,1.947285,2.320965,-0.312589,-0.931027,...,1.901185,0.117700,1.752563,2.015301,0.378365,-0.273318,0.664512,1.629151,-1.360158,-0.709091
565,1.704854,2.085134,1.615931,1.723842,0.102458,-0.017833,0.693043,1.263669,-0.217664,-1.058611,...,1.536720,2.047399,1.421940,1.494959,-0.691230,-0.394820,0.236573,0.733827,-0.531855,-0.973978
566,0.702284,2.045574,0.672676,0.577953,-0.840484,-0.038680,0.046588,0.105777,-0.809117,-0.895587,...,0.561361,1.374854,0.579001,0.427906,-0.809587,0.350735,0.326767,0.414069,-1.104549,-0.318409
567,1.838341,2.336457,1.982524,1.735218,1.525767,3.272144,3.296944,2.658866,2.137194,1.043695,...,1.961239,2.237926,2.303601,1.653171,1.430427,3.904848,3.197605,2.289985,1.919083,2.219635


In [48]:
previsores_labelencoder_esc_df.describe()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,20,21,22,23,24,25,26,27,28,29
count,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,...,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0
mean,-1.373633e-16,6.868164e-17,-1.248757e-16,-2.185325e-16,-8.366672e-16,1.873136e-16,4.995028e-17,-4.995028e-17,1.74826e-16,4.745277e-16,...,-8.241796e-16,1.248757e-17,-3.746271e-16,0.0,-2.372638e-16,-3.371644e-16,7.492542e-17,2.247763e-16,2.62239e-16,-5.744282e-16
std,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,...,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088,1.00088
min,-2.029648,-2.229249,-1.984504,-1.454443,-3.112085,-1.610136,-1.114873,-1.26182,-2.744117,-1.819865,...,-1.726901,-2.223994,-1.693361,-1.222423,-2.682695,-1.443878,-1.305831,-1.745063,-2.16096,-1.601839
25%,-0.6893853,-0.7259631,-0.6919555,-0.6671955,-0.7109628,-0.747086,-0.7437479,-0.7379438,-0.7032397,-0.7226392,...,-0.6749213,-0.7486293,-0.6895783,-0.642136,-0.6912304,-0.6810833,-0.7565142,-0.7563999,-0.6418637,-0.6919118
50%,-0.2150816,-0.1046362,-0.23598,-0.2951869,-0.03489108,-0.2219405,-0.3422399,-0.3977212,-0.0716265,-0.1782793,...,-0.2690395,-0.04351564,-0.2859802,-0.341181,-0.04684277,-0.2695009,-0.2182321,-0.2234689,-0.1274095,-0.2164441
75%,0.4693926,0.5841756,0.4996769,0.3635073,0.636199,0.4938569,0.5260619,0.6469351,0.5307792,0.4709834,...,0.5220158,0.6583411,0.540279,0.357589,0.5975448,0.5396688,0.5311411,0.71251,0.4501382,0.4507624
max,3.971288,4.651889,3.97613,5.250529,4.770911,4.568425,4.243589,3.92793,4.484751,4.910919,...,4.094189,3.885905,4.287337,5.930172,3.955374,5.112877,4.700669,2.685877,6.046041,6.846856
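
The `std` row above shows 1.00088 rather than exactly 1. The likely reason is that `StandardScaler` divides by the population standard deviation (`ddof=0`), while `DataFrame.describe()` reports the sample standard deviation (`ddof=1`); with 569 rows the ratio between the two is sqrt(569/568). The sketch below reproduces the effect with NumPy only. The array `X` is synthetic and stands in for the real predictors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 569  # same number of rows as the dataset
X = rng.normal(loc=10.0, scale=3.0, size=(n_samples, 4))  # synthetic features

# Standardize the way StandardScaler does: population std (ddof=0)
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=0)

pop_std = X_std.std(axis=0, ddof=0)     # 1 up to floating-point rounding
sample_std = X_std.std(axis=0, ddof=1)  # the estimator pandas describe() uses

expected_ratio = np.sqrt(n_samples / (n_samples - 1))
print(pop_std)
print(sample_std)
print(round(float(expected_ratio), 5))
```

The small offset is therefore a reporting convention, not a sign that the scaling went wrong.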


## **PREPROCESSING SUMMARY**

- **Dataset variables:**

    - **Target (`alvo`):** Dependent variable indicating the tumor diagnosis, with 'M' for malignant and 'B' for benign.
    - **Target with LabelEncoder:** Uses LabelEncoder to convert the categorical labels to integers; no scaling is involved.
    - **Target with OneHotEncoder:** Applies LabelEncoder and OneHotEncoder to create binary representations of the categories without introducing an ordering.
    - **Predictors (`previsores`):** Predictor variables converted from categorical to numeric manually, without normalization or scaling.
    - **Predictors with LabelEncoder (`previsores_labelencoder`):** Categorical variables converted to numeric with LabelEncoder, without changing the scale.
    - **Scaled predictors (`previsores_esc`):** Categorical predictor variables converted to numeric and then scaled to a common scale.
    - **Scaled predictors with LabelEncoder (`previsores_labelencoder_esc`):** Categorical variables converted to numeric with LabelEncoder and then standardized with StandardScaler.

- **Cell nucleus features:**
    - **Radius:** Mean of the distances from the center to the points on the perimeter.
    - **Texture:** Standard deviation of the gray-scale values.
    - **Perimeter, Area, Smoothness:** Describe the size and shape of the cell nucleus.
    - **Compactness:** Calculated as ...


## **TRAIN AND TEST SETS**

In [49]:
from sklearn.model_selection import train_test_split

Parameters of `train_test_split`:
- arrays: the predictor and target arrays to split.
- test_size: fraction (or absolute number) of samples assigned to the test set. Default is None; if train_size is also None, 0.25 is used.
- train_size: fraction (or absolute number) of samples assigned to the training set. Default is None, in which case it is the complement of test_size.
- random_state: seed for the random number generator used when shuffling.
- shuffle: whether to shuffle the data before splitting. Default is True. With a fixed random_state, the same shuffle is produced on every run.
- stratify: array of class labels used to split in a stratified way. Default is None, meaning no stratification is applied. When the labels are passed (for example `stratify=alvo`), the class proportions are preserved: if the dataset has 30% zeros and 70% ones, both the training and the test set keep roughly that proportion.
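
To make the effect of `stratify` concrete, here is a minimal sketch on synthetic labels. The arrays `X` and `y` are made up for illustration and are not the notebook's data; the notebook's own split does not pass `stratify`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 70% class 0 and 30% class 1 (illustrative only)
y = np.array([0] * 70 + [1] * 30)
X = np.arange(len(y)).reshape(-1, 1)

# Passing the labels to `stratify` preserves the class proportions in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

print(y_train.mean(), y_test.mean())
```

Both printed means equal the share of class 1 in the full label vector, which is what stratification guarantees up to rounding.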

In [218]:
x_treino, x_teste, y_treino, y_teste = train_test_split(previsores_labelencoder_esc, alvo, test_size = 0.3, random_state = 0)

In [219]:
x_treino.shape

(398, 30)

In [220]:
x_teste.shape

(171, 30)

In [221]:
y_treino.shape

(398,)

In [222]:
y_teste.shape

(171,)

- **Targets:**
    - **1:** alvo
    - **2:** alvo_labelencoder_ordinal
- **Tests (predictor sets):**
    - **1:** previsores
    - **2:** previsores_esc
    - **3:** previsores_labelencoder
    - **4:** previsores_labelencoder_esc

# **NAIVE BAYES**

https://scikit-learn.org/stable/modules/naive_bayes.html

Training the algorithm

In [55]:
from sklearn.naive_bayes import GaussianNB

In [56]:
naive = GaussianNB()
naive.fit(x_treino, y_treino)
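
`GaussianNB` assumes that, within each class, every feature follows a normal distribution and is independent of the other features. The sketch below is a simplified NumPy re-implementation of that idea on synthetic data, meant only to show the mechanics. It is not scikit-learn's code, and the tiny constant added to the variances is an illustrative stand-in for the library's own smoothing.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means and feature variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small constant avoids division by zero (illustrative smoothing)
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances

def predict_gaussian_nb(X, classes, priors, means, variances):
    """Pick the class with the highest log prior + summed Gaussian log-likelihood."""
    diff = X[:, None, :] - means[None, :, :]  # (n_samples, n_classes, n_features)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :])
    scores = np.log(priors)[None, :] + log_lik.sum(axis=2)
    return classes[np.argmax(scores, axis=1)]

# Two well-separated synthetic classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(4.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

params = fit_gaussian_nb(X, y)
pred = predict_gaussian_nb(X, *params)
print((pred == y).mean())
```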

Evaluating the algorithm

In [57]:
previsoes_naive = naive.predict(x_teste)
previsoes_naive

array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0,
       1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0,
       1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0,
       0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0])

In [58]:
y_teste

array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
       0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0,
       1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0])

In [59]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [60]:
accuracy_score(y_teste, previsoes_naive)

0.9239766081871345

In [61]:
print("Acurácia: %.2f%%" % (accuracy_score(y_teste, previsoes_naive) * 100.0))

Acurácia: 92.40%


In [62]:
confusion_matrix(y_teste, previsoes_naive)

array([[101,   7],
       [  6,  57]])

In [63]:
print(classification_report(y_teste, previsoes_naive))

              precision    recall  f1-score   support

           0       0.94      0.94      0.94       108
           1       0.89      0.90      0.90        63

    accuracy                           0.92       171
   macro avg       0.92      0.92      0.92       171
weighted avg       0.92      0.92      0.92       171
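
The figures in the report can be recomputed by hand from the confusion matrix, which helps when checking which kind of error the model makes. The sketch below uses the Naive Bayes test matrix shown above and scikit-learn's convention of true classes in rows and predicted classes in columns.

```python
import numpy as np

# Confusion matrix reported above: rows = true class, columns = predicted class
cm = np.array([[101, 7],
               [6, 57]])

tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision_1 = tp / (tp + fp)  # of all samples predicted as 1, how many are truly 1
recall_1 = tp / (tp + fn)     # of all true 1 samples, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"accuracy={accuracy:.4f} precision={precision_1:.2f} "
      f"recall={recall_1:.2f} f1={f1_1:.2f}")
```

The rounded values agree with the class 1 row of the classification report.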



**Analysis on the training data**

In [64]:
previsoes_treino = naive.predict(x_treino)
previsoes_treino

array([0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1,
       0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
       0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0,
       1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
       1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
       0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
       0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1,
       0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0,

In [65]:
accuracy_score(y_treino, previsoes_treino)

0.9422110552763819

In [66]:
confusion_matrix(y_treino, previsoes_treino)

array([[242,   7],
       [ 16, 133]])

### **Cross-Validation**

In [67]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

In [68]:
# Split the data into folds
kfold = KFold(n_splits = 30, shuffle=True, random_state = 5)

In [69]:
# Create the model
modelo = GaussianNB()
resultado = cross_val_score(modelo, previsores, alvo, cv = kfold)
resultado

array([0.89473684, 0.89473684, 1.        , 0.94736842, 1.        ,
       0.94736842, 0.94736842, 1.        , 0.94736842, 1.        ,
       1.        , 0.94736842, 0.94736842, 0.94736842, 0.94736842,
       1.        , 0.84210526, 0.89473684, 1.        , 0.94736842,
       0.89473684, 0.78947368, 0.89473684, 1.        , 0.94736842,
       1.        , 0.94736842, 0.94736842, 0.89473684, 0.77777778])
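
The per-fold accuracies above take only a handful of distinct values because each fold is very small. With 569 samples and 30 folds, `KFold` gives the first `n_samples % n_splits` folds one extra sample, so 29 folds hold 19 samples and the last holds 18. A short sketch of that arithmetic:

```python
n_samples, n_splits = 569, 30

# KFold gives the first (n_samples % n_splits) folds one extra sample
base, extra = divmod(n_samples, n_splits)
fold_sizes = [base + 1] * extra + [base] * (n_splits - extra)

# 29 folds of 19 samples, 1 fold of 18
print(fold_sizes.count(19), fold_sizes.count(18))

# Accuracies in a 19-sample fold are multiples of 1/19; the last fold uses 1/18
print(round(17 / 19, 8), round(18 / 19, 8), round(14 / 18, 8))
```

A single misclassified sample therefore moves a fold's accuracy by about five percentage points, which is worth keeping in mind when comparing fold-level results.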

In [70]:
# Report the mean accuracy across the folds
print("Acurácia Média: %.2f%%" % (resultado.mean() * 100.0))

Acurácia Média: 93.82%


- Naive Bayes = 92.40% (train/test split) and 93.82% (cross-validation) - previsores, alvo
- Naive Bayes = 91.23% (train/test split) and 93.47% (cross-validation) - previsores_esc, alvo
- Naive Bayes = 92.40% (train/test split) and 93.82% (cross-validation) - previsores_labelencoder, alvo
- Naive Bayes = 91.23% (train/test split) and 93.47% (cross-validation) - previsores_labelencoder_esc, alvo

- Naive Bayes = 92.40% (train/test split) and 93.82% (cross-validation) - previsores, alvo_labelencoder_ordinal
- Naive Bayes = 91.23% (train/test split) and 93.47% (cross-validation) - previsores_esc, alvo_labelencoder_ordinal
- Naive Bayes = 92.40% (train/test split) and 93.82% (cross-validation) - previsores_labelencoder, alvo_labelencoder_ordinal
- Naive Bayes = 91.23% (train/test split) and 93.47% (cross-validation) - previsores_labelencoder_esc, alvo_labelencoder_ordinal

**Best Naive Bayes results:**
- Naive Bayes = 92.40% (train/test split) and 93.82% (cross-validation) - previsores, alvo
- Naive Bayes = 92.40% (train/test split) and 93.82% (cross-validation) - previsores_labelencoder, alvo
- Naive Bayes = 92.40% (train/test split) and 93.82% (cross-validation) - previsores, alvo_labelencoder_ordinal
- Naive Bayes = 92.40% (train/test split) and 93.82% (cross-validation) - previsores_labelencoder, alvo_labelencoder_ordinal

# **SUPPORT VECTOR MACHINES (SVM)**

https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

In [71]:
from sklearn.svm import SVC

In [72]:
svm = SVC(kernel='rbf', random_state=1, C = 2)
svm.fit(x_treino, y_treino)
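
With `kernel='rbf'`, the SVM measures similarity between two samples as exp(-gamma * ||x - z||^2). Because this depends on raw distances, a feature with a large numeric range can dominate the result. The sketch below illustrates that with two hypothetical samples. It assumes the `gamma='scale'` heuristic, 1 / (n_features * X.var()), which is the default in recent scikit-learn versions; the notebook does not set `gamma` explicitly.

```python
import numpy as np

def rbf_kernel(x, z, gamma):
    """RBF similarity between two vectors: exp(-gamma * ||x - z||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

# Two hypothetical samples with one small-scale and one large-scale feature
a = np.array([0.10, 1000.0])
b = np.array([0.90, 1010.0])
X = np.vstack([a, b])

# Per-feature contribution to the squared distance: the second feature dominates
print((a - b) ** 2)

# gamma='scale' heuristic on the raw data
gamma_raw = 1.0 / (X.shape[1] * X.var())

# After standardizing each column, both features contribute equally
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
gamma_std = 1.0 / (X_std.shape[1] * X_std.var())

print(rbf_kernel(a, b, gamma_raw))
print(rbf_kernel(X_std[0], X_std[1], gamma_std))
```

This sensitivity to feature scale is a plausible reason why the scaled predictor sets score higher for SVM in the results listed for this section.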

In [73]:
previsoes_svm = svm.predict(x_teste)
previsoes_svm

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
       0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0,
       1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0])

In [74]:
y_teste

array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
       0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0,
       1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0])

In [75]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [76]:
print("Acurácia: %.2f%%" % (accuracy_score(y_teste, previsoes_svm) * 100.0))

Acurácia: 94.74%


In [77]:
confusion_matrix(y_teste, previsoes_svm)

array([[107,   1],
       [  8,  55]])

In [78]:
print(classification_report(y_teste, previsoes_svm))

              precision    recall  f1-score   support

           0       0.93      0.99      0.96       108
           1       0.98      0.87      0.92        63

    accuracy                           0.95       171
   macro avg       0.96      0.93      0.94       171
weighted avg       0.95      0.95      0.95       171



**Analysis on the training data**

In [79]:
previsoes_treino = svm.predict(x_treino)
previsoes_treino

array([0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0,
       0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
       0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
       0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,
       0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
       0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1,
       1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0,

In [80]:
accuracy_score(y_treino, previsoes_treino)

0.9095477386934674

In [81]:
confusion_matrix(y_treino, previsoes_treino)

array([[247,   2],
       [ 34, 115]])

### **Cross-Validation**

In [82]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

In [83]:
# Split the data into folds
kfold = KFold(n_splits = 30, shuffle=True, random_state = 5)

In [84]:
# Create the model
modelo = SVC(kernel='rbf', random_state=1, C = 2)
resultado = cross_val_score(modelo, previsores_labelencoder_esc, alvo_labelencoder_ordinal, cv = kfold)

# Report the mean accuracy across the folds
print("Acurácia Média: %.2f%%" % (resultado.mean() * 100.0))

Acurácia Média: 97.88%
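
The cross-validation above scores the model on predictors that were standardized using statistics from the whole dataset, so every validation fold has already influenced the scaler. A common alternative, sketched here as an assumption rather than as the notebook's method, is to wrap the scaler and the classifier in a pipeline so that the scaler is re-fitted inside each fold. The sketch uses scikit-learn's bundled copy of the Wisconsin Diagnostic data (`load_breast_cancer`), whose label encoding may differ from the notebook's, and 10 stratified folds instead of 30 plain ones.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# The scaler is re-fitted on the training part of every fold
modelo = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=2, random_state=1))

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=5)
resultado = cross_val_score(modelo, X, y, cv=cv)

# Report both the mean and the spread across folds
print("Mean accuracy: %.2f%% (std %.2f%%)"
      % (resultado.mean() * 100.0, resultado.std() * 100.0))
```

Reporting the standard deviation alongside the mean shows how much the score varies from fold to fold.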


- SVM = 94.74% (train/test split) and 91.72% (cross-validation) - `SVC(kernel='rbf', random_state=1, C = 2)` - previsores, alvo
- SVM = 97.66% (train/test split) and 97.88% (cross-validation) - `SVC(kernel='rbf', random_state=1, C = 2)` - previsores_esc, alvo
- SVM = 94.74% (train/test split) and 91.72% (cross-validation) - `SVC(kernel='rbf', random_state=1, C = 2)` - previsores_labelencoder, alvo
- SVM = 97.66% (train/test split) and 97.88% (cross-validation) - `SVC(kernel='rbf', random_state=1, C = 2)` - previsores_labelencoder_esc, alvo

- SVM = 94.74% (train/test split) and 91.72% (cross-validation) - `SVC(kernel='rbf', random_state=1, C = 2)` - previsores, alvo_labelencoder_ordinal
- SVM = 97.66% (train/test split) and 97.88% (cross-validation) - `SVC(kernel='rbf', random_state=1, C = 2)` - previsores_esc, alvo_labelencoder_ordinal
- SVM = 94.74% (train/test split) and 91.72% (cross-validation) - `SVC(kernel='rbf', random_state=1, C = 2)` - previsores_labelencoder, alvo_labelencoder_ordinal
- SVM = 97.66% (train/test split) and 97.88% (cross-validation) - `SVC(kernel='rbf', random_state=1, C = 2)` - previsores_labelencoder_esc, alvo_labelencoder_ordinal

**Best SVM results:**
- SVM = 97.66% (train/test split) and 97.88% (cross-validation) - `SVC(kernel='rbf', random_state=1, C = 2)` - previsores_esc, alvo
- SVM = 97.66% (train/test split) and 97.88% (cross-validation) - `SVC(kernel='rbf', random_state=1, C = 2)` - previsores_labelencoder_esc, alvo
- SVM = 97.66% (train/test split) and 97.88% (cross-validation) - `SVC(kernel='rbf', random_state=1, C = 2)` - previsores_esc, alvo_labelencoder_ordinal
- SVM = 97.66% (train/test split) and 97.88% (cross-validation) - `SVC(kernel='rbf', random_state=1, C = 2)` - previsores_labelencoder_esc, alvo_labelencoder_ordinal

# **LOGISTIC REGRESSION**

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

In [85]:
from sklearn.linear_model import LogisticRegression

In [86]:
logistica = LogisticRegression(random_state=1, max_iter=2000, penalty="l2",
                               tol=0.0001, C=1,solver="lbfgs")
logistica.fit(x_treino, y_treino)

In [87]:
logistica.intercept_

array([-29.29413608])

In [88]:
logistica.coef_

array([[-0.71944404, -0.16726875,  0.2239289 , -0.02495814,  0.14341597,
         0.17549229,  0.37664683,  0.21526684,  0.27969381,  0.02754034,
         0.02672618, -1.01538566,  0.07911164,  0.10117279,  0.01267559,
        -0.07218306,  0.01392553,  0.02315355,  0.02521767, -0.01641543,
        -0.3708321 ,  0.38068624,  0.20138334,  0.01127253,  0.2579224 ,
         0.55564292,  1.17707945,  0.46626451,  0.59258216,  0.08147444]])
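
For a binary problem, `intercept_` and `coef_` define a linear score z = intercept + coef · x, and the predicted probability of class 1 is the sigmoid of that score. The sketch below shows the computation with NumPy. The parameter values and samples are hypothetical and much smaller than the 30-feature model fitted above.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted parameters for a 3-feature model (not the values above)
intercept = np.array([-0.5])
coef = np.array([[1.2, -0.7, 0.3]])

# Two hypothetical samples
X_new = np.array([[1.0, 0.5, 2.0],
                  [-1.0, 1.5, 0.0]])

# Linear score, then probability of class 1
z = X_new @ coef.ravel() + intercept[0]
proba_1 = sigmoid(z)

# Predicting the class corresponds to thresholding the probability at 0.5 (z > 0)
pred = (proba_1 > 0.5).astype(int)
print(proba_1, pred)
```

Read this way, a positive coefficient pushes the prediction toward class 1 as the feature grows, and a negative one pushes it toward class 0.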

In [89]:
previsoes_logistica = logistica.predict(x_teste)
previsoes_logistica

array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
       1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0])

In [90]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [91]:
print("Acurácia: %.2f%%" % (accuracy_score(y_teste, previsoes_logistica) * 100.0))

Acurácia: 95.91%


In [92]:
confusion_matrix(y_teste, previsoes_logistica)

array([[102,   6],
       [  1,  62]])

In [93]:
print(classification_report(y_teste, previsoes_logistica))

              precision    recall  f1-score   support

           0       0.99      0.94      0.97       108
           1       0.91      0.98      0.95        63

    accuracy                           0.96       171
   macro avg       0.95      0.96      0.96       171
weighted avg       0.96      0.96      0.96       171



**Analysis on the training data**

In [94]:
previsoes_treino = logistica.predict(x_treino)
previsoes_treino

array([0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1,
       0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
       0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0,
       1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
       0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0,
       1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
       1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
       0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1,
       0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1,
       1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0,

In [95]:
accuracy_score(y_treino, previsoes_treino)

0.9597989949748744

In [96]:
confusion_matrix(y_treino, previsoes_treino)

array([[243,   6],
       [ 10, 139]])

### **Cross-Validation**

In [97]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

In [98]:
# Split the data into folds
kfold = KFold(n_splits = 30, shuffle=True, random_state = 5)

In [99]:
# Create the model
modelo = LogisticRegression(random_state=1, max_iter=2000, penalty="l2",
                               tol=0.0001, C=1,solver="lbfgs")
resultado = cross_val_score(modelo, previsores_labelencoder_esc, alvo_labelencoder_ordinal, cv = kfold)

# Report the mean accuracy across the folds
print("Acurácia Média: %.2f%%" % (resultado.mean() * 100.0))

Acurácia Média: 98.06%


- Logistic regression = 95.91% (train/test split) and 98.06% (cross-validation with previsores_esc) - `LogisticRegression(random_state=1, max_iter=2000, penalty="l2", tol=0.0001, C=1, solver="lbfgs")` - previsores, alvo
- Logistic regression = 97.66% (train/test split) and 98.06% (cross-validation with previsores_esc) - `LogisticRegression(random_state=1, max_iter=2000, penalty="l2", tol=0.0001, C=1, solver="lbfgs")` - previsores_esc, alvo
- Logistic regression = 95.91% (train/test split) and 98.06% (cross-validation with previsores_esc) - `LogisticRegression(random_state=1, max_iter=2000, penalty="l2", tol=0.0001, C=1, solver="lbfgs")` - previsores_labelencoder, alvo
- Logistic regression = 97.66% (train/test split) and 98.06% (cross-validation with previsores_esc) - `LogisticRegression(random_state=1, max_iter=2000, penalty="l2", tol=0.0001, C=1, solver="lbfgs")` - previsores_labelencoder_esc, alvo

- Logistic regression = 95.91% (train/test split) and 98.06% (cross-validation with previsores_labelencoder_esc) - `LogisticRegression(random_state=1, max_iter=2000, penalty="l2", tol=0.0001, C=1, solver="lbfgs")` - previsores, alvo_labelencoder_ordinal
- Logistic regression = 97.66% (train/test split) and 98.06% (cross-validation with previsores_labelencoder_esc) - `LogisticRegression(random_state=1, max_iter=2000, penalty="l2", tol=0.0001, C=1, solver="lbfgs")` - previsores_esc, alvo_labelencoder_ordinal
- Logistic regression = 97.66% (train/test split) and 98.06% (cross-validation with previsores_labelencoder_esc) - `LogisticRegression(random_state=1, max_iter=2000, penalty="l2", tol=0.0001, C=1, solver="lbfgs")` - previsores_labelencoder, alvo_labelencoder_ordinal
- Logistic regression = 97.66% (train/test split) and 98.06% (cross-validation with previsores_labelencoder_esc) - `LogisticRegression(random_state=1, max_iter=600, penalty="l2", tol=0.0001, C=1, solver="lbfgs")` - previsores_labelencoder_esc, alvo_labelencoder_ordinal

**Best Logistic Regression results:**
- Logistic regression = 97.66% (train/test split) and 98.06% (cross-validation with previsores_esc) - `LogisticRegression(random_state=1, max_iter=2000, penalty="l2", tol=0.0001, C=1, solver="lbfgs")` - previsores_esc, alvo
- Logistic regression = 97.66% (train/test split) and 98.06% (cross-validation with previsores_esc) - `LogisticRegression(random_state=1, max_iter=2000, penalty="l2", tol=0.0001, C=1, solver="lbfgs")` - previsores_labelencoder_esc, alvo
- Logistic regression = 97.66% (train/test split) and 98.06% (cross-validation with previsores_labelencoder_esc) - `LogisticRegression(random_state=1, max_iter=2000, penalty="l2", tol=0.0001, C=1, solver="lbfgs")` - previsores_esc, alvo_labelencoder_ordinal
- Logistic regression = 97.66% (train/test split) and 98.06% (cross-validation with previsores_labelencoder_esc) - `LogisticRegression(random_state=1, max_iter=600, penalty="l2", tol=0.0001, C=1, solver="lbfgs")` - previsores_labelencoder_esc, alvo_labelencoder_ordinal

# **INSTANCE-BASED LEARNING (KNN)**

https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html

In [100]:
from sklearn.neighbors import KNeighborsClassifier

In [101]:
knn = KNeighborsClassifier(n_neighbors=7, metric='minkowski', p=1)
knn.fit(x_treino, y_treino)

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.DistanceMetric.html
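
The classifier above uses `metric='minkowski'` with `p=1`, which is the Manhattan distance; `p=2` would give the Euclidean distance. A minimal sketch of the formula, using two made-up points:

```python
import numpy as np

def minkowski(x, z, p):
    """Minkowski distance: (sum_i |x_i - z_i|^p)^(1/p)."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(z, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))

a = [0.0, 0.0]
b = [3.0, 4.0]

print(minkowski(a, b, p=1))  # Manhattan distance: 3 + 4
print(minkowski(a, b, p=2))  # Euclidean distance: sqrt(9 + 16)
```

Since both distances add up per-feature differences, KNN results depend on whether the predictors were scaled beforehand.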

In [102]:
previsoes_knn = knn.predict(x_teste)
previsoes_knn

array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
       1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0,
       1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0])

In [103]:
y_teste

array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
       0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0,
       1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0])

In [104]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [105]:
print("Acurácia: %.2f%%" % (accuracy_score(y_teste, previsoes_knn) * 100.0))

Acurácia: 96.49%


In [106]:
confusion_matrix(y_teste, previsoes_knn)

array([[106,   2],
       [  4,  59]])

In [107]:
print(classification_report(y_teste, previsoes_knn))

              precision    recall  f1-score   support

           0       0.96      0.98      0.97       108
           1       0.97      0.94      0.95        63

    accuracy                           0.96       171
   macro avg       0.97      0.96      0.96       171
weighted avg       0.96      0.96      0.96       171



**Analysis on the training data**

In [108]:
previsoes_treino = knn.predict(x_treino)
previsoes_treino

array([0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1,
       0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
       0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
       0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1,
       1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0,
       0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1,
       0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1,
       0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1,
       1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0,

In [109]:
accuracy_score(y_treino, previsoes_treino)

0.949748743718593

In [110]:
confusion_matrix(y_treino, previsoes_treino)

array([[242,   7],
       [ 13, 136]])

### **Cross-Validation**

In [111]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

In [112]:
# Split the data into folds
kfold = KFold(n_splits = 30, shuffle=True, random_state = 5)

In [113]:
# Create the model
modelo = KNeighborsClassifier(n_neighbors=7, metric='minkowski', p = 1)
resultado = cross_val_score(modelo, previsores_labelencoder, alvo_labelencoder_ordinal, cv = kfold)

# Report the mean accuracy across the folds
print("Acurácia Média: %.2f%%" % (resultado.mean() * 100.0))

Acurácia Média: 93.13%


- KNN = 96.49% (train/test split) and 93.13% (cross-validation) - `KNeighborsClassifier(n_neighbors=7, metric='minkowski', p = 1)` - previsores, alvo
- KNN = 95.91% (train/test split) and 96.65% (cross-validation) - `KNeighborsClassifier(n_neighbors=7, metric='minkowski', p = 1)` - previsores_esc, alvo
- KNN = 96.49% (train/test split) and 93.13% (cross-validation) - `KNeighborsClassifier(n_neighbors=7, metric='minkowski', p = 1)` - previsores_labelencoder, alvo
- KNN = 95.91% (train/test split) and 96.65% (cross-validation) - `KNeighborsClassifier(n_neighbors=7, metric='minkowski', p = 1)` - previsores_labelencoder_esc, alvo

- KNN = 96.49% (train/test split) and 93.13% (cross-validation) - `KNeighborsClassifier(n_neighbors=7, metric='minkowski', p = 1)` - previsores, alvo_labelencoder_ordinal
- KNN = 95.91% (train/test split) and 96.65% (cross-validation) - `KNeighborsClassifier(n_neighbors=7, metric='minkowski', p = 1)` - previsores_esc, alvo_labelencoder_ordinal
- KNN = 96.49% (train/test split) and 93.13% (cross-validation) - `KNeighborsClassifier(n_neighbors=7, metric='minkowski', p = 1)` - previsores_labelencoder, alvo_labelencoder_ordinal
- KNN = 95.91% (train/test split) and 96.65% (cross-validation) - `KNeighborsClassifier(n_neighbors=7, metric='minkowski', p = 1)` - previsores_labelencoder_esc, alvo_labelencoder_ordinal

**Best KNN results:**
- KNN = 96.49% (train/test split) and 93.13% (cross-validation) - `KNeighborsClassifier(n_neighbors=7, metric='minkowski', p = 1)` - previsores, alvo
- KNN = 96.49% (train/test split) and 93.13% (cross-validation) - `KNeighborsClassifier(n_neighbors=7, metric='minkowski', p = 1)` - previsores_labelencoder, alvo
- KNN = 96.49% (train/test split) and 93.13% (cross-validation) - `KNeighborsClassifier(n_neighbors=7, metric='minkowski', p = 1)` - previsores, alvo_labelencoder_ordinal
- KNN = 96.49% (train/test split) and 93.13% (cross-validation) - `KNeighborsClassifier(n_neighbors=7, metric='minkowski', p = 1)` - previsores_labelencoder, alvo_labelencoder_ordinal

# **DECISION TREE**

https://scikit-learn.org/stable/modules/tree.html

In [114]:
from sklearn.tree import DecisionTreeClassifier

In [115]:
arvore = DecisionTreeClassifier(criterion='entropy', random_state = 0, max_depth=3)
arvore.fit(x_treino, y_treino)
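
With `criterion='entropy'`, the tree picks at each node the split that most reduces the Shannon entropy of the class labels (the information gain). The sketch below computes both quantities. The parent counts are the class totals of the training split, read off the rows of the training confusion matrices in this notebook (249 and 149); the child counts are a hypothetical split chosen only for illustration.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (in bits) of a vector of class counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = sum(parent)
    weighted = (sum(left) / n) * entropy(left) + (sum(right) / n) * entropy(right)
    return entropy(parent) - weighted

# Class counts in the training split (class 0, class 1)
parent = [249, 149]

# Hypothetical split into two child nodes, for illustration only
left, right = [230, 20], [19, 129]

print(round(entropy(parent), 3))
print(round(information_gain(parent, left, right), 3))
```

A `max_depth` limit caps how many such splits can be chained along any path, which trades training accuracy for a simpler tree.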

In [116]:
previsoes_arvore = arvore.predict(x_teste)
previsoes_arvore

array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
       1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1,
       0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1,
       0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0])

In [117]:
y_teste

array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
       0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0,
       1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0])

In [118]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [119]:
print("Acurácia: %.2f%%" % (accuracy_score(y_teste, previsoes_arvore) * 100.0))

Acurácia: 95.32%


In [120]:
confusion_matrix(y_teste, previsoes_arvore)

array([[102,   6],
       [  2,  61]])

In [121]:
print(classification_report(y_teste, previsoes_arvore))

              precision    recall  f1-score   support

           0       0.98      0.94      0.96       108
           1       0.91      0.97      0.94        63

    accuracy                           0.95       171
   macro avg       0.95      0.96      0.95       171
weighted avg       0.95      0.95      0.95       171
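
The figures in the report above can be reproduced by hand from the confusion matrix. The sketch below is a minimal illustration, assuming scikit-learn's convention that rows are true classes and columns are predicted classes; the variable names are illustrative and not part of the notebook.

```python
# Confusion matrix printed above: rows = true class, columns = predicted class
matriz = [[102, 6],
          [2, 61]]

tn, fp = matriz[0]  # true class 0: predicted as 0, predicted as 1
fn, tp = matriz[1]  # true class 1: predicted as 0, predicted as 1

acuracia = (tp + tn) / (tp + tn + fp + fn)
precisao_1 = tp / (tp + fp)   # of the samples predicted as 1, how many really are 1
recall_1 = tp / (tp + fn)     # of the true class-1 samples, how many were found
f1_1 = 2 * precisao_1 * recall_1 / (precisao_1 + recall_1)

print(f"accuracy={acuracia:.4f} precision={precisao_1:.2f} "
      f"recall={recall_1:.2f} f1={f1_1:.2f}")
```

The rounded values agree with the class-1 row of the report (0.91, 0.97, 0.94) and with the 95.32% accuracy.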



**Training-data analysis**

In [122]:
previsoes_treino = arvore.predict(x_treino)
previsoes_treino

array([0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1,
       0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
       0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0,
       1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0,
       1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0,
       1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1,
       0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1,
       0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1,
       1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0,

In [123]:
accuracy_score(y_treino, previsoes_treino)

0.964824120603015

In [124]:
confusion_matrix(y_treino, previsoes_treino)

array([[245,   4],
       [ 10, 139]])

### **Cross-Validation**

In [125]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

In [126]:
# Split the data into folds
kfold = KFold(n_splits = 30, shuffle=True, random_state = 5)

In [127]:
# Create the model
modelo = DecisionTreeClassifier(criterion='entropy', random_state = 0, max_depth=7)
resultado = cross_val_score(modelo, previsores_esc, alvo, cv = kfold)

# Report the mean accuracy across the folds
print("Mean accuracy: %.2f%%" % (resultado.mean() * 100.0))

Mean accuracy: 92.44%
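
With 30 folds, each fold's accuracy is measured on very few samples. The sketch below is only an illustration: it assumes the full dataset has 398 + 171 = 569 rows (the training and test sizes seen in this notebook) and applies the rule documented for scikit-learn's `KFold`, where the first `n_samples % n_splits` folds receive one extra sample. The variable names are hypothetical.

```python
n_amostras = 398 + 171   # assumed total: training rows + test rows
n_folds = 30

# KFold rule: the first (n_amostras % n_folds) folds get one extra sample
base, resto = divmod(n_amostras, n_folds)
tamanhos = [base + 1 if i < resto else base for i in range(n_folds)]

print(sum(tamanhos), min(tamanhos), max(tamanhos))

# A single misclassified sample shifts one fold's accuracy by about this much
print(f"{100 / max(tamanhos):.1f} percentage points per sample")
```

Because each fold holds only 18 or 19 samples, per-fold accuracies move in coarse steps, so the spread across folds is worth inspecting alongside the mean.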


Decision tree = 95.32% (train/test split) and 92.44% (cross-validation) - DecisionTreeClassifier(criterion='entropy', random_state = 0, max_depth=3) - previsores, alvo
Decision tree = 95.32% (train/test split) and 92.44% (cross-validation) - DecisionTreeClassifier(criterion='entropy', random_state = 0, max_depth=3) - previsores_esc, alvo
Decision tree = 95.32% (train/test split) and 92.44% (cross-validation) - DecisionTreeClassifier(criterion='entropy', random_state = 0, max_depth=3) - previsores_labelencoder, alvo
Decision tree = 95.32% (train/test split) and 92.44% (cross-validation) - DecisionTreeClassifier(criterion='entropy', random_state = 0, max_depth=3) - previsores_labelencoder_esc, alvo

Decision tree = 95.32% (train/test split) and 92.44% (cross-validation) - DecisionTreeClassifier(criterion='entropy', random_state = 0, max_depth=3) - previsores, alvo_labelencoder_ordinal
Decision tree = 95.32% (train/test split) and 92.44% (cross-validation) - DecisionTreeClassifier(criterion='entropy', random_state = 0, max_depth=3) - previsores_esc, alvo_labelencoder_ordinal
Decision tree = 95.32% (train/test split) and 92.44% (cross-validation) - DecisionTreeClassifier(criterion='entropy', random_state = 0, max_depth=3) - previsores_labelencoder, alvo_labelencoder_ordinal
Decision tree = 95.32% (train/test split) and 92.44% (cross-validation) - DecisionTreeClassifier(criterion='entropy', random_state = 0, max_depth=3) - previsores_labelencoder_esc, alvo_labelencoder_ordinal

**Best decision trees: all eight combinations above tie at 95.32% (train/test split) and 92.44% (cross-validation).**

# **RANDOM FOREST**

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

In [128]:
from sklearn.ensemble import RandomForestClassifier

In [129]:
random = RandomForestClassifier(n_estimators=150, criterion='entropy', random_state = 0, max_depth=4)
random.fit(x_treino, y_treino)

In [130]:
previsoes_random = random.predict(x_teste)
previsoes_random

array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0,
       0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0,
       1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0,
       0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0])

In [131]:
y_teste

array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
       0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0,
       1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0])

In [132]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [133]:
print("Accuracy: %.2f%%" % (accuracy_score(y_teste, previsoes_random) * 100.0))

Accuracy: 96.49%


In [134]:
confusion_matrix(y_teste, previsoes_random)

array([[106,   2],
       [  4,  59]])

In [135]:
print(classification_report(y_teste, previsoes_random))

              precision    recall  f1-score   support

           0       0.96      0.98      0.97       108
           1       0.97      0.94      0.95        63

    accuracy                           0.96       171
   macro avg       0.97      0.96      0.96       171
weighted avg       0.96      0.96      0.96       171
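
The report shows a macro-average precision of 0.97 but a weighted-average precision of 0.96. The sketch below is an illustrative hand calculation from the confusion matrix above (variable names are hypothetical). It suggests the difference comes only from how the two per-class precisions are averaged.

```python
# Random Forest confusion matrix: rows = true class, columns = predicted class
matriz = [[106, 2],
          [4, 59]]

suporte = [sum(linha) for linha in matriz]                    # true samples per class
previstos = [matriz[0][j] + matriz[1][j] for j in range(2)]   # predicted samples per class
precisao = [matriz[k][k] / previstos[k] for k in range(2)]    # per-class precision

# Macro average: plain mean of the per-class values
macro = sum(precisao) / len(precisao)
# Weighted average: each class weighted by its support (108 and 63)
ponderada = sum(p * s for p, s in zip(precisao, suporte)) / sum(suporte)

print(f"macro={macro:.4f} weighted={ponderada:.4f}")
```

Class 0 has the lower precision and the larger support, so it pulls the weighted average just below the rounding threshold.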



**Training-data analysis**

In [136]:
previsoes_treino = random.predict(x_treino)
previsoes_treino

array([0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1,
       0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
       0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0,
       1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0,
       1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
       0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
       0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1,
       0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1,
       1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0,

In [137]:
accuracy_score(y_treino, previsoes_treino)

0.9899497487437185

In [138]:
confusion_matrix(y_treino, previsoes_treino)

array([[249,   0],
       [  4, 145]])

### **Cross-Validation**

In [139]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

In [140]:
# Split the data into folds
kfold = KFold(n_splits = 30, shuffle=True, random_state = 5)

In [141]:
# Create the model
modelo = RandomForestClassifier(n_estimators=150, criterion='entropy', random_state = 0, max_depth=4)
resultado = cross_val_score(modelo, previsores_labelencoder_esc, alvo, cv = kfold)

# Report the mean accuracy across the folds
print("Mean accuracy: %.2f%%" % (resultado.mean() * 100.0))

Mean accuracy: 95.76%


Random Forest = 96.49% (train/test split) and 95.76% (cross-validation) - RandomForestClassifier(n_estimators=150, criterion='entropy', random_state = 0, max_depth=4) - previsores, alvo
Random Forest = 96.49% (train/test split) and 95.76% (cross-validation) - RandomForestClassifier(n_estimators=150, criterion='entropy', random_state = 0, max_depth=4) - previsores_esc, alvo
Random Forest = 96.49% (train/test split) and 95.76% (cross-validation) - RandomForestClassifier(n_estimators=150, criterion='entropy', random_state = 0, max_depth=4) - previsores_labelencoder, alvo
Random Forest = 96.49% (train/test split) and 95.76% (cross-validation) - RandomForestClassifier(n_estimators=150, criterion='entropy', random_state = 0, max_depth=4) - previsores_labelencoder_esc, alvo

Random Forest = 96.49% (train/test split) and 95.76% (cross-validation) - RandomForestClassifier(n_estimators=150, criterion='entropy', random_state = 0, max_depth=4) - previsores, alvo_labelencoder_ordinal
Random Forest = 96.49% (train/test split) and 95.76% (cross-validation) - RandomForestClassifier(n_estimators=150, criterion='entropy', random_state = 0, max_depth=4) - previsores_esc, alvo_labelencoder_ordinal
Random Forest = 96.49% (train/test split) and 95.76% (cross-validation) - RandomForestClassifier(n_estimators=150, criterion='entropy', random_state = 0, max_depth=4) - previsores_labelencoder, alvo_labelencoder_ordinal
Random Forest = 96.49% (train/test split) and 95.76% (cross-validation) - RandomForestClassifier(n_estimators=150, criterion='entropy', random_state = 0, max_depth=4) - previsores_labelencoder_esc, alvo_labelencoder_ordinal

**Best Random Forest models: all eight combinations above tie at 96.49% (train/test split) and 95.76% (cross-validation).**

# **XGBOOST**

https://xgboost.readthedocs.io/en/stable/

In [142]:
from xgboost import XGBClassifier

In [143]:
xg = XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3)
xg.fit(x_treino,y_treino)

In [144]:
previsoes_xg = xg.predict(x_teste)
previsoes_xg

array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0,
       0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0,
       1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0])

In [145]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [146]:
print("Accuracy: %.2f%%" % (accuracy_score(y_teste, previsoes_xg) * 100.0))

Accuracy: 96.49%


In [147]:
confusion_matrix(y_teste, previsoes_xg)

array([[106,   2],
       [  4,  59]])

In [148]:
print(classification_report(y_teste, previsoes_xg))

              precision    recall  f1-score   support

           0       0.96      0.98      0.97       108
           1       0.97      0.94      0.95        63

    accuracy                           0.96       171
   macro avg       0.97      0.96      0.96       171
weighted avg       0.96      0.96      0.96       171



**Training-data analysis**

In [149]:
previsoes_treino = xg.predict(x_treino)
previsoes_treino

array([0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1,
       0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
       0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0,
       1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0,
       1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0,
       0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
       0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1,
       0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1,
       1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0,

In [150]:
accuracy_score(y_treino, previsoes_treino)

1.0

In [151]:
confusion_matrix(y_treino, previsoes_treino)

array([[249,   0],
       [  0, 149]])

### **Cross-Validation**

In [152]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

In [153]:
# Split the data into folds
kfold = KFold(n_splits = 30, shuffle=True, random_state = 5)

In [154]:
# Create the model
modelo = XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3)
resultado = cross_val_score(modelo, previsores_esc, alvo, cv = kfold)

# Report the mean accuracy across the folds
print("Mean accuracy: %.2f%%" % (resultado.mean() * 100.0))

Mean accuracy: 96.29%


XGBoost = 96.49% (train/test split) and 96.29% (cross-validation) - XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3) - previsores, alvo
XGBoost = 96.49% (train/test split) and 96.29% (cross-validation) - XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3) - previsores_esc, alvo
XGBoost = 96.49% (train/test split) and 96.29% (cross-validation) - XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3) - previsores_labelencoder, alvo
XGBoost = 96.49% (train/test split) and 96.29% (cross-validation) - XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3) - previsores_labelencoder_esc, alvo

XGBoost = 96.49% (train/test split) and 96.29% (cross-validation) - XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3) - previsores, alvo_labelencoder_ordinal
XGBoost = 96.49% (train/test split) and 96.29% (cross-validation) - XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3) - previsores_esc, alvo_labelencoder_ordinal
XGBoost = 96.49% (train/test split) and 96.29% (cross-validation) - XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3) - previsores_labelencoder, alvo_labelencoder_ordinal
XGBoost = 96.49% (train/test split) and 96.29% (cross-validation) - XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3) - previsores_labelencoder_esc, alvo_labelencoder_ordinal

**Best XGBoost models: all eight combinations above tie at 96.49% (train/test split) and 96.29% (cross-validation). However, all of them scored 100% on the training set, which may indicate overfitting.**
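
One simple way to look at possible overfitting is the gap between training and test accuracy. The sketch below is a minimal illustration that uses the accuracies printed in the three tree-based sections above; the dictionary and variable names are hypothetical, and the gap is only a rough indicator, not a formal test.

```python
# (training accuracy, test accuracy) as printed in the sections above
resultados = {
    "Decision tree": (0.964824120603015, 0.9532),
    "Random Forest": (0.9899497487437185, 0.9649),
    "XGBoost": (1.0, 0.9649),
}

for nome, (treino, teste) in resultados.items():
    diferenca = (treino - teste) * 100  # gap in percentage points
    print(f"{nome:14s} train={treino:.2%} test={teste:.2%} gap={diferenca:.2f} points")

# Model with the largest train/test gap
maior = max(resultados, key=lambda n: resultados[n][0] - resultados[n][1])
print(maior)
```

XGBoost shows the widest gap of the three, although its test and cross-validation accuracies remain close to each other.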

# **LIGHTGBM**

https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.train.html

In [155]:
# Install the library
!pip install lightgbm

In [156]:
import lightgbm as lgb

In [157]:
# Training dataset
dataset = lgb.Dataset(x_treino,label=y_treino)

**Hyperparameters**

**Fit control**

num_leaves: sets the maximum number of leaves in each tree. LightGBM grows trees leaf-wise, so num_leaves is not derived from max_depth and the two are tuned separately; even so, a tree limited to depth d can hold at most 2^d leaves.

max_depth: the maximum depth (number of levels) a tree may reach.

**Speed control**

learning_rate: the learning rate; it determines how much each tree contributes to the final result.

max_bin: a smaller max_bin shortens training considerably, because feature values are grouped into fewer discrete bins, which is computationally cheaper.

**Accuracy control**

num_leaves: a high value produces more complex trees that fit the training data more closely, but it can lead to overfitting.

max_bin: high values have an effect similar to increasing num_leaves and also slow down training.
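
The interaction between the two fit-control parameters can be checked with simple arithmetic. The sketch below is an illustration only: it takes the values used in this notebook (max_depth=2, num_leaves=250) and relies on the fact that a binary tree of depth d has at most 2^d leaves. The variable names are hypothetical.

```python
max_depth = 2      # value used in this notebook
num_leaves = 250   # value used in this notebook

# A binary tree of depth d can hold at most 2**d leaves
limite = 2 ** max_depth

# The tighter of the two constraints determines the tree size
folhas_efetivas = min(num_leaves, limite)

print(limite, folhas_efetivas)
```

Under these settings the depth limit is the binding constraint, so lowering num_leaves would only change the trees once it drops below 4.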

In [None]:
# Parameters
parametros = {'num_leaves':250, # number of leaves
              'objective':'binary', # binary classification
              'max_depth':2,
              'learning_rate':.05,
              'max_bin':100}

In [None]:
lgbm=lgb.train(parametros,dataset,num_boost_round=200)

In [None]:
# Time the training run (this retrains the model with the default num_boost_round of 100)
from datetime import datetime
inicio=datetime.now()
lgbm=lgb.train(parametros,dataset)
fim=datetime.now()

tempo = fim - inicio
tempo

In [None]:
previsoes_lgbm = lgbm.predict(x_teste)
previsoes_lgbm

In [None]:
previsoes_lgbm.shape

In [None]:
# Probabilities below 0.5 become class 0; probabilities of 0.5 or more become class 1
previsoes_lgbm = (previsoes_lgbm >= .5).astype(int)

In [None]:
previsoes_lgbm

In [None]:
y_teste

In [None]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [None]:
print("Accuracy: %.2f%%" % (accuracy_score(y_teste, previsoes_lgbm) * 100.0))

In [None]:
confusion_matrix(y_teste, previsoes_lgbm)

**Training-data analysis**

In [None]:
previsoes_treino = lgbm.predict(x_treino)
previsoes_treino

In [None]:
previsoes_treino.shape

In [None]:
# Probabilities below 0.5 become class 0; probabilities of 0.5 or more become class 1
previsoes_treino = (previsoes_treino >= .5).astype(int)

In [None]:
previsoes_treino

In [None]:
accuracy_score(y_treino, previsoes_treino)

In [None]:
confusion_matrix(y_treino, previsoes_treino)

### **Cross-Validation**

In [None]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

In [None]:
# Split the data into folds
kfold = KFold(n_splits = 30, shuffle=True, random_state = 5)

In [None]:
# Create the model
modelo = lgb.LGBMClassifier(num_leaves = 250, objective = 'binary',
                            max_depth = 2, learning_rate = .05, max_bin =100)
resultado = cross_val_score(modelo, previsores_labelencoder, alvo, cv = kfold)

# Report the mean accuracy across the folds
print("Mean accuracy: %.2f%%" % (resultado.mean() * 100.0))

LightGBM = 95.32% (train/test split) and 96.11% (cross-validation) - lgb.LGBMClassifier(num_leaves = 250, objective = 'binary', max_depth = 2, learning_rate = .05, max_bin = 100) - previsores, alvo
LightGBM = 97.08% (train/test split) and 96.12% (cross-validation) - lgb.LGBMClassifier(num_leaves = 250, objective = 'binary', max_depth = 2, learning_rate = .05, max_bin = 100) - previsores_esc, alvo
LightGBM = 95.32% (train/test split) and 96.11% (cross-validation) - lgb.LGBMClassifier(num_leaves = 250, objective = 'binary', max_depth = 2, learning_rate = .05, max_bin = 100) - previsores_labelencoder, alvo
LightGBM = 97.08% (train/test split) and 85.93% (cross-validation) - lgb.LGBMClassifier(num_leaves = 250, objective = 'binary', max_depth = 2, learning_rate = .05, max_bin = 100) - previsores_labelencoder_esc, alvo

LightGBM = 95.32% (train/test split) and 96.11% (cross-validation) - lgb.LGBMClassifier(num_leaves = 250, objective = 'binary', max_depth = 2, learning_rate = .05, max_bin = 100) - previsores, alvo_labelencoder_ordinal
LightGBM = 97.08% (train/test split) and 96.12% (cross-validation) - lgb.LGBMClassifier(num_leaves = 250, objective = 'binary', max_depth = 2, learning_rate = .05, max_bin = 100) - previsores_esc, alvo_labelencoder_ordinal
LightGBM = 95.32% (train/test split) and 96.11% (cross-validation) - lgb.LGBMClassifier(num_leaves = 250, objective = 'binary', max_depth = 2, learning_rate = .05, max_bin = 100) - previsores_labelencoder, alvo_labelencoder_ordinal
LightGBM = 97.08% (train/test split) and 96.12% (cross-validation) - lgb.LGBMClassifier(num_leaves = 250, objective = 'binary', max_depth = 2, learning_rate = .05, max_bin = 100) - previsores_labelencoder_esc, alvo_labelencoder_ordinal

**Best LightGBM models:
LightGBM = 97.08% (train/test split) and 96.12% (cross-validation) - lgb.LGBMClassifier(num_leaves = 250, objective = 'binary', max_depth = 2, learning_rate = .05, max_bin = 100) - previsores_esc, alvo
LightGBM = 97.08% (train/test split) and 85.93% (cross-validation) - lgb.LGBMClassifier(num_leaves = 250, objective = 'binary', max_depth = 2, learning_rate = .05, max_bin = 100) - previsores_labelencoder_esc, alvo
LightGBM = 97.08% (train/test split) and 96.12% (cross-validation) - lgb.LGBMClassifier(num_leaves = 250, objective = 'binary', max_depth = 2, learning_rate = .05, max_bin = 100) - previsores_esc, alvo_labelencoder_ordinal
LightGBM = 97.08% (train/test split) and 96.12% (cross-validation) - lgb.LGBMClassifier(num_leaves = 250, objective = 'binary', max_depth = 2, learning_rate = .05, max_bin = 100) - previsores_labelencoder_esc, alvo_labelencoder_ordinal**

# **CATBOOST**

https://catboost.ai/en/docs/

In [None]:
# Installation
!pip install catboost

In [None]:
from catboost import CatBoostClassifier

In [None]:
df_original

In [None]:
previsores_catboost = df_original.iloc[:, 1:32]

In [None]:
previsores_catboost.head()

In [None]:
alvo_catboost = df_original.iloc[:, 0]
alvo_catboost

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
x_treino, x_teste, y_treino, y_teste = train_test_split(previsores_catboost, alvo_catboost, test_size = 0.3, random_state = 0)

In [None]:
catboost = CatBoostClassifier(task_type='CPU', iterations=100, learning_rate=0.1, depth = 8, random_state = 5,
                              eval_metric="Accuracy")

In [None]:
catboost.fit( x_treino, y_treino, plot=True, eval_set=(x_teste, y_teste))

In [None]:
previsoes_cat = catboost.predict(x_teste)
previsoes_cat

In [None]:
y_teste

In [None]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [None]:
print("Accuracy: %.2f%%" % (accuracy_score(y_teste, previsoes_cat) * 100.0))

In [None]:
confusion_matrix(y_teste, previsoes_cat)

**Training-data analysis**

In [None]:
previsoes_treino = catboost.predict(x_treino)
previsoes_treino

In [None]:
accuracy_score(y_treino, previsoes_treino)

In [None]:
confusion_matrix(y_treino, previsoes_treino)

### **Cross-Validation**

In [159]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

In [160]:
# Split the data into folds
kfold = KFold(n_splits = 30, shuffle=True, random_state = 5)

In [None]:
# Create the model
modelo = CatBoostClassifier(task_type='CPU', iterations=100, learning_rate=0.1, depth = 8, random_state = 5,
                              eval_metric="Accuracy")
resultado = cross_val_score(modelo, previsores, alvo, cv = kfold)

# Report the mean accuracy across the folds
print("Mean accuracy: %.2f%%" % (resultado.mean() * 100.0))

CatBoost = 97.08% (train/test split) and 97.16% (cross-validation) - CatBoostClassifier(task_type='CPU', iterations=100, learning_rate=0.1, depth = 8, random_state = 5, eval_metric="Accuracy") - previsores_catboost, alvo_catboost

**Best CatBoost:
CatBoost = 97.08% (train/test split) and 97.16% (cross-validation) - CatBoostClassifier(task_type='CPU', iterations=100, learning_rate=0.1, depth = 8, random_state = 5, eval_metric="Accuracy") - previsores_catboost, alvo_catboost**

# **ARTIFICIAL NEURAL NETWORKS**

In [223]:
from sklearn.neural_network import MLPClassifier

MLPClassifier parameters

    hidden_layer_sizes (hidden layers): default (100,)

    Rule of thumb 1: (Ne + Ns) / 2 = (11 + 1) / 2 = 6 neurons
    Rule of thumb 2: 2/3 * Ne + Ns = 2/3 * 11 + 1 ≈ 8 neurons
    (Ne = number of input neurons, Ns = number of output neurons)

    activation: activation function. Default='relu'.

    solver: optimization algorithm. Default='adam', which suits large datasets (roughly a thousand samples or more). 'lbfgs' suits small datasets. 'sgd' is stochastic gradient descent (worth testing).

    alpha: strength of the regularization term on the weights. Increasing alpha encourages smaller weights; decreasing it allows larger weights. Default=0.0001.

    batch_size: size of the mini-batches. Default=min(200, n_samples). Not used with the 'lbfgs' solver.

    learning_rate: learning-rate schedule. Default='constant'. Three options:
    1- 'constant': a constant learning rate given by the initial learning rate.
    2- 'invscaling': decreases gradually: effective rate = initial rate / t^power_t
    3- 'adaptive': the rate is divided by 5 each time two consecutive epochs fail to reduce the error.

    learning_rate_init: initial learning rate. Default=0.001.

    max_iter: maximum number of iterations. Default=200. For 'sgd' and 'adam' this is the number of epochs.

    max_fun: maximum number of loss-function calls. Only for 'lbfgs'. Default=15000.

    shuffle: default=True. Used only when solver='sgd' or 'adam'.

    random_state: default=None.

    tol: tolerance for the optimization. Default=0.0001.

    momentum: momentum for the 'sgd' solver. Default=0.9.

    n_iter_no_change: maximum number of epochs that may fail to meet the improvement tolerance. Default=10. Only for solver='sgd' or 'adam'.

    verbose: prints progress messages. Default=False.
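
The two rules of thumb for sizing the hidden layer can be written as small helper functions. This is only a sketch: the function names are hypothetical, the results are rounded to the nearest integer, and the inputs (11 input neurons, 1 output neuron) are the ones used in the list above.

```python
def neuronios_media(n_entradas, n_saidas):
    """Rule of thumb 1: mean of the input and output neuron counts."""
    return round((n_entradas + n_saidas) / 2)


def neuronios_dois_tercos(n_entradas, n_saidas):
    """Rule of thumb 2: two thirds of the inputs plus the outputs."""
    return round(2 / 3 * n_entradas + n_saidas)


print(neuronios_media(11, 1), neuronios_dois_tercos(11, 1))
```

The network trained below uses 7 hidden neurons, which lies between the two suggestions.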

In [224]:
redes = MLPClassifier(hidden_layer_sizes=(7,), activation='relu', solver='adam', max_iter=800,
                      tol=0.0001, random_state=3, verbose=True)

In [225]:
redes.fit(x_treino, y_treino)

Iteration 1, loss = 1.03808129
Iteration 2, loss = 1.00773847
Iteration 3, loss = 0.97835322
Iteration 4, loss = 0.94953463
Iteration 5, loss = 0.92176958
Iteration 6, loss = 0.89544873
Iteration 7, loss = 0.87075934
Iteration 8, loss = 0.84521648
Iteration 9, loss = 0.82200791
Iteration 10, loss = 0.79925166
Iteration 11, loss = 0.77761418
Iteration 12, loss = 0.75653351
Iteration 13, loss = 0.73537967
Iteration 14, loss = 0.71625817
Iteration 15, loss = 0.69668392
Iteration 16, loss = 0.67860006
Iteration 17, loss = 0.66146173
Iteration 18, loss = 0.64383919
Iteration 19, loss = 0.62804555
Iteration 20, loss = 0.61271558
Iteration 21, loss = 0.59813427
Iteration 22, loss = 0.58370630
Iteration 23, loss = 0.57032948
Iteration 24, loss = 0.55742343
Iteration 25, loss = 0.54557349
Iteration 26, loss = 0.53390998
Iteration 27, loss = 0.52259752
Iteration 28, loss = 0.51161282
Iteration 29, loss = 0.50137740
Iteration 30, loss = 0.49126661
Iteration 31, loss = 0.48248264
Iteration 32, los

In [226]:
previsoes = redes.predict(x_teste)
previsoes

array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
       0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0,
       1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0])

In [227]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [228]:
print("Accuracy: %.2f%%" % (accuracy_score(y_teste, previsoes) * 100.0))

Accuracy: 97.66%


In [229]:
confusion_matrix(y_teste, previsoes)

array([[107,   1],
       [  3,  60]])

In [230]:
print(classification_report(y_teste, previsoes))

              precision    recall  f1-score   support

           0       0.97      0.99      0.98       108
           1       0.98      0.95      0.97        63

    accuracy                           0.98       171
   macro avg       0.98      0.97      0.97       171
weighted avg       0.98      0.98      0.98       171
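
The figures in the report can be recomputed by hand from the confusion matrix, which helps when checking which kind of error dominates. The sketch below is a minimal illustration in plain Python; it assumes scikit-learn's convention (rows are true classes, columns are predicted classes) and only looks at class 1.

```python
# Confusion matrix reported above: rows = true class, columns = predicted class
cm = [[107, 1],
      [3, 60]]

tn, fp = cm[0]   # true class 0: correctly predicted 0 / wrongly predicted 1
fn, tp = cm[1]   # true class 1: wrongly predicted 0 / correctly predicted 1
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision_1 = tp / (tp + fp)   # share of class-1 predictions that were right
recall_1 = tp / (tp + fn)      # share of true class-1 samples that were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"accuracy    = {accuracy:.4f}")
print(f"precision_1 = {precision_1:.4f}")
print(f"recall_1    = {recall_1:.4f}")
print(f"f1_1        = {f1_1:.4f}")
```

The three class-1 samples predicted as class 0 are what pull recall for class 1 down to 0.95 while precision stays at 0.98.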



**Analysis on the training data**

In [231]:
previsoes_treino = redes.predict(x_treino)
previsoes_treino

array([0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1,
       0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
       0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0,
       1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0,
       1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
       0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
       0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1,
       0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1,
       1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0,

In [232]:
accuracy_score(y_treino, previsoes_treino)

0.9874371859296482

In [233]:
confusion_matrix(y_treino, previsoes_treino)

array([[248,   1],
       [  4, 145]])
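
Comparing training and test accuracy is a quick, informal overfitting check. The sketch below recomputes both from the two confusion matrices shown in this section; the helper name `accuracy_from_cm` is introduced here for illustration only.

```python
# Confusion matrices copied from the outputs above
cm_train = [[248, 1], [4, 145]]
cm_test = [[107, 1], [3, 60]]


def accuracy_from_cm(cm):
    """Accuracy = correctly classified samples (diagonal) / all samples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


acc_train = accuracy_from_cm(cm_train)
acc_test = accuracy_from_cm(cm_test)
gap = acc_train - acc_test

print(f"train = {acc_train:.4f}, test = {acc_test:.4f}, gap = {gap:.4f}")
```

A gap of roughly one percentage point is small, which suggests (without proving) that the network is not memorizing the training set.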

### **Cross-Validation**

In [234]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

In [235]:
# Split the data into folds
kfold = KFold(n_splits = 30, shuffle=True, random_state = 5)
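
With 30 splits each validation fold is small, so the accuracy of a single fold moves in coarse steps. The sketch below reproduces the fold-size rule documented for `KFold` in plain Python. It assumes the full dataset has 398 + 171 = 569 rows, the totals of the training and test confusion matrices above; `kfold_sizes` is a hypothetical helper, not a scikit-learn function.

```python
def kfold_sizes(n_samples, n_splits):
    """Fold sizes following the rule documented for sklearn's KFold:
    the first n_samples % n_splits folds receive one extra sample."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]


n_samples = 398 + 171          # assumption: training rows + test rows
sizes = kfold_sizes(n_samples, 30)

print("fold sizes:", sorted(set(sizes)))
print("accuracy step per sample in the largest fold: %.4f" % (1 / max(sizes)))
```

One misclassified sample in a fold of 19 changes that fold's accuracy by about 5 points, so the mean over folds is more informative than any individual fold score.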

In [236]:
# Create the model; (7) is just the integer 7, i.e. a single hidden layer with 7 neurons
modelo = MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800,
                       tol=0.0001, random_state=3, verbose=True)
# Evaluate the model on each of the 30 folds defined above
resultado = cross_val_score(modelo, previsores_labelencoder_esc, alvo, cv=kfold)
print("Mean accuracy: %.2f%%" % (resultado.mean() * 100.0))

Iteration 1, loss = 1.01347781
Iteration 2, loss = 0.97014242
Iteration 3, loss = 0.92977035
Iteration 4, loss = 0.89119272
Iteration 5, loss = 0.85242147
Iteration 6, loss = 0.81671148
Iteration 7, loss = 0.78455898
Iteration 8, loss = 0.75329005
Iteration 9, loss = 0.72230952
Iteration 10, loss = 0.69464700
Iteration 11, loss = 0.66793965
Iteration 12, loss = 0.64223124
Iteration 13, loss = 0.61925424
Iteration 14, loss = 0.59741343
Iteration 15, loss = 0.57695446
Iteration 16, loss = 0.55824752
Iteration 17, loss = 0.54061703
Iteration 18, loss = 0.52359523
Iteration 19, loss = 0.50803694
Iteration 20, loss = 0.49332014
Iteration 21, loss = 0.47938886
Iteration 22, loss = 0.46693304
Iteration 23, loss = 0.45457585
Iteration 24, loss = 0.44353093
Iteration 25, loss = 0.43278704
Iteration 26, loss = 0.42238465
Iteration 27, loss = 0.41278955
Iteration 28, loss = 0.40327495
Iteration 29, loss = 0.39468076
Iteration 30, loss = 0.38629353
Iteration 31, loss = 0.37790198
Iteration 32, los

Artificial Neural Networks = 63.16% (train/test split) and 62.72% (cross-validation) - MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800, tol=0.0001, random_state=3, verbose=True) - previsores, alvo
Artificial Neural Networks = 97.66% (train/test split) and 97.17% (cross-validation) - MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800, tol=0.0001, random_state=3, verbose=True) - previsores_esc, alvo
Artificial Neural Networks = 63.16% (train/test split) and 62.72% (cross-validation) - MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800, tol=0.0001, random_state=3, verbose=True) - previsores_labelencoder, alvo
Artificial Neural Networks = 97.66% (train/test split) and 97.17% (cross-validation) - MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800, tol=0.0001, random_state=3, verbose=True) - previsores_labelencoder_esc, alvo

Artificial Neural Networks = 63.16% (train/test split) and 62.72% (cross-validation) - MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800, tol=0.0001, random_state=3, verbose=True) - previsores, alvo_labelencoder_ordinal
Artificial Neural Networks = 97.66% (train/test split) and 97.17% (cross-validation) - MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800, tol=0.0001, random_state=3, verbose=True) - previsores_esc, alvo_labelencoder_ordinal
Artificial Neural Networks = 63.16% (train/test split) and 62.72% (cross-validation) - MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800, tol=0.0001, random_state=3, verbose=True) - previsores_labelencoder, alvo_labelencoder_ordinal
Artificial Neural Networks = 97.66% (train/test split) and 97.17% (cross-validation) - MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800, tol=0.0001, random_state=3, verbose=True) - previsores_labelencoder_esc, alvo_labelencoder_ordinal

**Best Artificial Neural Networks:**
Artificial Neural Networks = 97.66% (train/test split) and 97.17% (cross-validation) - MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800, tol=0.0001, random_state=3, verbose=True) - previsores_esc, alvo
Artificial Neural Networks = 97.66% (train/test split) and 97.17% (cross-validation) - MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800, tol=0.0001, random_state=3, verbose=True) - previsores_labelencoder_esc, alvo
Artificial Neural Networks = 97.66% (train/test split) and 97.17% (cross-validation) - MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800, tol=0.0001, random_state=3, verbose=True) - previsores_esc, alvo_labelencoder_ordinal
Artificial Neural Networks = 97.66% (train/test split) and 97.17% (cross-validation) - MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800, tol=0.0001, random_state=3, verbose=True) - previsores_labelencoder_esc, alvo_labelencoder_ordinal
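
The network only reaches 97.66% on the scaled predictors. The 63.16% obtained without scaling equals 108/171, the share of class 0 in the test split, which is consistent with a model that predicts a single class. One way to make scaling part of the model itself is to wrap the scaler and the network in a pipeline, so the scaler is refit inside every cross-validation fold. The sketch below is an illustration under assumptions, not the notebook's code: it uses scikit-learn's bundled copy of the Wisconsin Diagnostic data in place of `previsores` and `alvo`, 5 stratified folds instead of 30, and `verbose` left off to keep the log short.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled dataset stands in for the notebook's arrays;
# its label encoding may differ from the one used in the notebook.
X, y = load_breast_cancer(return_X_y=True)

# The scaler is fit on the training part of each fold only, so no statistics
# from the held-out fold leak into training.
pipeline = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(7,), activation='relu', solver='adam',
                  max_iter=800, tol=0.0001, random_state=3),
)

# Stratified folds keep the class proportions similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=5)
scores = cross_val_score(pipeline, X, y, cv=cv)

print("Mean accuracy: %.2f%% (std %.2f)" % (scores.mean() * 100, scores.std() * 100))
```

The same pipeline object can later be fit on all the data and saved as a single artifact, so the deployed model applies exactly the scaling it was trained with.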

# **Saving Data for Deployment**

In [None]:
previsores

In [None]:
alvo

In [None]:
np.savetxt('../output/previsores.csv', previsores, delimiter=',')

In [None]:
np.savetxt('../output/alvo.csv', alvo, delimiter=',')
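
`np.savetxt` writes only the numeric values, so column names are lost unless a header is passed explicitly. The sketch below shows a round trip that keeps the header and stores the labels as integers. The arrays, column names and temporary directory are hypothetical stand-ins for the notebook's `previsores`, `alvo` and `../output/` folder.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for the notebook's previsores / alvo arrays
previsores_demo = np.array([[17.99, 10.38], [20.57, 17.77], [13.54, 14.36]])
alvo_demo = np.array([1, 1, 0])
colunas_demo = ["radius_mean", "texture_mean"]

out_dir = tempfile.mkdtemp()
path_x = os.path.join(out_dir, "previsores.csv")
path_y = os.path.join(out_dir, "alvo.csv")

# header= writes the column names; comments="" drops the default "# " prefix
np.savetxt(path_x, previsores_demo, delimiter=",",
           header=",".join(colunas_demo), comments="")
# fmt="%d" stores the labels as integers instead of scientific notation
np.savetxt(path_y, alvo_demo, delimiter=",", fmt="%d")

# Reload both files and confirm the values survived the round trip
x_back = np.loadtxt(path_x, delimiter=",", skiprows=1)
y_back = np.loadtxt(path_y, delimiter=",", dtype=int)
print(np.allclose(x_back, previsores_demo), (y_back == alvo_demo).all())
```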

# **CONCLUSION**

DEVELOP AND SELECT THE BEST MACHINE LEARNING CLASSIFICATION ALGORITHM FOR THE DATASET AT THE FOLLOWING LINK:

https://www.kaggle.com/uciml/breast-cancer-wisconsin-data

**Best Method:**
**-1st-** Logistic Regression = 97.66% (train/test split) and 98.06% (cross-validation), with scaled predictors

**-2nd-** SVM = 97.66% (train/test split) and 97.88% (cross-validation), with scaled predictors

**-3rd-** Artificial Neural Networks = 97.66% (train/test split) and 97.17% (cross-validation) - MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800, tol=0.0001, random_state=3, verbose=True) - with scaled predictors

**-4th-** LightGBM = 97.08% (train/test split) and 96.12% (cross-validation) - lgb.LGBMClassifier(num_leaves=250, objective='binary', max_depth=2, learning_rate=.05, max_bin=100) - with scaled predictors

**-5th-** CatBoost = 97.08% (train/test split) and 97.16% (cross-validation) - CatBoostClassifier(task_type='CPU', iterations=100, learning_rate=0.1, depth=8, random_state=5, eval_metric="Accuracy") - with the predictors prepared for CatBoost
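
A ranking like this can be reproduced mechanically once a tie-break rule is chosen. The sketch below uses the accuracies listed above and sorts by test accuracy first and cross-validation accuracy second. That rule is an assumption made for illustration; under it CatBoost would come ahead of LightGBM, so the order above may reflect other considerations.

```python
# (test accuracy %, cross-validation accuracy %) as listed in the ranking above
resultados = {
    "Logistic Regression": (97.66, 98.06),
    "SVM": (97.66, 97.88),
    "Artificial Neural Networks": (97.66, 97.17),
    "LightGBM": (97.08, 96.12),
    "CatBoost": (97.08, 97.16),
}

# Tuples compare element by element, so this sorts by test accuracy and
# breaks ties with cross-validation accuracy (both descending).
ranking = sorted(resultados.items(), key=lambda item: item[1], reverse=True)

for posicao, (modelo, (teste, cv)) in enumerate(ranking, start=1):
    print(f"{posicao}. {modelo}: test={teste:.2f}%  cv={cv:.2f}%")
```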

**Best Naive Bayes:**
Naive Bayes = 92.40% (train/test split) and 93.82% (cross-validation) - previsores, alvo

Naive Bayes = 92.40% (train/test split) and 93.82% (cross-validation) - previsores_labelencoder, alvo

Naive Bayes = 92.40% (train/test split) and 93.82% (cross-validation) - previsores, alvo_labelencoder_ordinal

Naive Bayes = 92.40% (train/test split) and 93.82% (cross-validation) - previsores_labelencoder, alvo_labelencoder_ordinal

**Best SVM:**
SVM = 97.66% (train/test split) and 97.88% (cross-validation) - SVC(kernel='rbf', random_state=1, C=2) - previsores_esc, alvo

SVM = 97.66% (train/test split) and 97.88% (cross-validation) - SVC(kernel='rbf', random_state=1, C=2) - previsores_labelencoder_esc, alvo

SVM = 97.66% (train/test split) and 97.88% (cross-validation) - SVC(kernel='rbf', random_state=1, C=2) - previsores_esc, alvo_labelencoder_ordinal

SVM = 97.66% (train/test split) and 97.88% (cross-validation) - SVC(kernel='rbf', random_state=1, C=2) - previsores_labelencoder_esc, alvo_labelencoder_ordinal

**Best Logistic Regression:**
Logistic Regression = 97.66% (train/test split) and 98.06% (cross-validation with previsores_esc) - LogisticRegression(random_state=1, max_iter=2000, penalty="l2", tol=0.0001, C=1, solver="lbfgs") - previsores_esc, alvo

Logistic Regression = 97.66% (train/test split) and 98.06% (cross-validation with previsores_esc) - LogisticRegression(random_state=1, max_iter=2000, penalty="l2", tol=0.0001, C=1, solver="lbfgs") - previsores_labelencoder_esc, alvo

Logistic Regression = 97.66% (train/test split) and 98.06% (cross-validation with previsores_labelencoder_esc) - LogisticRegression(random_state=1, max_iter=2000, penalty="l2", tol=0.0001, C=1, solver="lbfgs") - previsores_esc, alvo_labelencoder_ordinal

Logistic Regression = 97.66% (train/test split) and 98.06% (cross-validation with previsores_labelencoder_esc) - LogisticRegression(random_state=1, max_iter=600, penalty="l2", tol=0.0001, C=1, solver="lbfgs") - previsores_labelencoder_esc, alvo_labelencoder_ordinal

**Best KNN:**
KNN = 96.49% (train/test split) and 93.13% (cross-validation) - KNeighborsClassifier(n_neighbors=7, metric='minkowski', p=1) - previsores, alvo

KNN = 96.49% (train/test split) and 93.13% (cross-validation) - KNeighborsClassifier(n_neighbors=7, metric='minkowski', p=1) - previsores_labelencoder, alvo

KNN = 96.49% (train/test split) and 93.13% (cross-validation) - KNeighborsClassifier(n_neighbors=7, metric='minkowski', p=1) - previsores, alvo_labelencoder_ordinal

KNN = 96.49% (train/test split) and 93.13% (cross-validation) - KNeighborsClassifier(n_neighbors=7, metric='minkowski', p=1) - previsores_labelencoder, alvo_labelencoder_ordinal

**Best Decision Tree:**
Decision Tree = 95.32% (train/test split) and 92.44% (cross-validation) - DecisionTreeClassifier(criterion='entropy', random_state=0, max_depth=3) - previsores, alvo

Decision Tree = 95.32% (train/test split) and 92.44% (cross-validation) - DecisionTreeClassifier(criterion='entropy', random_state=0, max_depth=3) - previsores_esc, alvo

Decision Tree = 95.32% (train/test split) and 92.44% (cross-validation) - DecisionTreeClassifier(criterion='entropy', random_state=0, max_depth=3) - previsores_labelencoder, alvo

Decision Tree = 95.32% (train/test split) and 92.44% (cross-validation) - DecisionTreeClassifier(criterion='entropy', random_state=0, max_depth=3) - previsores_labelencoder_esc, alvo

Decision Tree = 95.32% (train/test split) and 92.44% (cross-validation) - DecisionTreeClassifier(criterion='entropy', random_state=0, max_depth=3) - previsores, alvo_labelencoder_ordinal

Decision Tree = 95.32% (train/test split) and 92.44% (cross-validation) - DecisionTreeClassifier(criterion='entropy', random_state=0, max_depth=3) - previsores_esc, alvo_labelencoder_ordinal

Decision Tree = 95.32% (train/test split) and 92.44% (cross-validation) - DecisionTreeClassifier(criterion='entropy', random_state=0, max_depth=3) - previsores_labelencoder, alvo_labelencoder_ordinal

Decision Tree = 95.32% (train/test split) and 92.44% (cross-validation) - DecisionTreeClassifier(criterion='entropy', random_state=0, max_depth=3) - previsores_labelencoder_esc, alvo_labelencoder_ordinal

**Best Random Forest:**
Random Forest = 96.49% (train/test split) and 95.76% (cross-validation) - RandomForestClassifier(n_estimators=150, criterion='entropy', random_state=0, max_depth=4) - previsores, alvo

Random Forest = 96.49% (train/test split) and 95.76% (cross-validation) - RandomForestClassifier(n_estimators=150, criterion='entropy', random_state=0, max_depth=4) - previsores_esc, alvo

Random Forest = 96.49% (train/test split) and 95.76% (cross-validation) - RandomForestClassifier(n_estimators=150, criterion='entropy', random_state=0, max_depth=4) - previsores_labelencoder, alvo

Random Forest = 96.49% (train/test split) and 95.76% (cross-validation) - RandomForestClassifier(n_estimators=150, criterion='entropy', random_state=0, max_depth=4) - previsores_labelencoder_esc, alvo

Random Forest = 96.49% (train/test split) and 95.76% (cross-validation) - RandomForestClassifier(n_estimators=150, criterion='entropy', random_state=0, max_depth=4) - previsores, alvo_labelencoder_ordinal

Random Forest = 96.49% (train/test split) and 95.76% (cross-validation) - RandomForestClassifier(n_estimators=150, criterion='entropy', random_state=0, max_depth=4) - previsores_esc, alvo_labelencoder_ordinal

Random Forest = 96.49% (train/test split) and 95.76% (cross-validation) - RandomForestClassifier(n_estimators=150, criterion='entropy', random_state=0, max_depth=4) - previsores_labelencoder, alvo_labelencoder_ordinal

Random Forest = 96.49% (train/test split) and 95.76% (cross-validation) - RandomForestClassifier(n_estimators=150, criterion='entropy', random_state=0, max_depth=4) - previsores_labelencoder_esc, alvo_labelencoder_ordinal

**Best XGBoost: however, all of them scored 100% on the training set, which may indicate overfitting**
XGBoost = 96.49% (train/test split) and 96.29% (cross-validation) - XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3) - previsores, alvo

XGBoost = 96.49% (train/test split) and 96.29% (cross-validation) - XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3) - previsores_esc, alvo

XGBoost = 96.49% (train/test split) and 96.29% (cross-validation) - XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3) - previsores_labelencoder, alvo

XGBoost = 96.49% (train/test split) and 96.29% (cross-validation) - XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3) - previsores_labelencoder_esc, alvo

XGBoost = 96.49% (train/test split) and 96.29% (cross-validation) - XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3) - previsores, alvo_labelencoder_ordinal

XGBoost = 96.49% (train/test split) and 96.29% (cross-validation) - XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3) - previsores_esc, alvo_labelencoder_ordinal

XGBoost = 96.49% (train/test split) and 96.29% (cross-validation) - XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3) - previsores_labelencoder, alvo_labelencoder_ordinal

XGBoost = 96.49% (train/test split) and 96.29% (cross-validation) - XGBClassifier(max_depth=2, learning_rate=0.05, n_estimators=250, objective='binary:logistic', random_state=3) - previsores_labelencoder_esc, alvo_labelencoder_ordinal

**Best LightGBM:**
LightGBM = 97.08% (train/test split) and 96.12% (cross-validation) - lgb.LGBMClassifier(num_leaves=250, objective='binary', max_depth=2, learning_rate=.05, max_bin=100) - previsores_esc, alvo

LightGBM = 97.08% (train/test split) and 85.93% (cross-validation) - lgb.LGBMClassifier(num_leaves=250, objective='binary', max_depth=2, learning_rate=.05, max_bin=100) - previsores_labelencoder_esc, alvo

LightGBM = 97.08% (train/test split) and 96.12% (cross-validation) - lgb.LGBMClassifier(num_leaves=250, objective='binary', max_depth=2, learning_rate=.05, max_bin=100) - previsores_esc, alvo_labelencoder_ordinal

LightGBM = 97.08% (train/test split) and 96.12% (cross-validation) - lgb.LGBMClassifier(num_leaves=250, objective='binary', max_depth=2, learning_rate=.05, max_bin=100) - previsores_labelencoder_esc, alvo_labelencoder_ordinal

**Best CatBoost:**
CatBoost = 97.08% (train/test split) and 97.16% (cross-validation) - CatBoostClassifier(task_type='CPU', iterations=100, learning_rate=0.1, depth=8, random_state=5, eval_metric="Accuracy") - previsores_catboost, alvo_catboost

**Best Artificial Neural Networks:**
Artificial Neural Networks = 97.66% (train/test split) and 97.17% (cross-validation) - MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800, tol=0.0001, random_state=3, verbose=True) - previsores_esc, alvo

Artificial Neural Networks = 97.66% (train/test split) and 97.17% (cross-validation) - MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800, tol=0.0001, random_state=3, verbose=True) - previsores_labelencoder_esc, alvo

Artificial Neural Networks = 97.66% (train/test split) and 97.17% (cross-validation) - MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800, tol=0.0001, random_state=3, verbose=True) - previsores_esc, alvo_labelencoder_ordinal

Artificial Neural Networks = 97.66% (train/test split) and 97.17% (cross-validation) - MLPClassifier(hidden_layer_sizes=(7), activation='relu', solver='adam', max_iter=800, tol=0.0001, random_state=3, verbose=True) - previsores_labelencoder_esc, alvo_labelencoder_ordinal