# Binary Classification - Breast Cancer ANN

- **Objective**: Build a neural network model that reliably diagnoses breast tumors as  
_benign_ or _malignant_, indicating whether the patient has breast cancer.

### Methods and libraries

- __dataset__: breast_cancer.csv
- __model__: tensorflow, KerasClassifier (scikeras)
- __dataframe__: pandas
- __performance validation__: cross_val_score (scikit-learn)

### Attribute description

> __radius__: Mean distance from the center to the perimeter  
> __texture__: Standard deviation of the gray-scale values  
> __perimeter__: Perimeter of the cell nucleus  
> __area__: Area of the cell nucleus  
> __smoothness__: Smoothness of the border (local variations)  
> __compactness__: Degree of compactness = perimeter² / area - 1.0  
> __concavity__: Severity of the concave portions of the border  
> __concave points__: Number of concave points on the border  
> __symmetry__: Degree of symmetry of the nucleus  
> __fractal_dimension__: Complexity of the border (fractal dimension)  

In [None]:
import pandas as pd
import numpy as np
import tensorflow as tf
import warnings

import scikeras
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score #Cross-validation
from tensorflow.keras.models import Sequential #Builds the neural network structure (a sequence of layers)
from tensorflow.keras import backend as k

In [2]:
df_input = pd.read_csv('entradas_breast.csv', encoding='utf-8')
df_output = pd.read_csv('saidas_breast.csv', encoding='utf-8')
df = pd.concat([df_input, df_output], axis=1, ignore_index=False)
df.head()

| Unnamed: 0 | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave_points_mean | symmetry_mean | fractal_dimension_mean | ... | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave_points_worst | symmetry_worst | fractal_dimension_worst | 0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 | 0 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 186.0 | 275.0 | 0.08902 | 0 |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 243.0 | 0.3613 | 0.08758 | 0 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 173.0 | 0 |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 198.0 | 0.1043 | 0.1809 | 0.05883 | ... | 16.67 | 152.2 | 1575.0 | 0.1374 | 205.0 | 0.4 | 0.1625 | 0.2364 | 0.07678 | 0 |


In [3]:
df = df.rename(columns={df.columns[-1]: 'Class'})

In [4]:
def create_net():
    k.clear_session() #Clears the Keras session state before building the network
    neural_net = Sequential([
        tf.keras.layers.InputLayer(shape=(30,)), #Input layer (predictor attributes)
        tf.keras.layers.Dense(units=16, activation='relu', kernel_initializer='random_uniform'), #First hidden layer
        tf.keras.layers.Dropout(rate=0.2), #Dropout reduces overfitting (rate = fraction of neurons randomly dropped during training)
        tf.keras.layers.Dense(units=16, activation='relu', kernel_initializer='random_uniform'), #Second hidden layer
        tf.keras.layers.Dropout(rate=0.2), #Dropout reduces overfitting (rate = fraction of neurons randomly dropped during training)
        tf.keras.layers.Dense(units=1, activation='sigmoid') #Output layer (result)
    ])
    otimizador = tf.keras.optimizers.Adam(learning_rate=0.001, clipvalue=0.5)
    neural_net.compile(optimizer=otimizador, loss='binary_crossentropy', metrics=['binary_accuracy'])
    return neural_net
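
As a sanity check on the architecture above, the number of trainable parameters can be counted by hand. This is a minimal sketch in plain Python, not part of the original notebook: `dense_params` is a hypothetical helper, and it assumes each `Dense` layer holds one weight per input-unit pair plus one bias per unit, while the `Dropout` layers add no parameters.

```python
def dense_params(n_inputs: int, units: int) -> int:
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * units + units


# Layer widths taken from create_net(): 30 inputs -> 16 -> 16 -> 1 output.
layer_sizes = [30, 16, 16, 1]

# One entry per Dense layer, pairing each layer's input width with its unit count.
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)     # parameters of each Dense layer
print(total_params)  # total trainable parameters
```

If the notebook's model is built, `create_net().summary()` should report the same total.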

In [5]:
neural_net = KerasClassifier(model=create_net, epochs=100, batch_size=10) #Updates the weights after every batch of 10 records
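
To see what `batch_size=10` means in practice, the sketch below counts the weight updates per epoch. It is illustrative only: the helper names are hypothetical, and the figure of 512 training rows is an assumption (569 records minus one held-out fold of 57), since the notebook never prints the row count. Under that assumption the result agrees with the `52/52` step counter in the training log.

```python
import math


def steps_per_epoch(n_train: int, batch_size: int) -> int:
    """Number of mini-batches (weight updates) in one pass over the training data."""
    return math.ceil(n_train / batch_size)


def last_batch_size(n_train: int, batch_size: int) -> int:
    """Size of the final, possibly smaller, mini-batch of an epoch."""
    remainder = n_train % batch_size
    return remainder or batch_size


N_TRAIN = 512      # assumption: training rows available in one cross-validation fold
BATCH_SIZE = 10    # from KerasClassifier(batch_size=10)
EPOCHS = 100       # from KerasClassifier(epochs=100)

steps = steps_per_epoch(N_TRAIN, BATCH_SIZE)
print(steps)                                  # updates per epoch
print(last_batch_size(N_TRAIN, BATCH_SIZE))   # rows in the last batch
print(steps * EPOCHS)                         # updates over the whole fit
```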

In [6]:
#Predictor columns: every non-integer column, which leaves out the integer 'Class' label
pred_cols = df.select_dtypes(exclude=['int']).columns.tolist()

x = df.loc[:, pred_cols]
y = df[['Class']]

In [7]:
#cv sets the number of folds
outcome = cross_val_score(estimator=neural_net, X=x, y=y, cv=10, scoring='accuracy')


Epoch 1/100
52/52 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - binary_accuracy: 0.5799 - loss: 2.7005
Epoch 2/100
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - binary_accuracy: 0.6076 - loss: 0.9051
Epoch 3/100
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - binary_accuracy: 0.6850 - loss: 0.7326
Epoch 4/100
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - binary_accuracy: 0.7226 - loss: 0.6280
Epoch 5/100
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - binary_accuracy: 0.6798 - loss: 0.7336
Epoch 6/100
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - binary_accuracy: 0.6873 - loss: 0.5663
Epoch 7/100
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - binary_accuracy: 0.7255 - loss: 0.5667
Epoch 8/100
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - binary_accuracy: 0.7209 - loss

In [8]:
outcome

array([0.87719298, 0.80701754, 0.89473684, 0.92982456, 0.85964912,
       0.85964912, 0.89473684, 0.9122807 , 0.89473684, 0.91071429])
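
Each score above is the fraction of correctly classified records in one held-out fold. The sketch below converts the fractions back into counts. It rests on two assumptions that the notebook does not state: the dataset has 569 records (the size of the standard Wisconsin diagnostic dataset), and the folds follow the usual k-fold sizing in which the first `n % k` folds receive one extra record. `kfold_sizes` is a hypothetical helper, and scikit-learn's stratified splitter may distribute the rows slightly differently.

```python
def kfold_sizes(n_samples: int, n_splits: int) -> list[int]:
    """Fold sizes when n_samples are split as evenly as possible into n_splits folds."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]


N_SAMPLES = 569  # assumption: row count of the dataset, not printed in the notebook

# Fold accuracies copied from the `outcome` array.
fold_scores = [0.87719298, 0.80701754, 0.89473684, 0.92982456, 0.85964912,
               0.85964912, 0.89473684, 0.9122807, 0.89473684, 0.91071429]

sizes = kfold_sizes(N_SAMPLES, len(fold_scores))

# Recover the number of correct predictions in each fold.
correct = [round(score * size) for score, size in zip(fold_scores, sizes)]

for fold, (hits, size) in enumerate(zip(correct, sizes), start=1):
    print(f"fold {fold:2d}: {hits}/{size} correct")

# Accuracy over all held-out predictions pooled together.
pooled_accuracy = sum(correct) / N_SAMPLES
print(f"pooled accuracy: {pooled_accuracy:.4f}")
```

Under these assumptions every score maps to a whole number of correct predictions, which supports the assumed fold sizes.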

## Final result

In [9]:
outcome.mean() #Mean accuracy

0.8840538847117794

In [10]:
outcome.std()

0.03334323973717352
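
The two summary statistics can be reproduced from the fold scores with the standard library alone. This is a minimal sketch rather than the notebook's own code. It relies on the fact that NumPy's `std()` defaults to the population standard deviation (`ddof=0`), and it also computes the sample standard deviation, which divides by `n - 1` and is therefore slightly larger.

```python
from statistics import fmean, pstdev, stdev

# Fold accuracies copied from the `outcome` array.
fold_scores = [0.87719298, 0.80701754, 0.89473684, 0.92982456, 0.85964912,
               0.85964912, 0.89473684, 0.9122807, 0.89473684, 0.91071429]

mean_accuracy = fmean(fold_scores)
population_std = pstdev(fold_scores)  # divides by n, like outcome.std()
sample_std = stdev(fold_scores)       # divides by n - 1

print(f"mean accuracy:  {mean_accuracy:.4f}")
print(f"population std: {population_std:.4f}")
print(f"sample std:     {sample_std:.4f}")
```

A standard deviation of about 3 percentage points across folds suggests the accuracy estimate is fairly stable, although individual folds range from roughly 81% to 93%.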

## Overfitting vs. Underfitting

1. __Overfitting__: the model fits the training data too closely. Accuracy on the training set is excellent, but performance on the test set is poor.
2. __Underfitting__: the model fails to capture the patterns in the data, so performance is poor on the training set and, consequently, on the test set as well.
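
The two definitions above can be turned into a rough check that compares training and test accuracy. The sketch below is illustrative only: `diagnose_fit` and its thresholds (`good` and `max_gap`) are hypothetical choices, not values from the notebook, and suitable cut-offs depend on the problem.

```python
def diagnose_fit(train_acc: float, test_acc: float,
                 good: float = 0.85, max_gap: float = 0.05) -> str:
    """Classify a model's fit from its training and test accuracy.

    good    -- hypothetical minimum training accuracy for an adequate fit
    max_gap -- hypothetical largest tolerated drop from training to test accuracy
    """
    if train_acc < good:
        # The model performs poorly even on the data it was trained on.
        return "underfitting"
    if train_acc - test_acc > max_gap:
        # Strong on training data but much weaker on unseen data.
        return "overfitting"
    return "reasonable fit"


# Made-up accuracy pairs (training, test) used only to exercise the function.
for train_acc, test_acc in [(0.99, 0.70), (0.60, 0.58), (0.90, 0.88)]:
    print(train_acc, test_acc, diagnose_fit(train_acc, test_acc))
```

Note that `cross_val_score` reports only the held-out accuracy, so applying this check to the notebook would also require recording the training accuracy of each fold.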