<a href="https://colab.research.google.com/github/SamG1002/DataScience/blob/main/CancerMama.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Import and Preprocessing**

In [None]:
import pandas as pd
import numpy as np
# KNN
from sklearn.neighbors import KNeighborsClassifier
# Train/test split and GridSearchCV to find the best K for KNN
from sklearn.model_selection import GridSearchCV, train_test_split
# Neural network and evaluation metric
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
# Plotting
import matplotlib.pyplot as plt
# Suppress warnings
import warnings
warnings.filterwarnings("ignore")

**Dataset**

In [None]:
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer/breast-cancer.data')

**Naming the columns**

In [None]:
df.columns = ["Class", "age", "menopause", "tumor-size", "Inv-nodes", "node-caps", "deg-malig", "breast", "breast-quad", "Irradiant"]
df.head()

Unnamed: 0,Class,age,menopause,tumor-size,Inv-nodes,node-caps,deg-malig,breast,breast-quad,Irradiant
0,no-recurrence-events,40-49,premeno,20-24,0-2,no,2,right,right_up,no
1,no-recurrence-events,40-49,premeno,20-24,0-2,no,2,left,left_low,no
2,no-recurrence-events,60-69,ge40,15-19,0-2,no,2,right,left_up,no
3,no-recurrence-events,40-49,premeno,0-4,0-2,no,2,right,right_low,no
4,no-recurrence-events,60-69,ge40,15-19,0-2,no,2,left,left_low,no
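
The data file has no header row, so `pd.read_csv` with default arguments uses the first record as column names, and that record is lost once the names are overwritten. A minimal sketch of loading a headerless file without losing a row, using two sample records from the table above in place of the download (the variable names `raw`, `columns` and `sample` are illustrative):

```python
import io

import pandas as pd

# Two sample records in the same comma-separated, headerless layout as the data file.
raw = io.StringIO(
    "no-recurrence-events,40-49,premeno,20-24,0-2,no,2,right,right_up,no\n"
    "no-recurrence-events,40-49,premeno,20-24,0-2,no,2,left,left_low,no\n"
)

columns = ["Class", "age", "menopause", "tumor-size", "Inv-nodes",
           "node-caps", "deg-malig", "breast", "breast-quad", "Irradiant"]

# header=None keeps the first line as data; names= assigns the labels in the same call.
sample = pd.read_csv(raw, header=None, names=columns)
print(len(sample))
```

Passing the same `header=None, names=columns` arguments with the dataset URL should keep every record, though the outputs shown in this notebook were produced without them.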


**Checking for irregular data**

In [None]:
df.value_counts()

Class                 age    menopause  tumor-size  Inv-nodes  node-caps  deg-malig  breast  breast-quad  Irradiant
no-recurrence-events  50-59  ge40       20-24       0-2        no         3          left    left_up      no           2
recurrence-events     50-59  ge40       40-44       6-8        yes        3          left    left_low     yes          2
no-recurrence-events  50-59  ge40       15-19       0-2        no         1          right   central      no           2
                      60-69  ge40       15-19       0-2        no         2          left    left_low     no           2
recurrence-events     50-59  premeno    25-29       0-2        no         2          left    right_up     no           2
                                                                                                                      ..
no-recurrence-events  50-59  ge40       25-29       0-2        no         1          left    left_low     no           1
                                     
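
`df.value_counts()` counts whole rows, so a placeholder such as "?" is easy to miss in its output. One possible way to look for it column by column, sketched on a small hypothetical frame (`toy` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frame with "?" standing in for missing values.
toy = pd.DataFrame({
    "node-caps": ["no", "?", "yes"],
    "breast-quad": ["left_low", "left_up", "?"],
    "age": ["40-49", "50-59", "60-69"],
})

# Compare every cell with the placeholder and count the matches per column.
placeholder_counts = (toy == "?").sum()
print(placeholder_counts)
```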

**We found "?" standing in for missing data, so we replace it with null values**

In [None]:
df.replace("?", pd.NA, inplace=True)

**This gives a total of 8 nulls in node-caps and 1 in breast-quad**

In [None]:
df.isnull().sum()

Class          0
age            0
menopause      0
tumor-size     0
Inv-nodes      0
node-caps      8
deg-malig      0
breast         0
breast-quad    1
Irradiant      0
dtype: int64

**Now I will drop the rows with missing values using "dropna".**

In [None]:
df.dropna(inplace=True)

**No nulls remain.**

In [None]:
df.isnull().sum()

Class          0
age            0
menopause      0
tumor-size     0
Inv-nodes      0
node-caps      0
deg-malig      0
breast         0
breast-quad    0
Irradiant      0
dtype: int64
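
As an alternative to replacing "?" after loading, `pd.read_csv` can treat it as missing while parsing through its `na_values` parameter. A minimal sketch on a hypothetical two-column sample (the data and the names `raw`, `toy`, `cleaned` are illustrative only):

```python
import io

import pandas as pd

# Hypothetical headerless sample with "?" marking missing values.
raw = io.StringIO("no,left_low\n?,left_up\nyes,?\n")

# na_values="?" turns the placeholder into NaN at parse time.
toy = pd.read_csv(raw, header=None, names=["node-caps", "breast-quad"], na_values="?")

missing_per_column = toy.isnull().sum()
cleaned = toy.dropna()  # keeps only rows without any missing value
print(missing_per_column)
print(len(cleaned))
```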

# **Analysis**

**We judged that these columns would not be relevant to breast cancer recurrence, so we removed them from the dataset**

In [None]:
df = df.drop(['Inv-nodes', 'breast', 'breast-quad'], axis=1)

**We encode the categorical columns with one-hot encoding, using get_dummies**

In [None]:
# Columns to be one-hot encoded
classificacao = ['age', 'tumor-size', 'menopause', 'node-caps', 'Irradiant', 'deg-malig']

In [None]:
df = pd.get_dummies(df, columns=classificacao)

**Here are the categorical columns after one-hot encoding**

In [None]:
df.head()

Unnamed: 0,Class,age_20-29,age_30-39,age_40-49,age_50-59,age_60-69,age_70-79,tumor-size_0-4,tumor-size_10-14,tumor-size_15-19,...,menopause_ge40,menopause_lt40,menopause_premeno,node-caps_no,node-caps_yes,Irradiant_no,Irradiant_yes,deg-malig_1,deg-malig_2,deg-malig_3
0,no-recurrence-events,0,0,1,0,0,0,0,0,0,...,0,0,1,1,0,1,0,0,1,0
1,no-recurrence-events,0,0,1,0,0,0,0,0,0,...,0,0,1,1,0,1,0,0,1,0
2,no-recurrence-events,0,0,0,0,1,0,0,0,1,...,1,0,0,1,0,1,0,0,1,0
3,no-recurrence-events,0,0,1,0,0,0,1,0,0,...,0,0,1,1,0,1,0,0,1,0
4,no-recurrence-events,0,0,0,0,1,0,0,0,1,...,1,0,0,1,0,1,0,0,1,0
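
To make the naming of the encoded columns concrete, here is a small sketch on a hypothetical two-row frame. It passes `dtype=int` so the indicators are 0/1 as in the table above; recent pandas versions return booleans by default. The frame `toy` and its values are illustrative.

```python
import pandas as pd

# Hypothetical two-row frame with one text column and one numeric column to encode.
toy = pd.DataFrame({
    "Class": ["no-recurrence-events", "recurrence-events"],
    "menopause": ["premeno", "ge40"],
    "deg-malig": [2, 3],
})

# Each listed column becomes one indicator column per distinct value,
# named <column>_<value>; unlisted columns are kept unchanged.
encoded = pd.get_dummies(toy, columns=["menopause", "deg-malig"], dtype=int)
print(encoded.columns.tolist())
```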


**We build the input ("entrada") by dropping the class column from the dataset**

In [None]:
entrada = df.drop('Class', axis=1)

**We map the class values to 0 and 1 and store them in "saida"**

In [None]:
saida = df['Class'].map({"no-recurrence-events": 0, "recurrence-events": 1})

**We convert saida to a NumPy array, which is more efficient to work with**

In [None]:
saida = saida.to_numpy(dtype=int)

# **Training**

**Computing the number of hidden neurons (we use the first rule of thumb)**

In [None]:
round((entrada.shape[1] + 1) / 2)

14
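
The expression above can be read as the mean of the input and output counts, with the `+ 1` standing for the single output. That reading is an assumption, as is the figure of 27 encoded input columns used below (it is consistent with the result of 14). The helper name `hidden_units` is hypothetical.

```python
def hidden_units(n_inputs: int, n_outputs: int = 1) -> int:
    """Rule-of-thumb hidden layer size: mean of input and output counts, rounded."""
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return round((n_inputs + n_outputs) / 2)


print(hidden_units(27))
```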

**We train the neural network on the full dataset**

In [None]:
redeneural = MLPClassifier(
    max_iter=1000,
    activation='identity',
    hidden_layer_sizes=(8, 4, 2),
    learning_rate_init=0.0001,
)
redeneural.fit(entrada, saida)

**Splitting the dataset into a training set and a test set**

In [None]:
X_train, X_test, y_train, y_test = train_test_split(entrada, saida, train_size=0.7, test_size=0.3, random_state=20)

**We make predictions on the held-out test data with our neural network**

In [None]:
y_pred = redeneural.predict(X_test)

# **Our Results**

**We compute the accuracy of our neural network**

In [None]:
print("Accuracy: {:.1f}%".format(accuracy_score(y_test, y_pred) * 100))

Accuracy: 79.5%
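
Because the network above was fit on all rows before the split, the test rows were also seen during training, so the reported figure may be optimistic. A minimal sketch of the leak-free order (split first, fit on the training part only), using synthetic data from `make_classification` as a stand-in for `entrada` and `saida`; the 27 features and the `random_state` values are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for entrada/saida (assumption: 200 rows, 27 features).
X, y = make_classification(n_samples=200, n_features=27, random_state=20)

# Split before any training so the test rows stay unseen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, test_size=0.3, random_state=20
)

model = MLPClassifier(
    max_iter=1000,
    activation="identity",
    hidden_layer_sizes=(8, 4, 2),
    random_state=20,
)
model.fit(X_train, y_train)  # training data only

test_accuracy = accuracy_score(y_test, model.predict(X_test))
print("Held-out accuracy: {:.1f}%".format(test_accuracy * 100))
```

The accuracy printed here describes the synthetic data only and says nothing about the breast cancer dataset.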


# **KNN**

**We now apply KNN, another supervised learning method, to see whether it improves prediction accuracy on this dataset**

**We create a list of candidate odd values for K**


In [None]:
# Define the values of k to test
k = {'n_neighbors': range(1, 20, 2)}

**We declare the KNN object**

In [None]:
knn = KNeighborsClassifier()

**We create the GridSearchCV object, passing the KNN, the grid of K values, cv (the number of cross-validation folds), and the metric of interest, in our case accuracy**


In [None]:
knn = GridSearchCV(knn, k, cv=10, scoring='accuracy')

**Now we just train and show the result**

In [None]:
knn.fit(X_train, y_train)
print("KNN accuracy: {:.2f}%".format(knn.score(X_test, y_test) * 100))

KNN accuracy: 75.90%
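
After fitting, a `GridSearchCV` object also exposes which K was selected and its mean cross-validated score through `best_params_` and `best_score_`. A minimal sketch on synthetic data (the dataset from `make_classification` and the name `search` are stand-ins, not the notebook's data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the encoded dataset (assumption: 200 rows, 27 features).
X, y = make_classification(n_samples=200, n_features=27, random_state=20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, test_size=0.3, random_state=20
)

# Same grid as above: odd K from 1 to 19, 10-fold cross-validation, accuracy metric.
search = GridSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": range(1, 20, 2)},
    cv=10,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_k = search.best_params_["n_neighbors"]   # K with the best mean CV accuracy
cv_accuracy = search.best_score_              # that mean CV accuracy
test_accuracy = search.score(X_test, y_test)  # accuracy of the refit model on the test set
print(best_k, cv_accuracy, test_accuracy)
```

Comparing `cv_accuracy` with `test_accuracy` gives a rough sense of how well the cross-validated estimate carries over to unseen rows.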


**We save the processed dataset**

In [None]:
df.to_csv('dados.csv', index=False)