# ***Breast Cancer Wisconsin Dataset***
https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

# Dataset
---
### Classification problem
* 10 input variables (a sample ID plus 9 cell features)
* 1 **binary** output variable (benign or malignant)

### Originally hosted by UCI

### 699 data samples (683 after removing rows with missing values)
* Use the first 100 samples as **test set**
* Use the next 100 samples as **validation set**
* Use the others as **training set**
---
# Data Preparation
* Remove the rows with missing values ("***?***")
* Load the data in Python
* Drop the first column: ID
* ***Normalize the input variables***
* Set the output variable
  - Malignant: 1 / benign: 0
* Split the data into training, validation, and test sets
---
# Basic model
* Model structure
  - 9 inputs
  - 10 hidden neurons / ReLU
  - 1 output neuron / sigmoid
* Compile and training settings
  - Optimizer = rmsprop
  - Loss function = binary crossentropy
  - Epochs = 200
  - Batch_size = 10
  - EarlyStopping with patience = 2
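As a quick sanity check on the structure above, each Dense layer has one weight per input-output pair plus one bias per output neuron. Under that standard counting, the model should have 111 trainable parameters (this is the total `model.summary()` would be expected to report, although its output is not shown in this notebook):

$$
\underbrace{9 \times 10 + 10}_{\text{hidden layer}} \;+\; \underbrace{10 \times 1 + 1}_{\text{output layer}} \;=\; 100 + 11 \;=\; 111
$$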

### 1. **Import modules & load the data**


 **Column 11 (named "10" in the code) holds the class label: 2 for benign, 4 for malignant**
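Converting these class codes into a 0/1 target can be done with a single dictionary lookup. The sketch below uses a small made-up Series in place of column "10"; note that any value missing from the dictionary would become `NaN`, which makes unexpected codes easy to spot.

```python
import pandas as pd

# Toy stand-in for column "10" of the dataset (values are illustrative, not real rows)
labels = pd.Series([2, 4, 4, 2, 2])

# Map the UCI class codes to a binary target: benign (2) -> 0, malignant (4) -> 1
binary_labels = labels.map({2: 0, 4: 1})
print(binary_labels.tolist())  # [0, 1, 1, 0, 0]
```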

In [None]:
import pandas as pd
import numpy as np
from tensorflow.keras import models, layers, optimizers, losses, metrics
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.preprocessing import MinMaxScaler

DATA_PATH = "/content/drive/MyDrive/ColabNotebooks/week6/breast-cancer-wisconsin.data"
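# The file has no header row, so header=None keeps the first sample as data
cancer_origin_data = pd.read_csv(DATA_PATH, delimiter=",", header=None)
# Name the 11 columns "0" ... "10" (column "0" is the ID, column "10" the class)
cancer_origin_data.columns = ["0","1","2","3","4","5","6","7","8","9","10"]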
print(cancer_origin_data)

### 2. **Data preparation**
  - Test: rows 0-99, Val: rows 100-199, Train: rows 200 onward
  - Input: 10 -> 9 columns (drop the first column, ID, and keep all the others)
  - Output: 1 column (M: 4, B: 2 => M: 1, B: 0)
  - Normalize (using MinMaxScaler)
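Min-max normalization rescales each column independently as (x - min) / (max - min), so every feature ends up in [0, 1]. The sketch below, using a tiny made-up matrix, checks that this formula matches `MinMaxScaler` with its default `feature_range=(0, 1)`:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Small illustrative feature matrix (2 features on the dataset's 1-10 scale)
x = np.array([[1.0, 10.0],
              [5.0, 3.0],
              [10.0, 1.0]])

# Manual min-max scaling, column by column: (x - min) / (max - min)
col_min = x.min(axis=0)
col_max = x.max(axis=0)
manual = (x - col_min) / (col_max - col_min)

# MinMaxScaler with its default feature_range=(0, 1) computes the same thing
scaled = MinMaxScaler().fit_transform(x)

print(np.allclose(manual, scaled))  # True
```

The manual version divides by zero for a constant column, whereas `MinMaxScaler` guards against that case, which is one reason to prefer the library class in the notebook.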

In [None]:
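## Drop rows that contain "?" (missing value) in any column
cancer_origin_data = cancer_origin_data[(cancer_origin_data != "?").all(axis=1)]
# Reset the index so the rows are numbered 0..n-1 again
cancer_origin_data = cancer_origin_data.reset_index(drop=True)
# The "?" entries forced one column to be read as strings; cast everything to int
cancer_origin_data = cancer_origin_data.astype(int)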
# Convert the output to binary: benign (2) -> 0, malignant (4) -> 1
# (a single column assignment avoids pandas chained-assignment warnings)
cancer_origin_data['10'] = cancer_origin_data['10'].map({2: 0, 4: 1})
# Normalize every column to [0, 1], then drop the first column (ID)
normalization = MinMaxScaler()
norm_data = normalization.fit_transform(cancer_origin_data)
input_and_output = np.delete(norm_data, 0, axis=1)
# Split into inputs (x: 9 columns) and output (y: 1 column)
x_data = input_and_output[:, 0:9]
y_data = input_and_output[:, 9]
# Train, validation, test split
# Note: pandas .loc[a:b] includes row b, while NumPy slicing [a:b] stops at row b-1
x_test = x_data[:100, :]
y_test = y_data[:100]
x_val = x_data[100:200, :]
y_val = y_data[100:200]
x_train = x_data[200:, :]
y_train = y_data[200:]
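The cell above fits `MinMaxScaler` on all rows, so the minimum and maximum of the validation and test rows also shape the scaling of the training data. A common alternative, sketched here as a swapped-in technique rather than the notebook's method, fits the scaler on the training rows only and reuses it for the other splits. `x_raw` is a hypothetical stand-in (random integers 1-10) for the 9 unscaled feature columns:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the 9 unscaled feature columns (integers 1-10)
rng = np.random.default_rng(0)
x_raw = rng.integers(1, 11, size=(683, 9)).astype(float)

# Same positional split as above: test = rows 0-99, val = rows 100-199, train = rest
x_test_raw, x_val_raw, x_train_raw = x_raw[:100], x_raw[100:200], x_raw[200:]

# Fit the scaling statistics on the training rows only...
scaler = MinMaxScaler().fit(x_train_raw)

# ...then apply the same transformation to every split
x_train_s = scaler.transform(x_train_raw)
x_val_s = scaler.transform(x_val_raw)
x_test_s = scaler.transform(x_test_raw)

print(x_train_s.shape, x_val_s.shape, x_test_s.shape)
```

With this approach, validation and test values can fall slightly outside [0, 1] when they exceed the training range; that is expected and mirrors what happens with genuinely new data.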

### 3. **Model design**
  - 9 inputs
  - 10 hidden neurons / ReLU
  - 1 output neuron / sigmoid
  - compile
    - Optimizer = rmsprop
    - Loss function = binary crossentropy
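Binary cross-entropy penalizes the predicted probability p of the positive class (malignant) as -[y log p + (1 - y) log(1 - p)], averaged over samples. The NumPy sketch below is an illustrative re-implementation, not the Keras source; the clipping constant `eps=1e-7` is an assumption chosen to keep `log()` finite:

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)] averaged over samples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so log() stays finite
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

# A confident correct prediction costs little; a confident wrong one costs a lot
print(binary_crossentropy([1], [0.9]))  # about 0.105
print(binary_crossentropy([1], [0.1]))  # about 2.303
```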

In [None]:
model = models.Sequential()
model.add(layers.Dense(10, activation='relu', input_shape=(9, )))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer=optimizers.RMSprop(learning_rate=0.001),
              loss=losses.binary_crossentropy,
              metrics=[metrics.binary_accuracy])

### 4. Training
- Epochs = 200
- Batch_size = 10
- EarlyStopping with patience = 2
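The patience rule stops training once the monitored value (`val_loss` here) has failed to improve on its best value for `patience` consecutive epochs. The function below is a simplified, hypothetical re-implementation of that rule for illustration, not the Keras code itself:

```python
def stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never stops.

    Simplified patience rule: stop once val_loss has not improved on its best
    value for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss   # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1     # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: best at epoch 3, then two epochs without improvement
print(stopping_epoch([0.60, 0.45, 0.40, 0.41, 0.42, 0.30]))  # 5
```

With `patience=2`, one noisy epoch is tolerated but two in a row end training, even though epoch 6 in this made-up sequence would have improved. Keras's `EarlyStopping` also defaults to `restore_best_weights=False`, so the model keeps the weights from the last epoch it trained, not from the best one.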

In [None]:
model.fit(x_train, y_train, epochs=200, batch_size=10,
          validation_data=(x_val, y_val),
          callbacks=[EarlyStopping(monitor='val_loss', patience=2)])

test_loss, test_acc = model.evaluate(x_test, y_test)
print("test_loss: ", test_loss, "test_acc: ", test_acc)
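The sigmoid output is a probability of "malignant", and the `binary_accuracy` metric counts a prediction as class 1 when it exceeds its default threshold of 0.5. The sketch below reproduces that computation on made-up probabilities; with the real model, `model.predict(x_test).ravel()` would supply `probs` as a flat array:

```python
import numpy as np

# Made-up sigmoid outputs for five test samples and their true labels
probs = np.array([0.92, 0.15, 0.51, 0.48, 0.03])
y_true = np.array([1, 0, 0, 1, 0])

# Predict "malignant" (1) when the probability exceeds 0.5, else "benign" (0)
y_pred = (probs > 0.5).astype(int)
accuracy = np.mean(y_pred == y_true)

print(y_pred.tolist(), accuracy)  # [1, 0, 1, 0, 0] 0.6
```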

### Extra Question 2

Use `get_weights()` to inspect the hidden layer's weights and biases.


In [None]:
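# Kernel of the hidden layer: shape (9, 10), row i = weights from input i
weight = model.get_weights()[0]
# Bias of the hidden layer: shape (10,)
bias = model.get_weights()[1]

for i, w in enumerate(weight):
  # Print inputs 0, 1, 7, 8 in full and collapse inputs 2-6 into "......."
  if 1 < i < 7:
    if i == 2:
      print(".......")
  else:
    for j, n in enumerate(w):
      # Label the row with its input index on the middle line
      if j == len(w) // 2:
        print(i, " input neuron = |===>", j, "hidden neuron weight :", n)
      else:
        print("                  |===>", j, "hidden neuron weight :", n)
    print()

print("+"*50)
for i, b in enumerate(bias):
  print(i, "hidden layer neuron's bias :", b)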
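The arrays returned by `get_weights()` are enough to recompute the model's output by hand: for each Dense layer, Keras computes activation(x @ kernel + bias). The sketch below uses random stand-ins with the same shapes as this model's weights; with the trained model, `W1, b1, W2, b2 = model.get_weights()` could replace them, and `forward(x_test, W1, b1, W2, b2)` should then agree closely with `model.predict(x_test)`:

```python
import numpy as np

# Random stand-ins with the shapes get_weights() returns for the model above:
# [hidden kernel (9, 10), hidden bias (10,), output kernel (10, 1), output bias (1,)]
rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(9, 10)), rng.normal(size=10)
W2, b2 = rng.normal(size=(10, 1)), rng.normal(size=1)

def forward(x, W1, b1, W2, b2):
    """Dense(10, relu) followed by Dense(1, sigmoid), written out in NumPy."""
    hidden = np.maximum(0.0, x @ W1 + b1)             # ReLU(x W1 + b1)
    return 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))  # sigmoid(h W2 + b2)

x = rng.uniform(0.0, 1.0, size=(4, 9))   # four normalized samples
probs = forward(x, W1, b1, W2, b2)
print(probs.shape)  # (4, 1)
```

Because row i of the kernel holds the weights leaving input i, the printing loop above treats each row as one input neuron.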