# Breast Cancer Wisconsin (Diagnostic)

Target: Predict whether the cancer is benign or malignant

Source: https://www.kaggle.com/uciml/breast-cancer-wisconsin-data

In this notebook, I apply machine learning to a practical medical problem: the Breast Cancer Wisconsin (Diagnostic) data set.
I studied computer science in medicine, so I would like to combine medical knowledge with machine learning.

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. Fine-needle aspiration (FNA) is a diagnostic procedure used to investigate lumps or masses. In this technique, a thin (23–25 gauge), hollow needle is inserted into the mass for sampling of cells that, after being stained, will be examined under a microscope (biopsy).

https://en.wikipedia.org/wiki/Fine-needle_aspiration

This is the Keras version.

In [9]:
import numpy as np
import pandas as pd

from sklearn import preprocessing
from sklearn.model_selection import train_test_split

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout

In [2]:
df = pd.read_csv('input/data.csv')
df = df.drop(['id', 'Unnamed: 32'], axis=1)

In [3]:
df['diagnosis_cat'] = pd.factorize(df['diagnosis'])[0]
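
`pd.factorize` numbers the labels in order of first appearance, so which diagnosis becomes 0 depends on the first row of the file. A minimal sketch on a made-up column (the values below are illustrative, not taken from data.csv):

```python
import pandas as pd

# Hypothetical toy labels; the real column is df['diagnosis'].
toy = pd.Series(['M', 'B', 'B', 'M'])

# factorize returns integer codes and the unique labels in first-seen order
codes, uniques = pd.factorize(toy)
mapping = dict(enumerate(uniques))

print(codes.tolist())   # [0, 1, 1, 0]
print(mapping)          # {0: 'M', 1: 'B'}
```

Checking `mapping` once makes it clear whether a sigmoid output near 1 stands for benign or malignant.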

In [4]:
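def feats(df):
    # numeric columns only (np.int / np.float were removed from NumPy)
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    # exclude the encoded target; a list keeps the column order stable
    bad_feats = {'diagnosis_cat'}
    return [col for col in numeric_cols if col not in bad_feats]

# copy so that the unscaled df is not modified in place
df_scaled = df.copy()
# standardize every feature to zero mean and unit variance
df_scaled[feats(df)] = preprocessing.scale(df[feats(df)])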
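
`preprocessing.scale` standardizes each column using statistics from all rows, including rows that later end up in the test set. One possible alternative, sketched here on random stand-in data (not the real features), is to fit a `StandardScaler` on the training rows only and reuse it for the test rows. The names `X_demo`, `y_demo` and `scaler` are assumptions made for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data: 100 rows and 30 columns, mirroring the 30 input features.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(100, 30))
y_demo = rng.integers(0, 2, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=0)

# mean and standard deviation are estimated from the training rows only
scaler = StandardScaler().fit(X_tr)
X_tr_s = scaler.transform(X_tr)
# the test rows reuse the training statistics
X_te_s = scaler.transform(X_te)

print(X_tr_s.shape, X_te_s.shape)
```

With this split the training columns have zero mean and unit variance, while the test columns are only approximately standardized, which is the expected behaviour.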

In [5]:
X = df_scaled[feats(df_scaled)].values
y = df_scaled['diagnosis_cat']

In [41]:
def mlp_net():
    model = Sequential([
        # first hidden layer; input_dim sets the size of the input layer
        Dense(units=16, kernel_initializer='random_uniform', activation='relu', input_dim=30),
        # dropout to fight overfitting
        Dropout(0.1),
        # second hidden layer
        Dense(units=16, kernel_initializer='random_uniform', activation='relu'),
        Dropout(0.05),
        # output layer: one sigmoid unit for the binary diagnosis
        Dense(units=1, kernel_initializer='random_uniform', activation='sigmoid')])
    return model

In [42]:
mlp = mlp_net()
mlp.summary()
mlp.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_15 (Dense)             (None, 16)                496       
_________________________________________________________________
dropout_14 (Dropout)         (None, 16)                0         
_________________________________________________________________
dense_16 (Dense)             (None, 16)                272       
_________________________________________________________________
dropout_15 (Dropout)         (None, 16)                0         
_________________________________________________________________
dense_17 (Dense)             (None, 1)                 17        
Total params: 785
Trainable params: 785
Non-trainable params: 0
_________________________________________________________________
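
The parameter counts in the summary can be checked by hand: a dense layer with n inputs and m units has n*m weights plus m biases, and the dropout layers add no parameters. A small check in plain Python (the helper `dense_params` is made up for this example):

```python
def dense_params(n_inputs, n_units):
    # one weight per input-unit pair, plus one bias per unit
    return n_inputs * n_units + n_units

# (inputs, units) for the three Dense layers of the network
layer_sizes = [(30, 16), (16, 16), (16, 1)]
counts = [dense_params(n, m) for n, m in layer_sizes]

print(counts)       # [496, 272, 17]
print(sum(counts))  # 785
```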




In [38]:
mlp.fit(X, y, batch_size=128, epochs=15);

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


In [39]:
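# Hold out 10% of the rows for testing.
# Caution: if the cell above was run, mlp has already been fitted on all of X,
# so it has seen the test rows. Re-create it with mlp_net() and compile it again
# before this fit to get an unbiased test score.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1, random_state = 0)
mlp.fit(X_train, y_train, batch_size=128, epochs=15);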

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


In [33]:
result = mlp.evaluate(X_test, y_test, verbose = 0)
print('Accuracy: ', result[1])
print('Error: %.2f%%' % (100- result[1]*100))

Accuracy:  0.9649122807017544
Error: 3.51%
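
Accuracy alone does not say whether the errors are missed cases or false alarms, which matters for a diagnostic task. A minimal sketch of turning sigmoid outputs into a confusion matrix, using made-up probabilities and labels rather than this notebook's real predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical values; in the notebook these would come from
# mlp.predict(X_test).ravel() and y_test.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.6, 0.9, 0.8, 0.3, 0.2, 0.7, 0.4])

# a sigmoid output above 0.5 is read as class 1
y_pred = (y_prob > 0.5).astype(int)

# rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
accuracy = np.trace(cm) / cm.sum()

print(cm)
print(accuracy)  # 0.75
```

The off-diagonal entries separate the two kinds of error; which of them corresponds to a missed malignant case depends on how `pd.factorize` encoded the labels.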
