# Activity 5.04 - Breast Cancer Diagnosis Classification using Artificial Neural Networks (with Answers)

In this activity, we will use the [Breast Cancer Wisconsin (Diagnostic) dataset](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)), available from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php). The dataset contains characteristics of the cell nuclei present in the digitized image of a fine needle aspirate (FNA) of a breast mass, and each sample is labeled either _malignant_ or _benign_. Throughout this activity, we will use the measurements provided in the dataset to classify samples as malignant or benign.

## Import the Required Packages
For this activity, we need the pandas package for loading the data, the matplotlib package for plotting, and scikit-learn for creating the neural network model, scaling the features, and splitting the data into training and validation sets. Import all of the required packages and relevant modules for these tasks.

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn import preprocessing

## Load the Data
Load the Breast Cancer Diagnosis dataset using pandas and examine the first five rows.

In [3]:
df = pd.read_csv('../Datasets/breast-cancer-data.csv')
df.head()

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | diagnosis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 | malignant |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 | malignant |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 | malignant |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 | malignant |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | ... | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 | malignant |
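
If the CSV file is not at hand, a comparable frame can be built from the copy of this dataset that ships with scikit-learn. The sketch below assumes that copy matches the activity's CSV (the column names and first row agree with the output above, but the source does not confirm the files are identical). The name `df_alt` is hypothetical. Note that scikit-learn encodes the target as 0 = malignant and 1 = benign, so the sketch maps it back to strings.

```python
from sklearn.datasets import load_breast_cancer

# Load scikit-learn's bundled copy as pandas objects (assumed to match the CSV)
data = load_breast_cancer(as_frame=True)

# Feature columns plus a string 'diagnosis' column, mirroring the activity's layout
df_alt = data.data.copy()
df_alt['diagnosis'] = data.target.map({0: 'malignant', 1: 'benign'})

print(df_alt.shape)
print(df_alt['diagnosis'].value_counts())
```

Checking the class counts at this stage is a cheap way to see how balanced the two diagnoses are before training.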


Split the data into input (X) and output (y) variables.

In [4]:
X, y = df[[c for c in df.columns if c != 'diagnosis']], df.diagnosis
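
As a small illustration of this step, the sketch below applies the same selection to a two-row toy frame and compares it with `DataFrame.drop`, which is an equivalent and arguably more readable way to exclude the label column. The frame `toy_df` and its values are illustrative only.

```python
import pandas as pd

# Tiny illustrative frame; values are placeholders, not taken from the activity
toy_df = pd.DataFrame({
    'mean radius': [17.99, 13.54],
    'mean texture': [10.38, 14.36],
    'diagnosis': ['malignant', 'benign'],
})

# Same selection as in the activity: every column except the label
X_toy = toy_df[[c for c in toy_df.columns if c != 'diagnosis']]
y_toy = toy_df.diagnosis

# Equivalent alternative using drop
X_toy_alt = toy_df.drop(columns='diagnosis')

print(list(X_toy.columns))
print(X_toy.equals(X_toy_alt))
```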

## Feature Engineering
As the first five rows show, the columns span very different scales of magnitude, so we normalize the dataset before constructing and training the neural network model. For this, we use the `MinMaxScaler` class from scikit-learn, which rescales the values of each column to the range 0 to 1, as discussed in the Logistic Regression section of this chapter (see Exercise 3.03).

In [5]:
X_array = X.values #returns a numpy array
min_max_scaler = preprocessing.MinMaxScaler()
X_array_scaled = min_max_scaler.fit_transform(X_array)
X = pd.DataFrame(X_array_scaled, columns=X.columns)
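
To make the transformation concrete, here is a minimal sketch on a toy array (the values are illustrative). With default settings, `MinMaxScaler` maps each column `x` to `(x - min) / (max - min)`, so the sketch computes that by hand and compares it with the scaler's output.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two columns on very different scales (illustrative values)
toy = np.array([[1.0, 100.0],
                [2.0, 300.0],
                [4.0, 500.0]])

# Column-wise min-max scaling computed by hand
col_min = toy.min(axis=0)
col_max = toy.max(axis=0)
manual = (toy - col_min) / (col_max - col_min)

# The same scaling via scikit-learn
scaled = MinMaxScaler().fit_transform(toy)

print(scaled)
print(np.allclose(manual, scaled))
```

After scaling, the smallest value in each column becomes 0 and the largest becomes 1, regardless of the column's original units.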

Let us examine the first five rows of the normalized dataset.

In [6]:
X.head()

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.521037 | 0.022658 | 0.545989 | 0.363733 | 0.593753 | 0.792037 | 0.70314 | 0.731113 | 0.686364 | 0.605518 | ... | 0.620776 | 0.141525 | 0.66831 | 0.450698 | 0.601136 | 0.619292 | 0.56861 | 0.912027 | 0.598462 | 0.418864 |
| 1 | 0.643144 | 0.272574 | 0.615783 | 0.501591 | 0.28988 | 0.181768 | 0.203608 | 0.348757 | 0.379798 | 0.141323 | ... | 0.606901 | 0.303571 | 0.539818 | 0.435214 | 0.347553 | 0.154563 | 0.192971 | 0.639175 | 0.23359 | 0.222878 |
| 2 | 0.601496 | 0.39026 | 0.595743 | 0.449417 | 0.514309 | 0.431017 | 0.462512 | 0.635686 | 0.509596 | 0.211247 | ... | 0.556386 | 0.360075 | 0.508442 | 0.374508 | 0.48359 | 0.385375 | 0.359744 | 0.835052 | 0.403706 | 0.213433 |
| 3 | 0.21009 | 0.360839 | 0.233501 | 0.102906 | 0.811321 | 0.811361 | 0.565604 | 0.522863 | 0.776263 | 1.0 | ... | 0.24831 | 0.385928 | 0.241347 | 0.094008 | 0.915472 | 0.814012 | 0.548642 | 0.88488 | 1.0 | 0.773711 |
| 4 | 0.629893 | 0.156578 | 0.630986 | 0.48929 | 0.430351 | 0.347893 | 0.463918 | 0.51839 | 0.378283 | 0.186816 | ... | 0.519744 | 0.123934 | 0.506948 | 0.341575 | 0.437364 | 0.172415 | 0.319489 | 0.558419 | 0.1575 | 0.142595 |


## Constructing the Neural Network Model
Before we can construct the model, we must first convert the diagnosis values into labels that can be used within the model. Replace:

1. The diagnosis string *benign* with the value 0
2. The diagnosis string *malignant* with the value 1

In [7]:
diagnoses = [
    'benign', # 0
    'malignant', # 1
]
output = [diagnoses.index(diag) for diag in y]
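
The encoding above relies on each string's position in the `diagnoses` list. As a quick check on toy labels, the sketch below shows that it produces the same integers as an explicit dictionary passed to pandas' `Series.map`. The series `toy_y` is illustrative.

```python
import pandas as pd

diagnoses = [
    'benign',     # 0
    'malignant',  # 1
]

# Illustrative labels standing in for the activity's y
toy_y = pd.Series(['malignant', 'benign', 'benign', 'malignant'])

# List-position encoding, as used in the activity
encoded = [diagnoses.index(diag) for diag in toy_y]

# Equivalent encoding with an explicit mapping
mapped = toy_y.map({'benign': 0, 'malignant': 1}).tolist()

print(encoded)  # [1, 0, 0, 1]
print(encoded == mapped)
```

One practical difference: `list.index` raises a `ValueError` on an unexpected label, while `Series.map` silently yields `NaN` for it.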

To evaluate the model impartially, we should also split the dataset into a training set and a validation set.

In [8]:
train_X, valid_X, train_y, valid_y = train_test_split(X, output, 
                                                      test_size=0.2, random_state=123)
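
To illustrate what `test_size=0.2` and `random_state` do, the sketch below splits a synthetic stand-in array. It assumes 569 rows, the sample count of the UCI dataset; the source does not show the row count of the CSV, so treat that number as an assumption. All `toy_*` names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 569  # assumption: sample count of the UCI dataset
toy_X = np.arange(n_samples * 2).reshape(n_samples, 2)
toy_y = np.arange(n_samples) % 2

# Hold out 20% of the rows for validation
tr_X, va_X, tr_y, va_y = train_test_split(toy_X, toy_y,
                                          test_size=0.2, random_state=123)
print(len(tr_X), len(va_X))  # 455 114

# Repeating the call with the same random_state reproduces the same split
_, va_X_again, _, _ = train_test_split(toy_X, toy_y,
                                       test_size=0.2, random_state=123)
same_split = np.array_equal(va_X, va_X_again)
print(same_split)
```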

Create the model using the normalized dataset and the assigned *diagnosis* labels.

In [8]:
model = MLPClassifier(solver='sgd', hidden_layer_sizes=(100,), max_iter=1000, random_state=1, 
                      learning_rate_init=.01)
model.fit(X=train_X, y=train_y)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(100,), learning_rate='constant',
              learning_rate_init=0.01, max_iter=1000, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=1, shuffle=True, solver='sgd', tol=0.0001,
              validation_fraction=0.1, verbose=False, warm_start=False)
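
To get a feel for what the fitted estimator holds, the following sketch trains the same configuration on synthetic data from `make_classification` (a stand-in with 30 features, not the activity's data) and inspects a few fitted attributes. The `toy_*` names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification problem with 30 input features
toy_X, toy_y = make_classification(n_samples=200, n_features=30, random_state=0)

toy_model = MLPClassifier(solver='sgd', hidden_layer_sizes=(100,), max_iter=1000,
                          random_state=1, learning_rate_init=.01)
toy_model.fit(toy_X, toy_y)

# One weight matrix per layer transition: input -> hidden, hidden -> output
weight_shapes = [w.shape for w in toy_model.coefs_]
print(weight_shapes)  # [(30, 100), (100, 1)]

# Number of epochs actually run and the final training loss
print(toy_model.n_iter_, toy_model.loss_)
```

With a single hidden layer of 100 units and a binary target, the network has one 30 x 100 weight matrix and one 100 x 1 weight matrix, plus the corresponding bias vectors in `intercepts_`.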

Compute the accuracy of the model against the validation set:

In [9]:
model.score(valid_X, valid_y)

0.9824561403508771
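
For a classifier, `score` returns the mean accuracy, that is, the fraction of samples whose predicted label matches the true label. The sketch below reproduces that computation on toy labels (illustrative values, not the activity's predictions) and adds a confusion matrix, which separates the two kinds of error.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Illustrative labels: 0 = benign, 1 = malignant
toy_true = [0, 0, 0, 1, 1, 1, 1, 0]
toy_pred = [0, 0, 1, 1, 1, 0, 1, 0]

# Accuracy by hand: fraction of matching labels
manual_acc = np.mean(np.array(toy_true) == np.array(toy_pred))
acc = accuracy_score(toy_true, toy_pred)
print(manual_acc, acc)  # 0.75 0.75

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(toy_true, toy_pred)
print(cm)
```

The score reported above, 0.9824561403508771, is the value that `112 / 114` evaluates to, which would mean two misclassified validation samples if the validation set has 114 rows (20% of 569, assuming the CSV holds all samples of the UCI dataset). Applying `confusion_matrix` to `valid_y` and `model.predict(valid_X)` would show whether those errors are missed malignant cases or false alarms.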