<a href="https://colab.research.google.com/github/AD-I/Breast-Cancer-Model/blob/master/Breast%20Cancer%20model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Breast Cancer Neural Network

This notebook builds a deep neural network that classifies breast tumors as malignant or benign. It uses supervised learning on the Kaggle [Breast Cancer Wisconsin (Diagnostic) Data Set](https://www.kaggle.com/uciml/breast-cancer-wisconsin-data), and the final network reaches an accuracy of about 96%.



## Import needed libraries 


In [None]:
import tensorflow as tf

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Load breast cancer dataset

In [None]:
!git clone https://github.com/AD-I/Breast-Cancer-Model.git

In [None]:
df = pd.read_csv('Breast-Cancer-Model/data.csv')

In [None]:
df

As we can see, the `diagnosis` column is the target, while the other columns are the features.

The `diagnosis` column must also be converted to numerical data (malignant: 1, benign: 0), and the final column (`Unnamed: 32`) should be dropped.

## Modify the dataset

In [None]:
mapdict = {'M':1,'B':0}
df['diagnosis'] = df['diagnosis'].map(mapdict)
df = df.drop(columns=['Unnamed: 32'])

In [None]:
df

In [None]:
df.describe()

Let's check which features correlate most strongly with the final diagnosis.

In [None]:
df.corr()['diagnosis'].sort_values(ascending=False)

In my case, I take the first 12 features of this ranking because they are the most influential ones.
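
The manual choice above can also be reproduced programmatically. The following is a minimal sketch on a tiny synthetic DataFrame: the column names `feat_a`, `feat_b`, `feat_c` and the helper `top_correlated` are hypothetical and not part of the dataset. It ranks features by absolute correlation with the target, which is an assumption on my side, since the cell above sorts by signed correlation.

```python
import pandas as pd

def top_correlated(df, target, k):
    """Return the k column names most correlated (in absolute value) with target."""
    corr = df.corr()[target].drop(target)   # correlation of every feature with the target
    return corr.abs().sort_values(ascending=False).head(k).index.tolist()

# Tiny synthetic example; the values are made up for illustration only.
toy = pd.DataFrame({
    'diagnosis': [0, 0, 1, 1, 0, 1],
    'feat_a':    [1.0, 1.2, 3.1, 2.9, 0.8, 3.3],   # tracks the target closely
    'feat_b':    [5.0, 4.0, 5.5, 4.2, 4.9, 4.4],   # mostly noise
    'feat_c':    [9.0, 8.5, 2.0, 2.5, 9.2, 1.8],   # strong negative relation
})

selected = top_correlated(toy, 'diagnosis', k=2)
print(selected)
```

Using the absolute value keeps strongly negative correlations in the ranking, which a signed sort would push to the bottom.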

## Extract the data by **Features** and **Labels**

In [None]:
labels = df['diagnosis']
my_features = ['concave points_worst',
               'perimeter_worst',
               'concave points_mean',
               'radius_worst',
               'perimeter_mean',
               'area_worst',
               'radius_mean',
               'area_mean',
               'concavity_mean',
               'concavity_worst',
               'compactness_mean',
               'compactness_worst'
               ]
features = df[my_features]
labels = labels.to_numpy()
features = features.to_numpy()


Let's check the data shape.

In [None]:
print('Labels')
print(labels.shape, end=' ')
print(labels.ndim)

print('\nFeatures:')
print(features.shape, end=' ')
print(features.ndim)

### Split the data into training, validation and test sets

Before the split, I scale the features to the [0, 1] range with `MinMaxScaler` to improve training performance.

In [None]:
from sklearn.preprocessing import MinMaxScaler

min_max_scaler = MinMaxScaler()
features = min_max_scaler.fit_transform(features)
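
The cell above fits the scaler on the whole dataset, so the minimum and maximum of the validation and test rows also shape the scaling. A common alternative is to compute the statistics on the training rows only and reuse them for the other sets. The sketch below illustrates that idea with plain NumPy rather than `MinMaxScaler`; the helper names `fit_min_max` and `apply_min_max` and the toy arrays are hypothetical.

```python
import numpy as np

def fit_min_max(x_train):
    """Compute the per-column minimum and range from the training rows only."""
    col_min = x_train.min(axis=0)
    col_range = x_train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0   # avoid division by zero for constant columns
    return col_min, col_range

def apply_min_max(x, col_min, col_range):
    """Scale any split with the statistics learned from the training rows."""
    return (x - col_min) / col_range

# Toy data: 4 training rows and 2 held-out rows with 2 features each.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [5.0, 50.0]])
held_out = np.array([[4.0, 40.0], [6.0, 60.0]])

col_min, col_range = fit_min_max(train)
train_scaled = apply_min_max(train, col_min, col_range)
held_out_scaled = apply_min_max(held_out, col_min, col_range)

print(train_scaled.min(axis=0), train_scaled.max(axis=0))
print(held_out_scaled)
```

Held-out values can fall outside [0, 1], as the second held-out row does here; that is expected when the statistics come from the training rows only.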

Now it's time to split the data into:
* Training data: 70%
* Validation data: 20%
* Test data: 10%

In [None]:
from sklearn.model_selection import train_test_split

x_train, x_valid, y_train, y_valid = train_test_split(features,
                                                      labels,
                                                      test_size=0.3)
x_valid, x_test, y_valid, y_test = train_test_split(x_valid,
                                                    y_valid,
                                                    test_size=0.33)
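
The two `test_size` values combine as follows: the first call holds out 30% of the rows, and the second call sends 33% of that held-out part to the test set. A quick arithmetic check of the resulting fractions is sketched below. The 569-row total is the size of the Wisconsin Diagnostic dataset as I know it and is used only for illustration; the exact per-split counts depend on how `train_test_split` rounds.

```python
first_test_size = 0.3    # fraction held out by the first split
second_test_size = 0.33  # fraction of the held-out part that becomes the test set

train_frac = 1 - first_test_size
test_frac = first_test_size * second_test_size
valid_frac = first_test_size * (1 - second_test_size)

print(f'train: {train_frac:.3f}, valid: {valid_frac:.3f}, test: {test_frac:.3f}')

# Approximate row counts, assuming 569 rows in total.
n_rows = 569
for name, frac in [('train', train_frac), ('valid', valid_frac), ('test', test_frac)]:
    print(name, round(n_rows * frac))
```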

In [None]:
print('Train')
print('Features: ', x_train.shape, end=' Labels: ')
print(y_train.shape)
print('Validation')
print('Features: ', x_valid.shape, end=' Labels: ')
print(y_valid.shape)
print('Test')
print('Features: ', x_test.shape, end=' Labels: ')
print(y_test.shape)

Finally, convert the NumPy arrays to tensors.

In [None]:
x_train = tf.convert_to_tensor(x_train, np.float64)
y_train = tf.convert_to_tensor(y_train, np.float64)

x_valid = tf.convert_to_tensor(x_valid, np.float64)
y_valid = tf.convert_to_tensor(y_valid, np.float64)

x_test = tf.convert_to_tensor(x_test, np.float64)
y_test = tf.convert_to_tensor(y_test, np.float64)

## Model

### Functional API Model

In [None]:
def create_model(lr=0.001, n_hidden_nodes=6, dropout=True):
  '''Creates a model with the Keras functional API. Layers:
  Input: 12 features, Dense: n_hidden_nodes relu, Output: 2 softmax.
  lr = (Float) learning rate for the Adam optimizer.
  n_hidden_nodes = (Int) number of nodes in the hidden layer.
  dropout = (Bool) If True, a dropout layer (rate 0.2) is added between
            the hidden Dense layer and the output.
  Returns a compiled tf.keras.Model.
  '''
  tf.keras.backend.clear_session()

  inputs = tf.keras.Input(shape=(12,), name='input')
  x = tf.keras.layers.Dense(n_hidden_nodes, activation='relu')(inputs)
  if dropout:
    drop = tf.keras.layers.Dropout(.2)(x)
    outputs = tf.keras.layers.Dense(2, activation='softmax')(drop)
  else:
    outputs = tf.keras.layers.Dense(2, activation='softmax')(x)

  model = tf.keras.Model(
      inputs=inputs, outputs=outputs, name='breast_cancer_model')

  model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
    metrics=['accuracy']
  )

  return model
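
The model ends in a 2-unit softmax and is trained with sparse categorical cross-entropy, so each label stays a single integer (0 or 1) rather than a one-hot vector. As a rough illustration of what that loss computes for one sample, here is a NumPy sketch. The logits are made-up numbers, the helper names are hypothetical, and Keras applies extra numerical safeguards that are not shown here.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

def sparse_categorical_crossentropy(probs, label):
    """Negative log-probability assigned to the true class index."""
    return -np.log(probs[label])

logits = np.array([0.5, 2.0])           # made-up scores for [benign, malignant]
probs = softmax(logits)

loss_if_malignant = sparse_categorical_crossentropy(probs, 1)
loss_if_benign = sparse_categorical_crossentropy(probs, 0)
print(probs, loss_if_malignant, loss_if_benign)
```

The loss is small when the true class receives most of the probability and grows as the model becomes confidently wrong.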

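First, run a learning rate sweep to find an appropriate value for the model. The scheduler below starts at 1e-5 and multiplies the learning rate by 10 every 30 epochs.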

In [None]:
model = create_model(1e-5, 16, dropout=False)
epochs = 150
batch_size = 32

lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    (lambda epoch: 1e-5 * 10**(epoch / 30))
)

history = model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=2,
    validation_data=(x_valid, y_valid),
    callbacks=[lr_schedule]
)
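
The scheduler above grows the learning rate exponentially from its starting value of 1e-5. The following snippet simply evaluates the same lambda at a few epochs to make the sweep range concrete (the variable names are mine):

```python
# Same formula as the LearningRateScheduler lambda used in the sweep.
schedule = lambda epoch: 1e-5 * 10 ** (epoch / 30)

for epoch in (0, 30, 60, 90, 120, 149):
    print(f'epoch {epoch:3d}: lr = {schedule(epoch):.2e}')
```

The rate reaches 1e-1 at epoch 120, which matches the upper x-limit used when plotting this sweep.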

In [None]:
model.summary()
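
The parameter counts reported by `model.summary()` can be checked by hand: a Dense layer has one weight per input-output pair plus one bias per output. A small sketch for the sweep model above (12 inputs, 16 hidden nodes, 2 outputs, no dropout) follows; the helper `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_outputs):
    """Weights (n_inputs * n_outputs) plus one bias per output unit."""
    return n_inputs * n_outputs + n_outputs

hidden = dense_params(12, 16)   # hidden Dense layer
output = dense_params(16, 2)    # softmax output layer
total = hidden + output
print(hidden, output, total)    # 208 34 242
```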

In [None]:
tf.keras.utils.plot_model(
    model, to_file='model_graph.png', show_shapes=True, show_layer_names=True)


Plot the loss against the learning rate. I took 2e-3 as the learning rate, a value just before the loss becomes unstable.

In [None]:
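# Older Keras versions log the rate under 'lr'; newer ones may use
# 'learning_rate' instead, so fall back to that key if 'lr' is missing.
lrs = history.history.get('lr', history.history.get('learning_rate'))
plt.semilogx(lrs, history.history['loss'])
plt.axis([1e-5, 1e-1, 0, 1])
plt.show()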

Now train the model with the chosen learning rate. Early stopping and a model checkpoint are added to keep the best model.

In [None]:
model = create_model(2e-3, 16)
epochs = 600
batch_size = 32

early_stopping = tf.keras.callbacks.EarlyStopping(patience=15)
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    'model_checkpoint.h5', save_best_only=True)

history = model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=2,
    validation_data=(x_valid, y_valid),
    callbacks=[early_stopping, checkpoint]
)
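
With `patience=15`, training stops once the monitored value (validation loss by default) has not improved for 15 consecutive epochs, while `save_best_only=True` keeps only the checkpoint from the best epoch. The pure-Python sketch below is a simplified illustration of that logic on made-up losses with a patience of 3. It is not Keras's actual implementation, and `simulate_early_stopping` is a hypothetical helper.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (best_epoch, stopped_epoch) for a list of per-epoch validation losses."""
    best_loss = float('inf')
    best_epoch = None
    wait = 0                                      # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch   # a checkpoint would be saved here
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch          # training stops at this epoch
    return best_epoch, len(val_losses) - 1        # ran through all epochs

# Made-up validation losses: the best value before stopping is at epoch 2.
val_losses = [0.90, 0.70, 0.60, 0.65, 0.61, 0.62, 0.50]
best_epoch, stopped_epoch = simulate_early_stopping(val_losses, patience=3)
print(best_epoch, stopped_epoch)
```

Because training continues for `patience` epochs past the best one, the weights in memory at the end are not the best ones, which is why the checkpoint file is loaded afterwards.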

Finally, load the best model found during these epochs.

In [None]:
model = tf.keras.models.load_model('model_checkpoint.h5')

### Show loss and accuracy per epoch

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(len(acc))

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

### Predict values

In [None]:
def predict_value(model, features, labels, n):
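  '''Predict the class probabilities for a single sample.
  model: trained tf.keras.Model
  features: array of scaled feature rows, shape (n_samples, 12)
  labels: array of true labels (0 = benign, 1 = malignant)
  n: (int) index of the sample in features
  return: array of shape (1, 2) with the predicted class probabilities
  '''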
  data = tf.convert_to_tensor(features[n])
  data = tf.expand_dims(data, axis=0)
  print(f"Label: {labels[n]}")
  return model.predict(data)

In [None]:
predict_value(model, features, labels, 100)

### Use test set

In [None]:
print('Total values: ', len(tf.keras.backend.get_value(x_test)))
for test_feature, test_label in zip(x_test, y_test):
  true_label = tf.keras.backend.get_value(test_label)
  print(f'Label: {true_label}', end=" ")

  test_feature = tf.expand_dims(test_feature, 0)
  predicted = model.predict(test_feature)[0]
  
  if predicted[0] > predicted[1]:
    if true_label == 0.0:
      # True Negative
      print(predicted, "\U0000274E" + "\n")
    else:
      # False Negative
      print(predicted, "\U0000274C" + "\n")
  else:
    if true_label == 1.0:
      # True Positive
      print(predicted, "\U00002705" + "\n")
    else:
      # False Positive
      print(predicted, "\U0000274C" + "\n")
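
The loop above classifies one sample at a time. The same four outcomes can be counted in a vectorized way by comparing the two probability columns. Below is a sketch with NumPy that uses made-up probabilities in place of `model.predict(x_test)`; the helper `confusion_counts` is hypothetical.

```python
import numpy as np

def confusion_counts(probs, labels):
    """Count TP, TN, FP, FN with class 1 (malignant) as the positive class."""
    # Same rule as the loop: predict 0 only when P(class 0) > P(class 1).
    predicted = (probs[:, 1] >= probs[:, 0]).astype(int)
    labels = labels.astype(int)
    tp = int(np.sum((predicted == 1) & (labels == 1)))
    tn = int(np.sum((predicted == 0) & (labels == 0)))
    fp = int(np.sum((predicted == 1) & (labels == 0)))
    fn = int(np.sum((predicted == 0) & (labels == 1)))
    return tp, tn, fp, fn

# Made-up probabilities for five samples: columns are [benign, malignant].
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]])
labels = np.array([0.0, 1.0, 1.0, 0.0, 1.0])

tp, tn, fp, fn = confusion_counts(probs, labels)
accuracy = (tp + tn) / len(labels)
print(tp, tn, fp, fn, accuracy)
```

For a diagnosis task the false negatives deserve particular attention, since they correspond to malignant cases predicted as benign.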

## Save trained model

In [None]:
tf.saved_model.save(model, './modelTF')

Download the model

In [None]:
!zip -r model.zip ./modelTF
!ls

In [None]:
try:
  from google.colab import files
  files.download('./model.zip')
except ImportError:
  pass