# Breast cancer prediction network with TensorFlow

To get more practice with TensorFlow, let's apply it to the breast cancer dataset from our earlier deep learning tutorial. First, we read and normalize the data:

In [1]:
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
y = np.matrix(data.target).T
X = np.matrix(data.data)
M = X.shape[0]
N = X.shape[1]

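# Normalize each input feature to zero mean and unit standard deviation

def normalize(X):
    M = X.shape[0]
    # Subtract the per-feature mean from every row
    XX = X - np.tile(np.mean(X,0),[M,1])
    # Divide every row by the per-feature standard deviation
    XX = np.divide(XX, np.tile(np.std(XX,0),[M,1]))
    return XX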

XX = normalize(X)
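
The same per-feature normalization can also be written with NumPy broadcasting instead of `np.tile`. The following is a minimal sketch, assuming a plain `ndarray` rather than an `np.matrix`; `normalize_broadcast` is a hypothetical name, and the small random matrix only stands in for the real data so the result can be checked.

```python
import numpy as np

def normalize_broadcast(X):
    # Hypothetical helper: subtract each column's mean and divide by its
    # standard deviation, relying on broadcasting instead of np.tile.
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Synthetic stand-in for the data matrix (5 examples, 3 features).
rng = np.random.RandomState(0)
X_demo = rng.rand(5, 3) * 10.0
XX_demo = normalize_broadcast(X_demo)

# After normalization, each column should have mean ~0 and std ~1.
print(np.allclose(XX_demo.mean(axis=0), 0.0))
print(np.allclose(XX_demo.std(axis=0), 1.0))
```

Both checks should pass for any input whose columns are not constant; a constant column would have zero standard deviation and would need special handling.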

Next, let's build the DNN classifier, train it for 1000 mini-batches of size 20, and show the accuracy on the training set:

In [3]:
import tensorflow as tf

# Build classifier

N = XX.shape[1]
my_feature_columns = [tf.feature_column.numeric_column(key=("F%d"%i)) for i in range(N)]

classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers with 6 and 5 nodes, respectively.
    hidden_units=[6, 5],
    # The model must choose between 2 classes (malignant or benign).
    n_classes=2)

# Train the model

batch_size = 20
train_steps = 1000

def make_dataset(X,y):
    # Make a dictionary mapping each feature name ("F0", "F1", ...) to that feature's column of X
    dictionary = { "F%d"%i: X[:,i] for i in range(N) }
    # Convert pairs from the dictionary and the label vector y to a dataset
    dataset = tf.data.Dataset.from_tensor_slices((dictionary, y))
    return dataset
    
def input_fn_train():
    dataset = make_dataset(XX,y)
    # Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    # Build the Iterator, and return the read end of the pipeline.
    return dataset.make_one_shot_iterator().get_next()

classifier.train(input_fn=input_fn_train, steps=train_steps)

# Evaluate the model (on the training set!)

def input_fn_eval():
    dataset = make_dataset(XX,y)
    return dataset.batch(1)

eval_result = classifier.evaluate(input_fn=input_fn_eval)

print('\nTrain set accuracy: {accuracy:0.3f}\n'.format(**eval_result))

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_global_id_in_cluster': 0, '_keep_checkpoint_max': 5, '_service': None, '_task_type': 'worker', '_master': '', '_task_id': 0, '_save_checkpoints_secs': 600, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_evaluation_master': '', '_session_config': None, '_model_dir': '/tmp/tmpzd76scpm', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f08722f8b00>, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1 into /tmp/tmpzd76scpm/model.ckpt.
INFO:tensorflow:step = 1, loss = 12.363866
INFO:tensorf

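You should get a training set accuracy of around 99%. Keep in mind that this is measured on the same examples the network was trained on, so it says little about performance on new data.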
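
To put the training budget in perspective, the arithmetic below converts 1000 steps of batch size 20 into approximate passes over the data. It assumes the dataset has 569 examples, which is the size of scikit-learn's breast cancer dataset; `num_examples` and the other new variable names are illustrative only.

```python
# Values from the training cell above.
batch_size = 20
train_steps = 1000
# Assumption: scikit-learn's breast cancer dataset has 569 examples.
num_examples = 569

# Each training step consumes one mini-batch.
examples_seen = batch_size * train_steps
epochs = examples_seen / float(num_examples)

print("Examples processed: %d" % examples_seen)
print("Approximate epochs: %.1f" % epochs)
```

Under that assumption, each example is seen roughly 35 times during training.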

The next step would be to split the data into training and validation sets and use the validation set to choose good hyperparameters, such as the hidden layer sizes and the number of training steps.
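
One way such a split might look is sketched below using only NumPy. The helper name `train_validation_split`, the 20% validation fraction, and the fixed seed are all assumptions made for illustration, and the tiny synthetic arrays stand in for the real `XX` and `y`.

```python
import numpy as np

def train_validation_split(X, y, validation_fraction=0.2, seed=0):
    # Hypothetical helper: shuffle the row indices, then hold out a
    # fraction of the rows for validation.
    X = np.asarray(X)
    y = np.asarray(y)
    M = X.shape[0]
    rng = np.random.RandomState(seed)
    order = rng.permutation(M)
    n_val = int(round(M * validation_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# Synthetic stand-in: 10 examples, 3 features, binary labels.
X_demo = np.arange(30, dtype=float).reshape(10, 3)
y_demo = (np.arange(10) % 2).reshape(10, 1)

X_train, y_train, X_val, y_val = train_validation_split(X_demo, y_demo)
print(X_train.shape, X_val.shape)
```

scikit-learn's `train_test_split` offers similar functionality. In a real workflow you would probably also compute the normalization mean and standard deviation from the training portion only and reuse them for the validation portion.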