

# AAI612: Deep Learning & its Applications


*Notebook 3.5: Detecting Breast Cancer*

<a href="https://colab.research.google.com/github/harmanani/AAI612/blob/main/Week3/Notebook3.5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Breast Cancer Detection

Breast cancer is the most common malignancy among women, accounting for nearly 1 in 3 cancers diagnosed among women in the United States, and it is the second leading cause of cancer death among women. Breast cancer occurs as a result of abnormal growth of cells in the breast tissue, commonly referred to as a tumor. A tumor does not necessarily mean cancer: tumors can be benign (not cancerous), pre-malignant (pre-cancerous), or malignant (cancerous). Tests such as MRI, mammogram, ultrasound, and biopsy are commonly used to diagnose breast cancer.

## Dataset

This is an analysis of the Breast Cancer Wisconsin (Diagnostic) [dataset](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29). The dataset was created by Dr. William H. Wolberg, a physician at the University of Wisconsin Hospital in Madison, Wisconsin, USA. To create the dataset, Dr. Wolberg used fluid samples taken from patients with solid breast masses and an easy-to-use graphical computer program called Xcyt, which is capable of analyzing cytological features based on a digital scan. The program uses a curve-fitting algorithm to compute ten features from each of the cells in the sample; it then calculates the mean value, extreme value, and standard error of each feature for the image, returning a 30-element real-valued vector.

Attribute Information:

- ID number
- Diagnosis (M = malignant, B = benign)

Fields 3-32 are derived from ten real-valued features computed for each cell nucleus:

1) radius (mean of distances from center to points on the perimeter) 
2) texture (standard deviation of gray-scale values) 
3) perimeter 
4) area 
5) smoothness (local variation in radius lengths) 
6) compactness (perimeter^2 / area - 1.0) 
7) concavity (severity of concave portions of the contour) 
8) concave points (number of concave portions of the contour) 
9) symmetry 
10) fractal dimension ("coastline approximation" - 1)

The mean, standard error and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.
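
The field numbering can be made concrete with a small sketch. It assumes the layout described above (ID, diagnosis, then ten means, ten standard errors, and ten "worst" values); the labels it builds are illustrative and need not match the column names in the CSV file.

```python
# Base measurements, in the order listed above.
base_features = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]

# Fields 1 and 2 are the ID and the diagnosis; fields 3-32 hold the ten
# means, then the ten standard errors, then the ten "worst" values.
# These labels are illustrative, not the actual CSV headers.
fields = ["id", "diagnosis"]
for statistic in ("mean", "SE", "worst"):
    fields += [f"{statistic} {name}" for name in base_features]

# Look up fields by their 1-based field number.
for number in (3, 13, 23):
    print(number, fields[number - 1])
# 3 mean radius
# 13 SE radius
# 23 worst radius
```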

All feature values are recoded with four significant digits.  There are no missing attribute values.  The class distribution is 357 benign and 212 malignant.
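
Because the classes are imbalanced, it helps to know what accuracy a trivial classifier would reach. The sketch below uses only the class counts quoted above; the "always predict benign" baseline is an illustration added here, not part of the original analysis.

```python
# Class counts quoted in the dataset description.
n_benign, n_malignant = 357, 212
n_total = n_benign + n_malignant

# A classifier that always predicts the majority class (benign)
# is correct for every benign sample and wrong for every malignant one.
baseline_accuracy = max(n_benign, n_malignant) / n_total

print(f"{n_total} samples, majority-class baseline: {baseline_accuracy:.1%}")
# 569 samples, majority-class baseline: 62.7%
```

Any model worth keeping should do clearly better than this baseline.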

## The Problem

The objective is to classify whether a breast tumor is benign or malignant.  Let us start by importing the dataset:

In [None]:
# Importing libraries
import pandas as pd
import numpy as np
import ssl

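# Workaround for certificate errors when downloading the CSV on some systems.
# Note: this disables HTTPS certificate verification for this session.
ssl._create_default_https_context = ssl._create_unverified_context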

In [None]:
data = pd.read_csv('https://raw.githubusercontent.com/harmanani/AAI612/main/Week3/breast_cancer/data.csv')
data.head(10)

In [None]:
# Delete the last column!
del data['Unnamed: 32']

## The Solution: Deep Learning

Read features and label

In [None]:
#Skip the first two columns: The ID and the diagnosis
X = data.iloc[:, 2:].values

# Now, read the diagnosis
y = data.iloc[:, 1].values

Encoding categorical data

In [None]:
from sklearn.preprocessing import LabelEncoder

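# Encode the diagnosis labels as integers.
# LabelEncoder sorts the classes, so 'B' (benign) -> 0 and 'M' (malignant) -> 1.
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)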
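
To see what the encoder does, here is a minimal sketch on a handful of made-up labels (the `toy_labels` array is hypothetical and not taken from the dataset). It shows the sorted class order and how to map the integers back to the original letters.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels, for illustration only.
toy_labels = np.array(["M", "B", "B", "M", "B"])

encoder = LabelEncoder()
encoded = encoder.fit_transform(toy_labels)

print(encoder.classes_)  # ['B' 'M']  (sorted, so B -> 0 and M -> 1)
print(encoded)           # [1 0 0 1 0]

# inverse_transform recovers the original string labels.
decoded = encoder.inverse_transform(encoded)
print(decoded)
```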

Splitting the dataset into the Training set and Test set

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size = 0.2, random_state = 0)
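
As a rough check of what an 80/20 split produces for 569 samples, the sketch below runs the same call on placeholder data. The arrays `X_demo` and `y_demo` are random stand-ins with the same shape as the real features and labels; only the resulting shapes are meaningful.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the real dataset (assumption:
# 569 samples, 30 features, 357 benign and 212 malignant labels).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(569, 30))
y_demo = np.array([0] * 357 + [1] * 212)

X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# 20% of 569 is 113.8, which is rounded up to 114 validation samples.
print(X_tr.shape, X_va.shape)  # (455, 30) (114, 30)
```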

## Exploring the Dataset

In [None]:
data.describe()

In [None]:
data.dtypes

In [None]:
X_train.shape

In [None]:
# Show both shapes (a notebook cell only displays its last expression)
X_valid.shape, y_valid.shape

Furthermore, we can inspect the data type and the range of the feature values. Unlike image pixels, these features are real-valued measurements on very different scales, which is why we will scale them before training:

In [None]:
X_train.dtype

In [None]:
X_train.min()

In [None]:
X_train.max()

In [None]:
X_train[0]

In [None]:
import seaborn as sns
ax = sns.countplot(x=data["diagnosis"], width=0.5)

### Scaling!

Let us scale the features!

In [None]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_valid = sc.transform(X_valid)

And now notice the difference!

In [None]:
X_train[0]
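
What the scaler computes can be sketched by hand with NumPy. The small `train` and `valid` arrays below are made up for illustration; the point is that the mean and standard deviation are learned from the training data only and then reused for the validation data, which mirrors the `fit_transform` / `transform` pair above.

```python
import numpy as np

# Hypothetical values: two features on very different scales.
train = np.array([[10.0, 200.0], [12.0, 400.0], [14.0, 600.0]])
valid = np.array([[11.0, 500.0]])

# Statistics come from the training data only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

# z = (x - mean) / std, applied column by column.
train_scaled = (train - mean) / std
valid_scaled = (valid - mean) / std

print(train_scaled.mean(axis=0))  # approximately 0 for each feature
print(train_scaled.std(axis=0))   # approximately 1 for each feature
print(valid_scaled)
```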

## Creating the Model

With the data prepared for training, it is now time to create the model that we will train with the data. This first basic model will be made up of several *layers* and will be comprised of 3 main parts:

1. An input layer, which will receive data in some expected format
2. Several [hidden layers](https://developers.google.com/machine-learning/glossary#hidden-layer), each comprised of many *neurons*. Each [neuron](https://developers.google.com/machine-learning/glossary#neuron) will have the ability to affect the network's guess with its *weights*, which are values that will be updated over many iterations as the network gets feedback on its performance and learns
3. An output layer, which will output the prediction

### Instantiating the Model

To begin, we will use Keras's [Sequential](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) model class to instantiate an instance of a model that will have a series of layers that data will pass through in sequence:

In [None]:
from tensorflow.keras.models import Sequential

model = Sequential()

### Creating the Input Layer

Next, we will add the input layer. This layer will be *densely connected*, meaning that each neuron in it, and its weights, will affect every neuron in the next layer. To do this with Keras, we use Keras's [Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) layer class.

In [None]:
from tensorflow.keras.layers import Dense, Dropout

We will learn more about activation functions later, but for now, we will use the `relu` activation function, which in short, will help our network to learn how to make more sophisticated guesses about data than if it were required to make guesses based on some strictly linear function.

In [None]:
model.add(Dense(units=30, activation='relu', input_shape=(30,)))
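
For intuition, `relu` simply replaces negative values with zero and leaves positive values unchanged. The following is a minimal NumPy sketch of that function, not the Keras implementation; the input values are arbitrary.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(z))  # [0.  0.  0.  1.5 3. ]
```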

Adding dropout to prevent overfitting.  More about this one later :-)

In [None]:
model.add(Dropout(rate=0.1))

### Creating the Hidden Layer

Now we will add an additional densely connected layer. Again, much more will be said about these later, but for now know that these layers give the network more parameters to contribute towards its guesses, and therefore, more subtle opportunities for accurate learning:

In [None]:
model.add(Dense(units = 16, activation='relu'))
model.add(Dropout(rate=0.1))

### Creating the Output Layer

Finally, we will add an output layer. This layer uses the `sigmoid` activation function, which produces a single output between 0 and 1 that can be interpreted as the probability of the positive class (malignant).  Since this is a binary classification problem, one output neuron is enough: we predict malignant when this probability is above 0.5 and benign otherwise:

In [None]:
model.add(Dense(units = 1, activation='sigmoid'))
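
As a sketch of what the output layer's activation does, the snippet below applies the sigmoid formula to a few arbitrary pre-activation values and thresholds the result at 0.5. The `logits` array is hypothetical, chosen only to show the behavior.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical pre-activation values of the output neuron.
logits = np.array([-3.0, 0.0, 2.0])
probabilities = sigmoid(logits)

# Threshold at 0.5: 1 means malignant, 0 means benign.
predicted_class = (probabilities > 0.5).astype(int)

print(probabilities)
print(predicted_class)  # [0 0 1]
```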

### Summarizing the Model

Keras provides the model instance method [summary](https://www.tensorflow.org/api_docs/python/tf/keras/Model#summary) which will print a readable summary of a model:

In [None]:
model.summary()

Note the number of trainable parameters. Each of these can be adjusted during training and will contribute towards the trained model's guesses.
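
The parameter count can be worked out by hand. Each `Dense` layer has one weight per (input, unit) pair plus one bias per unit, and `Dropout` layers add no parameters. The sketch below assumes the architecture built above (30 inputs, then Dense layers of 30, 16, and 1 units).

```python
# Number of inputs, followed by the units of each Dense layer.
layer_sizes = [30, 30, 16, 1]

# weights (n_in * n_out) + biases (n_out) for each Dense layer.
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

print(params_per_layer)       # [930, 496, 17]
print(sum(params_per_layer))  # 1443
```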

### Compiling the Model

Again, more details are to follow, but the final step we need to do before we can actually train our model with data is to [compile](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#compile) it. Here we specify a [loss function](https://developers.google.com/machine-learning/glossary#loss) which will be used for the model to understand how well it is performing during training. We also specify that we would like to track `accuracy` while the model trains:

In [None]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
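
To get a feel for the `binary_crossentropy` loss, here is a minimal NumPy sketch of the formula. It is an illustration rather than the Keras implementation, and the labels and predicted probabilities are made up. The loss is small when the predicted probabilities agree with the labels and large when they confidently disagree.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    # Clip to avoid log(0) for probabilities of exactly 0 or 1.
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

# Hypothetical labels and predictions.
y_true = np.array([1.0, 0.0, 1.0, 0.0])
mostly_right = np.array([0.9, 0.1, 0.8, 0.2])
mostly_wrong = np.array([0.1, 0.9, 0.2, 0.8])

loss_right = binary_crossentropy(y_true, mostly_right)
loss_wrong = binary_crossentropy(y_true, mostly_wrong)
print(round(loss_right, 3), round(loss_wrong, 3))
```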

## Training the Model

Now that we have prepared training and validation data, and a model, it's time to train our model with our training data, and verify it with its validation data.

"Training a model with data" is often also called "fitting a model to data." Put this latter way, it highlights that the shape of the model changes over time to more accurately understand the data that it is being given.

When fitting (training) a model with Keras, we use the model's [fit](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit) method. It expects the following arguments:

* The training data
* The labels for the training data
* The number of times it should train on the entire training dataset (called an *epoch*)
* The validation or test data, and its labels

Run the cell below to train the model. We will discuss its output after the training completes:

In [None]:
history = model.fit(
    X_train, y_train, epochs=150, verbose=1, validation_data=(X_valid, y_valid)
)
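
The progress output shows a number of steps per epoch. As a rough sketch, assuming the 455 training samples from the 80/20 split and the Keras default batch size of 32 (no `batch_size` is passed to `fit` above), the arithmetic is:

```python
import math

n_train = 455    # 569 samples minus the 114 held out for validation
batch_size = 32  # Keras default when batch_size is not given to fit()
epochs = 150

# The last batch of each epoch is smaller, so round up.
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 15
print(total_updates)    # 2250
```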

In [None]:
# Predicting the Test set results
y_pred = model.predict(X_valid)
y_pred = (y_pred > 0.5)

In [None]:
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_valid, y_pred)

In [None]:
# Accuracy = correct predictions (the diagonal) / all predictions
print("Our accuracy is {:.2f}%".format(cm.trace() / cm.sum() * 100))

### Confusion Matrix

A confusion matrix summarizes the predictions in matrix form. It shows how many predictions are correct and incorrect for each class, which helps in understanding which classes the model confuses with one another.  The diagonal entries, (0, 0) and (1, 1), count the samples that were correctly predicted as benign and malignant, respectively; the off-diagonal entries count the misclassified samples.

<img src="images/3-s2.0-B9780323911979000138-f14-09-9780323911979.jpg"/>

In [None]:
import matplotlib.pyplot as plt

sns.heatmap(cm,annot=True)
plt.savefig('h.png')
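
The four cells of a binary confusion matrix can be unpacked and turned into other common metrics. The sketch below uses ten made-up labels and predictions (`y_true_demo` and `y_pred_demo` are hypothetical) so that the numbers are easy to verify by hand; precision and recall are shown as an illustration alongside accuracy.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels and predictions (0 = benign, 1 = malignant).
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred_demo = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])

# Rows are true classes, columns are predicted classes.
cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
tn, fp, fn, tp = cm_demo.ravel()

accuracy = (tp + tn) / cm_demo.sum()
precision = tp / (tp + fp)  # of the predicted malignant, how many are malignant
recall = tp / (tp + fn)     # of the truly malignant, how many were found

print(cm_demo)
print(accuracy, precision, recall)  # 0.8 0.75 0.75
```

In a medical setting, recall on the malignant class is often of particular interest, because a false negative means a missed cancer.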

In [None]:
chart_x = range(1,151)

In [None]:
chart_y_train = history.history['loss']
chart_y_test = history.history['val_loss']
print(history.history.keys())

In [None]:
import matplotlib.pyplot as plt

def plot_learning():
    plt.plot(chart_x, chart_y_train, 'r-', label='training error')
    plt.plot(chart_x, chart_y_test, 'b-', label='test error')
    plt.xlabel('training epochs')
    plt.ylabel('error')
    plt.legend()
    plt.show()

plot_learning()
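
One way to read such curves is to look for the epoch where the validation loss stops improving while the training loss keeps falling. The sketch below uses a made-up `demo_history` dictionary with the same keys as `history.history`; the numbers are invented for illustration and are not results from this notebook.

```python
# Hypothetical loss values, shaped like history.history.
demo_history = {
    "loss":     [0.60, 0.40, 0.30, 0.22, 0.15, 0.10],
    "val_loss": [0.62, 0.45, 0.35, 0.33, 0.36, 0.41],
}

val_loss = demo_history["val_loss"]

# Epochs are numbered from 1, list indices from 0.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

# A growing gap between validation and training loss hints at overfitting.
final_gap = demo_history["val_loss"][-1] - demo_history["loss"][-1]

print(best_epoch)           # 4
print(round(final_gap, 2))  # 0.31
```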

### Observing Accuracy

For each of the 150 epochs, notice the `accuracy` and `val_accuracy` scores. `accuracy` states how well the model did for the epoch on all the training data. `val_accuracy` states how well the model did on the validation data, which, if you recall, was not used at all for training the model.

The model did quite well! The accuracy quickly reached close to 95%, as did the validation accuracy. We now have a model that can be used to classify breast tumors as benign or malignant.

The next step would be to use this model to classify new, not-yet-seen samples. This is called [inference](https://blogs.nvidia.com/blog/2016/08/22/difference-deep-learning-training-inference-ai/). We'll explore the process of inference in a later exercise.