# ANN for Cancer Diagnosis (MD Anderson Context) - TensorFlow

**Objective:** Build an Artificial Neural Network (ANN) using TensorFlow to diagnose cancer (malignant vs benign) using a publicly available dataset (Kaggle), assumed to be provided to MD Anderson Cancer Institute for this exercise.

**Why ANN:** ANNs can learn complex, non-linear patterns in clinical/radiological features and improve performance as they learn from more data over time.


In [1]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    classification_report, confusion_matrix
)

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

## Step 1 - Data Collection

**Dataset:** Kaggle “Breast Cancer Wisconsin (Diagnostic)” dataset downloaded as `data.csv`.

**Target variable:** `diagnosis`  
- `M` = Malignant (cancer)  
- `B` = Benign (non-cancer)

This notebook treats the dataset as radiology/imaging-derived diagnostic features shared for modeling, consistent with the assignment scenario.


In [2]:
df = pd.read_csv("data.csv")

print("Shape:", df.shape)
print(df.head())
print("\nColumns:\n", df.columns.tolist())
print("\nDiagnosis counts:\n", df["diagnosis"].value_counts(dropna=False))

Shape: (569, 33)
         id diagnosis  radius_mean  texture_mean  perimeter_mean  area_mean  \
0    842302         M        17.99         10.38          122.80     1001.0   
1    842517         M        20.57         17.77          132.90     1326.0   
2  84300903         M        19.69         21.25          130.00     1203.0   
3  84348301         M        11.42         20.38           77.58      386.1   
4  84358402         M        20.29         14.34          135.10     1297.0   

   smoothness_mean  compactness_mean  concavity_mean  concave points_mean  \
0          0.11840           0.27760          0.3001              0.14710   
1          0.08474           0.07864          0.0869              0.07017   
2          0.10960           0.15990          0.1974              0.12790   
3          0.14250           0.28390          0.2414              0.10520   
4          0.10030           0.13280          0.1980              0.10430   

   ...  texture_worst  perimeter_worst  area_

## Step 2 - Preprocessing (Cleaning + Scaling)

**What I did and why it matters:**
- Dropped non-informative identifier columns (e.g., `id`) and empty columns (if present) so the model focuses on meaningful features.
- Encoded the target (`diagnosis`) into numeric values for supervised learning.
- Used a stratified train/test split to preserve the malignant/benign ratio.
- Standardized features (fit on training set only) to prevent data leakage and improve ANN training stability.


In [3]:
# Drop common non-feature columns (safe even if they don't exist)
df = df.drop(columns=[c for c in ["id", "Unnamed: 32"] if c in df.columns], errors="ignore")

# Encode diagnosis
df["diagnosis"] = df["diagnosis"].map({"M": 1, "B": 0})

# Check and handle missing values
print("Total missing values:", df.isna().sum().sum())
df = df.dropna()

print("Shape after cleaning:", df.shape)
print(df["diagnosis"].value_counts())

Total missing values: 0
Shape after cleaning: (569, 31)
diagnosis
0    357
1    212
Name: count, dtype: int64


In [4]:
X = df.drop(columns=["diagnosis"])
y = df["diagnosis"].astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.20,
    random_state=42,
    stratify=y
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Train shape:", X_train_scaled.shape)
print("Test shape:", X_test_scaled.shape)

Train shape: (455, 30)
Test shape: (114, 30)


## Step 3 - Model Building (ANN Design)

**Design rationale:**
- Two hidden layers (ReLU) to learn non-linear relationships in diagnostic features.
- Dropout regularization to reduce overfitting.
- Sigmoid output to produce a malignancy probability (binary classification).

**Training setup choices:**
- Loss: Binary crossentropy (appropriate for binary labels with sigmoid)
- Optimizer: Adam (stable default)
- Metric: Accuracy (with additional medical-focused metrics computed later)


In [5]:
tf.random.set_seed(42)

model = Sequential([
    Dense(32, activation="relu", input_shape=(X_train_scaled.shape[1],)),
    Dropout(0.30),
    Dense(16, activation="relu"),
    Dropout(0.20),
    Dense(1, activation="sigmoid")
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"]
)

model.summary()

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


## Step 4 - Training

**Monitoring and reliability:**
- Used EarlyStopping on validation loss to prevent over-training and overfitting.
- Restored best weights so the final model reflects the best validation performance.


In [6]:
early_stop = EarlyStopping(
    monitor="val_loss",
    patience=8,
    restore_best_weights=True
)

history = model.fit(
    X_train_scaled, y_train,
    validation_split=0.20,
    epochs=60,
    batch_size=16,
    callbacks=[early_stop],
    verbose=1
)

Epoch 1/60
[1m23/23[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 24ms/step - accuracy: 0.6486 - loss: 0.6219 - val_accuracy: 0.8462 - val_loss: 0.4437
Epoch 2/60
[1m23/23[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 10ms/step - accuracy: 0.7860 - loss: 0.4519 - val_accuracy: 0.9121 - val_loss: 0.3332
Epoch 3/60
[1m23/23[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - accuracy: 0.8763 - loss: 0.3657 - val_accuracy: 0.9341 - val_loss: 0.2629
Epoch 4/60
[1m23/23[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - accuracy: 0.9044 - loss: 0.2777 - val_accuracy: 0.9451 - val_loss: 0.2159
Epoch 5/60
[1m23/23[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 11ms/step - accuracy: 0.9215 - loss: 0.2331 - val_accuracy: 0.9451 - val_loss: 0.1854
Epoch 6/60
[1m23/23[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - accuracy: 0.9486 - loss: 0.1919 - val_accuracy: 0.9451 - val_loss: 0.1657
Epoch 7/60
[1m23/23[0m [32m━━━━━━━

## Step 5 - Evaluation

In cancer diagnosis, accuracy alone is not enough.  
**Recall (sensitivity)** is especially important because a false negative means a malignant case was missed.

Metrics reported:
- Accuracy
- Precision
- Recall
- F1-score
- Confusion matrix


In [7]:
y_prob = model.predict(X_test_scaled).ravel()
y_pred = (y_prob >= 0.5).astype(int)

acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred)
rec = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("Accuracy :", round(acc, 4))
print("Precision:", round(prec, 4))
print("Recall   :", round(rec, 4))
print("F1-score :", round(f1, 4))

print("\nClassification Report:\n")
print(classification_report(y_test, y_pred))

print("\nConfusion Matrix:\n")
print(confusion_matrix(y_test, y_pred))

[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step
Accuracy : 0.9825
Precision: 1.0
Recall   : 0.9524
F1-score : 0.9756

Classification Report:

              precision    recall  f1-score   support

           0       0.97      1.00      0.99        72
           1       1.00      0.95      0.98        42

    accuracy                           0.98       114
   macro avg       0.99      0.98      0.98       114
weighted avg       0.98      0.98      0.98       114


Confusion Matrix:

[[72  0]
 [ 2 40]]


## Step 6 - Improvement (Threshold Tuning)

**Why threshold tuning:**
In clinical screening/triage settings, missing malignant cases can be more costly than flagging benign cases for follow-up.

Instead of using the default 0.50 cutoff, I tested multiple thresholds to see how recall and precision change.


In [8]:
thresholds = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55]
rows = []

for t in thresholds:
    preds = (y_prob >= t).astype(int)
    rows.append({
        "threshold": t,
        "accuracy": accuracy_score(y_test, preds),
        "precision": precision_score(y_test, preds),
        "recall": recall_score(y_test, preds),
        "f1": f1_score(y_test, preds)
    })

results_df = pd.DataFrame(rows).sort_values(by=["recall", "f1"], ascending=False)
results_df

Unnamed: 0,threshold,accuracy,precision,recall,f1
0,0.3,0.991228,1.0,0.97619,0.987952
1,0.35,0.991228,1.0,0.97619,0.987952
2,0.4,0.991228,1.0,0.97619,0.987952
3,0.45,0.991228,1.0,0.97619,0.987952
4,0.5,0.982456,1.0,0.952381,0.97561
5,0.55,0.973684,1.0,0.928571,0.962963


## Step 7 - Documentation Summary (Challenges, Fixes, Insights)

**Challenges faced:**
- Risk of overfitting due to learning noise in tabular clinical features.
- Need to balance false negatives (missed malignancy) vs false positives (unnecessary follow-up).

**How I addressed them:**
- Added dropout and early stopping to improve generalization.
- Used threshold tuning to explore safer clinical tradeoffs that increase recall.

**Practical application at MD Anderson:**
This model is best positioned as a **decision-support tool** that helps radiologists prioritize cases and identify patterns, not as a standalone diagnostic replacement.


In [9]:
assert set(np.unique(y_pred)).issubset({0, 1})
assert len(y_pred) == len(y_test)
assert not np.isnan(acc)

"All unit checks passed."


'All unit checks passed.'