## Breast cancer classifier

In [6]:
import tensorflow as tf
from tensorflow.keras import mixed_precision
import os
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization

### End-to-End CNN Strategy (1st)

#### Overview
Use a convolutional neural network (CNN) to learn features directly from 50×50 histology image patches and classify each patch as benign (class 0) or malignant (class 1).

#### Key Steps

1. **Model Architecture**
   - **Convolutional Layers:** Automatically extract local features such as edges and textures.
   - **Pooling Layers:** Reduce spatial dimensions, making the model more robust to small translations.
   - **Fully Connected Layers:** Integrate the learned features to map them to a binary classification output.

2. **Data Augmentation**
   - **Techniques:** Apply rotations, flips, zooming, and shifts.
   - **Purpose:** Increase the effective size and variability of the dataset to reduce overfitting and improve generalization.

3. **Training with Labeled Data**
   - **Supervised Learning:** Use the provided labels with a loss function (e.g., cross-entropy) to train the network.
   - **Backpropagation:** Adjust the network weights iteratively to minimize classification errors.

4. **Optimization Techniques**
   - **Early Stopping:** Monitor validation performance and stop training once it stops improving, to avoid overfitting.
   - **Learning Rate Scheduling:** Lower the learning rate during training (for example, when validation loss plateaus) to keep convergence stable.
   - **Dropout:** Randomly deactivate neurons during training to force the network to learn robust features.

5. **Evaluation Metrics**
   - **Metrics:** Assess performance using accuracy, precision, recall, and F1-score.
   - **Clinical Relevance:** Emphasize metrics that capture the balance between false positives and false negatives.
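The metrics in step 5 all derive from the four confusion-matrix counts. The following is a minimal sketch in plain Python, assuming class 1 is the positive class; the helper name `binary_metrics` and the example labels are made up for illustration and are not results from this notebook.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Recall counts the malignant patches the model finds, so a false negative (a missed malignant patch) lowers recall, while a false positive lowers precision. Which of the two matters more is a clinical judgment rather than something the code decides.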

In [8]:
# -------------------------------
# Step 1: Ensure TensorFlow Uses GPU and Enable Mixed Precision
# -------------------------------

# Check if TensorFlow can access the GPU
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

# Enable mixed precision training for faster performance on supported GPUs
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_global_policy(policy)

cores = os.cpu_count()

# -------------------------------
# Step 2: Build DataFrame from Directory Structure
# -------------------------------
data_dir = 'data/IDC_regular_ps50_idx5'  # Update this path

filepaths = []
labels = []

# Traverse directory tree
for root, dirs, files in os.walk(data_dir):
    for file in files:
        if file.lower().endswith(('.png', '.jpg', '.jpeg')):
            file_path = os.path.join(root, file)
            # Assumes the label is the name of the immediate parent folder ("0" or "1")
            label = os.path.basename(os.path.dirname(file_path))
            filepaths.append(file_path)
            labels.append(label)

# Create a DataFrame with the file paths and labels
df = pd.DataFrame({
    'filename': filepaths,
    'class': labels
})

# Split DataFrame into training and validation sets (80/20 split)
train_df, valid_df = train_test_split(df, test_size=0.2, stratify=df['class'], random_state=42)

# -------------------------------
# Step 3: Set Up ImageDataGenerators
# -------------------------------
batch_size = 32
target_size = (50, 50)
color_mode = 'rgb'  # Change to 'grayscale' if your images are grayscale

# Data augmentation for training
train_datagen = ImageDataGenerator(
    rescale=1.0/255,
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest'
)

# Only rescaling for validation
valid_datagen = ImageDataGenerator(rescale=1.0/255)

# Note: workers, use_multiprocessing, and prefetch are not parameters of
# flow_from_dataframe (unrecognized keyword arguments are silently ignored),
# so they are omitted here.
train_generator = train_datagen.flow_from_dataframe(
    dataframe=train_df,
    x_col='filename',
    y_col='class',
    target_size=target_size,
    batch_size=batch_size,
    class_mode='categorical',  # Uses one-hot encoding for labels
    color_mode=color_mode,
    shuffle=True,
    seed=42
)

validation_generator = valid_datagen.flow_from_dataframe(
    dataframe=valid_df,
    x_col='filename',
    y_col='class',
    target_size=target_size,
    batch_size=batch_size,
    class_mode='categorical',
    color_mode=color_mode,
    shuffle=False  # Keep file order fixed; workers/use_multiprocessing/prefetch are not flow_from_dataframe parameters
)

# -------------------------------
# Step 4: Define the CNN Model
# -------------------------------
input_shape = (50, 50, 3)

model = Sequential([
    Input(shape=input_shape),
    Conv2D(32, (3, 3), activation='relu'),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),

    Conv2D(64, (3, 3), activation='relu'),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),

    Conv2D(128, (3, 3), activation='relu'),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),

    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(2, activation='softmax', dtype='float32')  # 2 classes: "0" and "1"; float32 output keeps softmax numerically stable under mixed precision
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])


Num GPUs Available:  1
Found 106868 validated image filenames belonging to 2 classes.
Found 26717 validated image filenames belonging to 2 classes.


In [9]:
model.summary()
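The output of `model.summary()` is not reproduced here. As a rough cross-check, the layer shapes and parameter count can be worked out by hand. The sketch below assumes the Keras defaults used in the model definition: `Conv2D` with `'valid'` padding and stride 1, `MaxPooling2D` with stride equal to the pool size, and four parameters per channel for `BatchNormalization` (two trainable, two non-trainable). The helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool


size, channels = 50, 3  # 50x50 RGB input patch
total_params = 0

for filters in (32, 64, 128):
    size = conv_out(size)
    total_params += 3 * 3 * channels * filters + filters  # conv kernels + biases
    total_params += 4 * filters  # BatchNormalization: gamma, beta, moving mean, moving variance
    size = pool_out(size)
    channels = filters
    print(f"after block with {filters} filters: {size}x{size}x{channels}")

flat = size * size * channels
total_params += flat * 128 + 128  # Dense(128): weights + biases
total_params += 128 * 2 + 2       # Dense(2): weights + biases

print("flattened features:", flat)
print("total parameters:", total_params)
```

Under these assumptions the spatial size shrinks 50 → 24 → 11 → 4, so `Flatten` yields 4 × 4 × 128 = 2048 features, and the first dense layer accounts for most of the parameters.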

In [None]:
# -------------------------------
# Step 5: Train the Model
# -------------------------------
epochs = 15

# Train the model using GPU
history = model.fit(
    train_generator,
    epochs=epochs,
    validation_data=validation_generator
)
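The Key Steps list mentions early stopping and learning-rate scheduling, but the `fit` call above passes no callbacks. In Keras these behaviors are available as `tf.keras.callbacks.EarlyStopping` and `tf.keras.callbacks.ReduceLROnPlateau`, supplied through the `callbacks` argument of `model.fit`. The decision rule behind them can be sketched in plain Python. This is a simplified illustration, not the Keras implementation: the function name `simulate_schedule`, the patience values, and the validation losses are all made up.

```python
def simulate_schedule(val_losses, stop_patience=3, lr_patience=2,
                      lr=1e-3, factor=0.5):
    """Replay validation losses and report when training would stop.

    Returns (stop_epoch, best_epoch, final_lr); epochs are 0-indexed.
    """
    best_loss = float("inf")
    best_epoch = -1
    stop_wait = 0  # epochs since the last improvement (early stopping)
    lr_wait = 0    # epochs since the last improvement or LR reduction

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            stop_wait = 0
            lr_wait = 0
            continue

        stop_wait += 1
        lr_wait += 1
        if lr_wait >= lr_patience:
            lr *= factor  # plateau detected: reduce the learning rate
            lr_wait = 0
        if stop_wait >= stop_patience:
            return epoch, best_epoch, lr  # stop early

    return len(val_losses) - 1, best_epoch, lr


# Hypothetical validation losses for illustration only.
val_losses = [0.60, 0.50, 0.45, 0.47, 0.46, 0.48, 0.44]
stop_epoch, best_epoch, final_lr = simulate_schedule(val_losses)
print(stop_epoch, best_epoch, final_lr)
```

In this made-up run the loss last improves at epoch 2, the learning rate is halved after two epochs without improvement, and training stops at epoch 5, before the final value in the list is ever seen. That is the trade-off of a small patience: it saves compute but can cut off a late improvement.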