# Histopathologic Cancer Detection using ResNet50
This notebook implements a deep learning pipeline to identify metastatic cancer in small image patches taken from larger digital pathology scans. We use **Transfer Learning** with a ResNet50 backbone, followed by fine-tuning.

### 1. Import Libraries
We use TensorFlow/Keras for modeling and Scikit-Learn for data splitting and evaluation metrics.

In [None]:
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.metrics import precision_score, recall_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Dropout, BatchNormalization
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator

### 2. Data Loading and Preprocessing
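We load the CSV labels and build a file path for each image by joining the image directory with the image `id` and a `.tif` extension. The data is split 80/20 into training and validation sets, stratified by label so that both sets keep the same class proportions as the full dataset.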

In [None]:
labels_df = pd.read_csv('train_labels.csv')
image_dir = 'train/'

# Create image path column
labels_df['image_path'] = labels_df['id'].apply(lambda x: os.path.join(image_dir, f"{x}.tif"))

# Train-Validation Split (80/20 Stratified)
train_df, val_df = train_test_split(labels_df, test_size=0.2, random_state=42, stratify=labels_df['label'])
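
As a quick sanity check on what `stratify` does, the sketch below runs the same kind of `train_test_split` call on synthetic labels (a hypothetical 60/40 class mix, not the real dataset) and prints the positive fraction in each split. All `demo_*` names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the label column: 600 negatives, 400 positives (assumed mix)
demo_labels = np.array([0] * 600 + [1] * 400)
demo_ids = np.arange(len(demo_labels))

demo_train_ids, demo_val_ids, demo_train_y, demo_val_y = train_test_split(
    demo_ids, demo_labels, test_size=0.2, random_state=42, stratify=demo_labels
)

# Fraction of positives in each split; stratification keeps these aligned
print("full :", demo_labels.mean())
print("train:", demo_train_y.mean())
print("val  :", demo_val_y.mean())
```

Without `stratify`, the validation fraction would vary with the random seed, which makes validation metrics harder to compare across runs.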

### 3. Data Augmentation
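To reduce overfitting and improve generalization, we apply random rotations, shifts, shear, zoom, brightness changes, and horizontal flips to the training images only; the validation images are not augmented. Both generators apply the ResNet50-specific `preprocess_input` function, which converts RGB to BGR and subtracts the ImageNet channel means.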

In [None]:
from tensorflow.keras.applications.resnet50 import preprocess_input

train_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    rotation_range=30,
    width_shift_range=0.3,
    height_shift_range=0.3,
    shear_range=0.3,
    zoom_range=0.3,
    horizontal_flip=True,
    brightness_range=[0.8, 1.2],
    fill_mode='nearest'
)

val_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

train_generator = train_datagen.flow_from_dataframe(
    dataframe=train_df,
    directory=None, 
    x_col="image_path",
    y_col="label",
    target_size=(96, 96),
    batch_size=32,
    class_mode='raw'
)

val_generator = val_datagen.flow_from_dataframe(
    dataframe=val_df,
    directory=None,
    x_col="image_path",
    y_col="label",
    target_size=(96, 96),
    batch_size=32,
    class_mode='raw'
)
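
For intuition about what `preprocess_input` does to each patch, here is a NumPy re-implementation of the "caffe"-style preprocessing that Keras uses for ResNet50: channels are reordered from RGB to BGR and the ImageNet per-channel means are subtracted, with no rescaling. The helper name `caffe_preprocess` is hypothetical, and this is an illustrative sketch rather than the library code.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by Keras 'caffe' mode
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_preprocess(batch_rgb):
    """Mimic ResNet50 preprocessing on an (N, H, W, 3) RGB batch in [0, 255]."""
    batch = batch_rgb.astype(np.float32)
    batch_bgr = batch[..., ::-1]           # RGB -> BGR
    return batch_bgr - IMAGENET_BGR_MEANS  # zero-center each channel

# A single 96x96 pure-red patch as a toy input
red_patch = np.zeros((1, 96, 96, 3), dtype=np.uint8)
red_patch[..., 0] = 255
out = caffe_preprocess(red_patch)
print(out.shape, out[0, 0, 0])
```

Because the outputs are centered around zero rather than scaled to [0, 1], adding a separate `rescale=1./255` to the generators would likely mismatch what the pre-trained weights expect.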

### 4. Model Architecture (Transfer Learning)
We load ResNet50 pre-trained on ImageNet. Initially, we freeze all ResNet layers and only train the custom head we added on top.

In [None]:
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(96, 96, 3))
base_model.trainable = False  # Freeze base layers initially

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(1, activation='sigmoid')(x)

model = Model(inputs=base_model.input, outputs=predictions)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

print("Starting Initial Training...")
history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=10,
    verbose=1
)
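
The size of the trainable head follows directly from the layer shapes. The arithmetic below assumes the standard ResNet50 configuration (overall stride of 32 and 2048 output channels), so a 96x96 input yields a 3x3x2048 feature map before pooling.

```python
# Shapes and parameter counts for the custom head (plain arithmetic, no TensorFlow)
input_size = 96
total_stride = 32          # ResNet50 downsamples by 32 overall
backbone_channels = 2048   # channels in the final ResNet50 feature map

feature_map = (input_size // total_stride, input_size // total_stride, backbone_channels)
pooled_dim = backbone_channels  # GlobalAveragePooling2D averages over the 3x3 grid

dense_512_params = pooled_dim * 512 + 512       # weights + biases
output_params = 512 * 1 + 1                     # single sigmoid unit
head_params = dense_512_params + output_params  # Dropout has no parameters

print("feature map:", feature_map)
print("head parameters:", head_params)
```

While the backbone is frozen, these roughly one million weights are the only ones updated during the initial training phase.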

### 5. Fine-Tuning
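Now we unfreeze the last layers of the ResNet50 backbone (index 140 onward, which roughly corresponds to Stage 5) so the model can learn features specific to histopathologic images. The head is also rebuilt on top of the trained 512-unit dense layer, with a new dropout, batch-normalization, and output layer. We use a low learning rate (1e-5) to avoid destroying the pre-trained weights.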

In [None]:
print("\nUnfreezing top layers for Fine-Tuning...")

# Reuse the trained Dense(512) layer (model.layers[-3]) and attach a new head;
# the original Dropout and sigmoid output layers are discarded
x = model.layers[-3].output
x = Dropout(0.4)(x)
x = BatchNormalization()(x)
new_predictions = Dense(1, activation='sigmoid')(x)

# Keep the first 140 backbone layers frozen and unfreeze the remaining ones
for layer in base_model.layers[:140]:
    layer.trainable = False
for layer in base_model.layers[140:]:
    layer.trainable = True

new_model = Model(inputs=model.input, outputs=new_predictions)
new_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5), 
                  loss='binary_crossentropy', 
                  metrics=['accuracy'])

# Callbacks: stop early on stalled val_loss; halve the learning rate on stalled val_accuracy
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
reduce_lr = ReduceLROnPlateau(monitor='val_accuracy', factor=0.5, patience=3, min_lr=1e-7)

history_finetune = new_model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=20,
    callbacks=[early_stopping, reduce_lr],
    verbose=1
)
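
To see how far `ReduceLROnPlateau` can lower the learning rate with these settings, the sketch below applies the halving rule repeatedly from 1e-5 down to the `min_lr` floor. It models only the update rule `new_lr = max(lr * factor, min_lr)`, not the plateau detection itself.

```python
# Successive learning rates if every plateau triggers a reduction
initial_lr = 1e-5
factor = 0.5
min_lr = 1e-7

lr_schedule = [initial_lr]
while lr_schedule[-1] > min_lr:
    lr_schedule.append(max(lr_schedule[-1] * factor, min_lr))

num_reductions = len(lr_schedule) - 1
for step, lr in enumerate(lr_schedule):
    print(f"after {step} reductions: {lr:.3e}")
```

With `patience=3` and at most 20 epochs, only about six reductions can fit into the run, so the floor is unlikely to be reached in practice, and early stopping may end training sooner still.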

### 6. Final Evaluation
We plot the fine-tuning history and compute precision and recall on the validation set at a 0.5 decision threshold.

In [None]:
# Accuracy/Loss Plots
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history_finetune.history['accuracy'], label='Train Accuracy')
plt.plot(history_finetune.history['val_accuracy'], label='Val Accuracy')
plt.title('Model Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history_finetune.history['loss'], label='Train Loss')
plt.plot(history_finetune.history['val_loss'], label='Val Loss')
plt.title('Model Loss')
plt.legend()
plt.show()

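# Precision/Recall
# val_generator shuffles by default, so its predictions would not line up with
# val_df['label']. Predict with a non-shuffling generator instead.
eval_generator = val_datagen.flow_from_dataframe(
    dataframe=val_df, directory=None, x_col="image_path", y_col="label",
    target_size=(96, 96), batch_size=32, class_mode='raw', shuffle=False
)
y_prob = new_model.predict(eval_generator).ravel()  # shape (N, 1) -> (N,)
y_pred = (y_prob > 0.5).astype(int)
y_true = val_df['label'].values

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)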

print(f'\nFinal Validation Precision: {precision:.4f}')
print(f'Final Validation Recall: {recall:.4f}')
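
Precision and recall depend on the 0.5 cut-off applied to the sigmoid outputs. The toy example below (made-up labels and scores, not model output) computes both metrics from true/false positive counts at a few thresholds to show the trade-off; `precision_recall_at` is a hypothetical helper.

```python
import numpy as np

def precision_recall_at(y_true, y_prob, threshold):
    """Return (precision, recall) for binary labels at a given probability threshold."""
    y_pred = (y_prob > threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up labels and predicted probabilities for eight patches
toy_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
toy_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.6, 0.9, 0.2])

for t in (0.3, 0.5, 0.65):
    p, r = precision_recall_at(toy_true, toy_prob, t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

Lowering the threshold raises recall at the cost of precision, which may be the preferable trade-off when a missed metastasis is costlier than a false alarm.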