### CAMELYON17 Dataset Evaluation
This notebook evaluates models on the CAMELYON17 dataset, which contains image patches taken from the whole-slide images (WSIs) of patients treated at several hospitals. Because the patches come from different hospitals, they can be used to compare the domain generalisation of the baseline CNN model with that of CNN models trained on augmented data. First, the required libraries are imported.

In [4]:
import numpy as np

from main_util import Model_architecture

from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras import backend
from sklearn.metrics import roc_curve, auc, RocCurveDisplay

metadata_path = "../CAMELYON17 dataset/metadata.csv"
patch_folder_path = "../CAMELYON17 dataset/patches"

The `metadata.csv` file describes every patch. Its columns are: `index`, `patient number`, `node number`, `x-coordinate` and `y-coordinate` (both w.r.t. the full WSI), `label`, `slide number`, `hospital` and `split number`. The rows are divided into a training and a testing set based on the `split number`, where 0 denotes the training set and 1 the testing set. The patient number, node number and coordinates of a row are later used to build the path of the folder and file in which the corresponding patch is saved.
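As a minimal sketch of how a metadata row maps to a patch file, the helper below assumes the column order listed above and the folder naming used in the `validation` function further down (patient number zero-padded to three digits). The function name `patch_path_from_row` and the example row are hypothetical and only illustrate the mapping.

```python
import numpy as np

def patch_path_from_row(row, patch_folder_path):
    """Build the file path of one patch from a metadata row.

    Assumes the column order: index, patient, node, x, y, label, slide,
    hospital, split, and the layout
    patient_XXX_node_Y/patch_patient_XXX_node_Y_x_<x>_y_<y>.png
    """
    patient_nr, node_nr, x_coord, y_coord = (int(v) for v in row[1:5])
    patient_id = f"patient_{patient_nr:03d}_node_{node_nr}"  # zero-pad to 3 digits
    return f"{patch_folder_path}/{patient_id}/patch_{patient_id}_x_{x_coord}_y_{y_coord}.png"

# Hypothetical row: patient 4, node 4, x=1000, y=2000, label 1, slide 0, hospital 0, split 0
row = np.array([0, 4, 4, 1000, 2000, 1, 0, 0, 0])
print(patch_path_from_row(row, "../CAMELYON17 dataset/patches"))
```

Formatting the patient number with `:03d` covers both branches of a "less than 10 / otherwise" padding check in a single expression.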

In [5]:
# Load the metadata as integers, skipping the header row
metadata = np.genfromtxt(metadata_path, dtype=int, delimiter=",", skip_header=1)

# Column 8 holds the split number: 0 = training set, 1 = testing set.
# Boolean masks keep the original row order and do not rely on the
# file starting with a block of training rows.
training_metadata = metadata[metadata[:, 8] == 0]
testing_metadata = metadata[metadata[:, 8] == 1]

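The function `validation` runs a trained model on every patch listed in the given metadata and returns the true labels together with the predicted scores. Different CNN models, or the patches of a single hospital, can be evaluated by passing a different model file or a subset of the testing metadata.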
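To evaluate a single hospital, one option is to filter the metadata on the `hospital` column (index 7 in the column order above) before calling `validation`. The sketch below assumes that column position; `select_hospital`, `HOSPITAL_COLUMN` and the toy array are hypothetical and only illustrate the masking.

```python
import numpy as np

HOSPITAL_COLUMN = 7  # assumed position of `hospital` in metadata.csv

def select_hospital(metadata, hospital):
    """Return only the metadata rows that belong to one hospital."""
    return metadata[metadata[:, HOSPITAL_COLUMN] == hospital]

# Toy metadata with two hospitals (made-up values, same 9-column layout)
toy = np.array([
    [0, 1, 0, 10, 20, 0, 0, 0, 1],
    [1, 2, 0, 30, 40, 1, 1, 1, 1],
    [2, 3, 1, 50, 60, 1, 2, 1, 1],
])

# Count the patches of each hospital present in the array
for hospital in np.unique(toy[:, HOSPITAL_COLUMN]):
    rows = select_hospital(toy, hospital)
    print(f"hospital {hospital}: {len(rows)} patches")
```

For example, `validation(select_hospital(testing_metadata, h), model_filepath)` would return the labels and scores for hospital `h` only.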

In [6]:
def validation(testing_metadata, model_filepath):
    """
    Predict every patch in the testing set with a trained CNN model.

    Arguments:
    testing_metadata:   an array containing the metadata of the testing set with the following columns:
                        index, patient nr, node nr, x-coordinate, y-coordinate, label, slide nr, hospital, split nr
    model_filepath:     filepath of the CNN model used to predict the patches.

    Returns:
    true_labels:        list with the ground-truth label of each patch.
    pred_labels:        list with the predicted score (model output) of each patch.
    """
    model = load_model(model_filepath)

    true_labels = []
    pred_labels = []
    for i in range(len(testing_metadata)):
        patient_nr = testing_metadata[i,1]
        node_nr = testing_metadata[i,2]
        x_coord = testing_metadata[i,3]
        y_coord = testing_metadata[i,4]

        # Patient numbers are zero-padded to three digits in folder and file names
        patient_id = f"patient_{patient_nr:03d}_node_{node_nr}"
        patch_path = f"{patch_folder_path}/{patient_id}/patch_{patient_id}_x_{x_coord}_y_{y_coord}.png"

        # Load the patch and add a batch dimension
        img = img_to_array(load_img(patch_path, target_size=(96,96)))
        img = np.array([img])

        # Scale pixels to [0, 1] and keep the single output score
        pred_label = model.predict(img/255, verbose=0)
        pred_labels.append(pred_label[0][0])

        true_label = testing_metadata[i,5]
        true_labels.append(true_label)

        # Reset the Keras global state to limit memory growth over many predict calls
        backend.clear_session()
        if i%1000 == 0:
            print(f"Progress: {i}/{len(testing_metadata)}")

    return true_labels, pred_labels
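`validation` calls `model.predict` once per patch, which adds per-call overhead across tens of thousands of patches. A possible alternative, sketched below under the assumption that the model accepts a batch of shape `(n, 96, 96, 3)` and returns one score per patch, is to load and predict the patches in batches. `predict_in_batches`, `load_fn` and `predict_fn` are hypothetical names; the loader and predictor are passed in so the helper itself only depends on NumPy.

```python
import numpy as np

def predict_in_batches(paths, load_fn, predict_fn, batch_size=256):
    """Load and score patches one batch at a time.

    load_fn:    maps a patch path to an (H, W, 3) array of raw pixel values.
    predict_fn: maps an (n, H, W, 3) array to n scores, e.g. shape (n, 1).
    """
    scores = []
    for start in range(0, len(paths), batch_size):
        # Stack one batch of patches and scale pixels to [0, 1]
        batch = np.stack([load_fn(p) for p in paths[start:start + batch_size]]) / 255.0
        # Flatten (n, 1) model output to a 1-D array of scores
        scores.append(np.asarray(predict_fn(batch)).reshape(-1))
    return np.concatenate(scores) if scores else np.empty(0)
```

With the objects used in this notebook, `load_fn` could be `lambda p: img_to_array(load_img(p, target_size=(96, 96)))` and `predict_fn` could be `lambda batch: model.predict(batch, verbose=0)`. Loading one batch at a time keeps memory bounded; stacking all 45595 testing patches at once as float32 would need roughly 5 GB.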

In [8]:
# Defining model name and paths
model_name = "cnn_baseline"
model_filepath = f"trained_models/{model_name}.tf"

# Calling validation function
true_labels, pred_labels = validation(testing_metadata, model_filepath)

Progress: 0/45595
Progress: 1000/45595


In [None]:
# Defining model name and paths
model_name = "cnn_augmented_25"
model_filepath = f"trained_models/{model_name}.tf"

# Calling validation function
true_labels_aug_25, pred_labels_aug_25 = validation(testing_metadata, model_filepath)

In [None]:
# Defining model name and paths
model_name = "cnn_augmented_50"
model_filepath = f"trained_models/{model_name}.tf"

# Calling validation function
true_labels_aug_50, pred_labels_aug_50 = validation(testing_metadata, model_filepath)

In [None]:
# Defining model name and paths
model_name = "cnn_augmented_75"
model_filepath = f"trained_models/{model_name}.tf"

# Calling validation function
true_labels_aug_75, pred_labels_aug_75 = validation(testing_metadata, model_filepath)

In [None]:
# Defining model name and paths
model_name = "cnn_augmented_1"
model_filepath = f"trained_models/{model_name}.tf"

# Calling validation function
true_labels_aug_1, pred_labels_aug_1 = validation(testing_metadata, model_filepath)

In [None]:
# Calculating false positive rate (fpr), true positive rate (tpr) and AUC of the baseline model
fpr, tpr, thresholds = roc_curve(true_labels, pred_labels)
roc_auc = auc(fpr, tpr)

# Generate ROC curve
roc = RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc)
roc.plot();
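The cell above only evaluates the baseline model. To compare all trained models, a small helper can compute the AUC for each (true labels, scores) pair. This sketch uses `sklearn.metrics.roc_auc_score`, which for binary labels gives the same value as the `roc_curve` + `auc` combination above; `auc_table` and the toy scores are hypothetical.

```python
from sklearn.metrics import roc_auc_score

def auc_table(results):
    """Map each model name to its ROC AUC.

    results: dict mapping a model name to a (true_labels, pred_labels) pair,
             such as the lists returned by validation().
    """
    return {name: roc_auc_score(y_true, y_score)
            for name, (y_true, y_score) in results.items()}

# Made-up labels and scores, only to show the call
toy_results = {
    "model_a": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    "model_b": ([0, 0, 1, 1], [0.1, 0.2, 0.7, 0.9]),
}
for name, value in auc_table(toy_results).items():
    print(f"{name}: AUC = {value:.3f}")
```

In this notebook the input would be, for example, `{"cnn_baseline": (true_labels, pred_labels), "cnn_augmented_25": (true_labels_aug_25, pred_labels_aug_25)}`, extended with the remaining augmented models.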