# Melanoma Classification

Kaggle Competition Page: https://www.kaggle.com/c/siim-isic-melanoma-classification/overview


## What is Melanoma?
Melanoma, the most severe type of skin cancer, develops in the cells (melanocytes) that produce melanin — the pigment that gives your skin its color. Melanoma can also form in your eyes and, rarely, inside your body, such as in your nose or throat.

The exact cause of all melanomas isn't clear, but exposure to ultraviolet (UV) radiation from sunlight or tanning lamps and beds increases your risk of developing melanoma.

The risk of melanoma seems to be increasing in people under 40, especially women. Knowing the warning signs of skin cancer can help ensure that cancerous changes are detected and treated before the cancer has spread. We can treat melanoma successfully if it is detected early.

<img src="https://github.com/SaschaMet/melanoma-classification/blob/master/images/melanoma.jpg?raw=1" alt="Drawing" style="width: 600px;"/>

## Symptoms & Diagnosis
Melanomas can develop anywhere on your body. They most often develop in areas with exposure to the sun, such as your back, legs, arms, and face.
Melanomas can also occur in areas that don't receive much sun exposure, such as the soles of your feet, palms of your hands, and fingernail beds. These hidden melanomas are more common in people with darker skin.

To help you identify characteristics of melanomas or other skin cancers, think of the letters ABCDE:
- A is for asymmetrical shape. Look for moles with irregular shapes, such as two very different-looking halves.
- B is for irregular border. Look for moles with rough, notched, or scalloped edges — characteristics of melanomas.
- C is for color changes. Look for growths that have many colors or an uneven distribution of color.
- D is for diameter. Look for new growth in a mole larger than 1/4 inch (about 6 millimeters).
- E is for evolving. Look for changes over time, such as a mole that grows in size or changes color or shape.


![ABCDE Melanoma](https://github.com/SaschaMet/melanoma-classification/blob/master/images/abcde-melanoma.jpg?raw=1)

Source: https://www.health.harvard.edu/cancer/melanoma-overview

The facts about Melanoma:
- Melanoma is the most severe form of skin cancer
- It makes up 2% of skin cancers but is responsible for 75% of skin cancer deaths
- Australia and New Zealand have the highest melanoma rates in the world
- 1 in 17 Australians will be diagnosed with melanoma before the age of 85
- More than 90% of melanoma can be successfully treated with surgery if detected early

Source: https://melanomapatients.org.au/about-melanoma/melanoma-facts/

<img src="https://github.com/SaschaMet/melanoma-classification/blob/master/images/melanoma-impact.jpg?raw=1" alt="Drawing" style="width: 600px;"/>

Source: https://impactmelanoma.org/wp-content/uploads/2018/11/Standard-Infographic_0.jpg

## Setup

In [None]:
import os
import glob
import shutil

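# Clear the Kaggle working directory: remove directories recursively, files directly
files = glob.glob('/kaggle/working/*')
for f in files:
    if os.path.isdir(f) and not os.path.islink(f):
        shutil.rmtree(f)
    else:
        os.remove(f)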

In [None]:
import os
import random
from pathlib import Path
import pandas as pd
import tensorflow as tf
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
import cv2
import matplotlib.pyplot as plt
from keras.applications.vgg16 import VGG16
from keras.applications.densenet import DenseNet121
from keras.applications.nasnet import NASNetMobile, NASNetLarge
from keras.applications.inception_resnet_v2 import InceptionResNetV2
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras import layers
from datetime import datetime, date
from keras.callbacks import ModelCheckpoint, EarlyStopping
from sklearn.model_selection import train_test_split
from sklearn.utils import class_weight
from keras.optimizers import Adam
import json
from sklearn.metrics import roc_curve, auc, precision_recall_curve, confusion_matrix
from tensorflow import keras
import itertools
from tqdm import tqdm


In [None]:
SEED = 1
PIXELS_SIZE = 224
IMG_SIZE = (PIXELS_SIZE, PIXELS_SIZE)
INPUT_SHAPE = (PIXELS_SIZE, PIXELS_SIZE, 3)
OUTPUT_NEURONS = 1
VERBOSE_LEVEL = 2
SAVE_OUTPUT = True
CLASS_MODE = "binary"
# Avoid "raw"; "categorical" should behave the same, but since it is one-hot encoded, avoid it too in principle
IS_CLASS_MODE_BINARY = CLASS_MODE == "binary"
POSITIVE_CLASS = "1" if IS_CLASS_MODE_BINARY else 1
NEGATIVE_CLASS = "0" if IS_CLASS_MODE_BINARY else 0

FAST_RUN = True

DO_UNDERSAMPLING = True
PREPRO_ROTATION = True
PREPRO_BLUR = True
PREPRO_BRIGHTNESS = True
PREPRO_ZOOM = True
PREPRO_BRIGHTNESS_LOW = 0.75 if PREPRO_BRIGHTNESS else 1
PREPRO_ZOOM_LOW = 0.75 if PREPRO_ZOOM else 1

BATCH_SIZE = 64 if DO_UNDERSAMPLING else 256
EPOCHS = 40 if DO_UNDERSAMPLING else 100

BASE_MODEL_TRAINABLE = False

base_model = NASNetMobile(
    input_shape=INPUT_SHAPE,
    include_top=False,
    weights='imagenet'
)

LEARNING_RATE = 1e-4 if not BASE_MODEL_TRAINABLE else 1e-5 # Also try 1e-5 if needed
OPTIMIZER = Adam(learning_rate=LEARNING_RATE) # Keep this optimizer; it is the one that works best
LOSS = 'binary_crossentropy'
METRICS = [
    'accuracy', 
    'AUC'
] 

timestamp = str(date.today()) + "_" + str(datetime.now().strftime("%H:%M:%S"))

experiment_id = "NASNetMobile_"+ ("Under_" if DO_UNDERSAMPLING else "CW_") + "Pre"+ ("Rot" if PREPRO_ROTATION else "")+ ("Blur" if PREPRO_BLUR else "")
experiment_id = experiment_id + ("Bright" if PREPRO_BRIGHTNESS else "") + ("Zoom" if PREPRO_ZOOM else "")
experiment_id = experiment_id +"_"+ str(BATCH_SIZE) + "B_" + str(EPOCHS) + "E_" + "LR" + str(LEARNING_RATE)
experiment_id = experiment_id +"_" + ("FineTuning" if BASE_MODEL_TRAINABLE else "Extractor")
base_output = "./" +experiment_id+"/"
base_output_path = base_output + experiment_id + "-"
try:
    os.makedirs(experiment_id)
    open("./" +experiment_id+"/" + timestamp, 'w').close()
except FileExistsError:
    pass


BASE_PATH = '/kaggle/input/tfmmelanomapreprocessed'
PATH_TO_IMAGES = '/kaggle/input/tfmmelanomapreprocessed/dataset/jpeg'+str(PIXELS_SIZE)
IMAGE_TYPE = ".jpg"

# warnings.filterwarnings('ignore')
    
# Tensorflow execution optimizations
# Source: https://www.tensorflow.org/guide/mixed_precision & https://www.tensorflow.org/xla
MIXED_PRECISION = True
XLA_ACCELERATE = True
GPUS = 0

GPUS = len(tf.config.experimental.list_physical_devices('GPU'))
if GPUS == 0:
    DEVICE = 'CPU'
    raise RuntimeError('Running on CPU')
else:
    DEVICE = 'GPU'
    if MIXED_PRECISION:
        from tensorflow.keras.mixed_precision import experimental as mixed_precision
        policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
        mixed_precision.set_policy(policy)
        print('Mixed precision enabled')
    if XLA_ACCELERATE:
        tf.config.optimizer.set_jit(True)
        print('Accelerated Linear Algebra enabled')

print("Tensorflow version " + tf.__version__)


print("Set seeds")
random.seed(SEED)
np.random.seed(SEED)
os.environ['PYTHONHASHSEED'] = str(SEED)
# these variables are on/off switches, so set them to '1' explicitly instead of reusing the seed value
os.environ['TF_KERAS'] = '1'
os.environ['TF_DETERMINISTIC_OPS'] = '1'
os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
tf.random.set_seed(SEED)

## Loading the data (csv files)

In [None]:
def check_image(fileName, folder):
    absolutePath = PATH_TO_IMAGES + folder + fileName + IMAGE_TYPE
    img_file = Path(absolutePath)
    if img_file.is_file():
        return absolutePath
    return False

def get_train_data():
    print("Loading train data")
    train = pd.read_csv(os.path.join(BASE_PATH, 'train.csv'))
    train['image_path'] = train['image_name'].apply(lambda x: check_image(x, "/train/"))
    train = train[train['image_path'] != False]
    print("valid rows in train", train.shape[0])
    return train

def get_test_data():
    print("Loading test data")
    test = pd.read_csv(os.path.join(BASE_PATH, 'test.csv'))
    test['image_path'] = test['image_name'].apply(lambda x: check_image(x, "/test/"))
    test = test[test['image_path'] != False]
    print("valid rows in test", test.shape[0])
    return test

train = train_backup.copy() if "train_backup" in globals() else get_train_data()
test = test_backup.copy() if "test_backup" in globals() else get_test_data()
train_backup = train.copy()
test_backup = test.copy()

print(train.dtypes)
print(test.dtypes)

Train Dataset:
- image_name: the filename for the specific image
- patient_id: unique patient id
- sex: gender of the patient
- age_approx: age of the patient
- anatom_site_general_challenge: location of the scan site
- diagnosis: information about the diagnosis
- benign_malignant: indicates if the scan result is malignant or benign
- target: 0 for benign and 1 for malignant
- image_path: path to the image

Test Dataset:
- image_name: the filename for the specific image
- patient_id: unique patient id
- sex: gender of the patient
- age_approx: age of the patient
- anatom_site_general_challenge: location of the scan site
- image_path: path to the image

## Data preparation

In [None]:
if type(train["target"].iloc[0]) is not np.int64:
    raise RuntimeError('Train backup not loaded properly')
if(IS_CLASS_MODE_BINARY):
    train['target'] = train['target'].apply(str)
    if type(train["target"].iloc[0]) is not str:
        raise RuntimeError('Train backup not loaded properly')

In [None]:
# reduce amount of data when running on a cpu
if DEVICE == 'CPU':
    raise RuntimeError('Running on CPU')
    print("rows in train", train.shape[0])
    print("rows in test", test.shape[0])
    print("reduce the amount of data because of a cpu runtime")
    # take 30% of the available data
    train = train.sample(int(train.shape[0] * 0.3))
    EPOCHS = 5
    SAVE_OUTPUT = False
    print("rows in train", train.shape[0])
    print("rows in test", test.shape[0])



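### Balance the dataset (undersampling)

The dataset is highly imbalanced, so we need to balance it. We do this by undersampling the negative (benign) class until it matches the number of positive (malignant) cases.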
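
As a minimal sketch of the undersampling idea (not the notebook's exact code), the snippet below keeps every positive index and draws `balance` negatives per positive at random. The function name `undersample_indices` and the toy labels are assumptions for illustration. The positive share after sampling is 1 / (1 + balance), which gives 50% for 1, 20% for 4, about 11% for 8 and about 8% for 12.

```python
import random

def undersample_indices(labels, balance=1, seed=1):
    """Return indices of all positives plus `balance` random negatives per positive."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # never ask for more negatives than exist
    n_neg = min(len(neg), balance * len(pos))
    return pos + rng.sample(neg, n_neg)

# toy data: 10 positives and 490 negatives (hypothetical numbers)
labels = [1] * 10 + [0] * 490
for balance in (1, 4, 8, 12):
    kept = undersample_indices(labels, balance)
    share = sum(labels[i] for i in kept) / len(kept)
    print(balance, len(kept), round(share, 3))
```

Undersampling discards most of the negative images, which keeps training fast but throws information away; class weights are the usual alternative when all images should be kept.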

In [None]:
if DO_UNDERSAMPLING:

    print(train[train.target == POSITIVE_CLASS].shape, "positive in train")
    print(train[train.target == NEGATIVE_CLASS].shape, "negative in train")
    # 1 means 50 / 50 => equal amount of positive and negative cases in Training
    # 4 = 20%; 8 = ~11%; 12 = ~8%
    balance = 1
    p_inds = train[train.target == POSITIVE_CLASS].index.tolist()
    np_inds = train[train.target == NEGATIVE_CLASS].index.tolist()

    np_sample = random.sample(np_inds, balance * len(p_inds))
    train = train.loc[p_inds + np_sample]
    # print("Samples in train", train['target'].sum()/len(train))
    # print("Remaining rows in train set", len(train))
else:
    print("No undersampling")

print(train[train.target == POSITIVE_CLASS].shape, "positive in train")
print(train[train.target == NEGATIVE_CLASS].shape, "negative in train")

### Patient Overlap

Note that some patients have multiple images in the train and test datasets.
We therefore need to check that images from the same patient do not appear in both the training and the test set.
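
The same concern applies when a validation set is split off from the training data: if one patient's images land on both sides, validation scores can look better than they really are. A minimal sketch of a patient-level split follows. `split_by_patient` and the toy rows are hypothetical, and this notebook itself uses a stratified row-level split instead.

```python
import random

def split_by_patient(rows, val_fraction=0.2, seed=1):
    """Split (patient_id, image_name) rows so that no patient ends up in both sets."""
    patients = sorted({pid for pid, _ in rows})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_val = max(1, int(len(patients) * val_fraction))
    val_patients = set(patients[:n_val])
    train_rows = [r for r in rows if r[0] not in val_patients]
    val_rows = [r for r in rows if r[0] in val_patients]
    return train_rows, val_rows

# toy rows: 10 patients with 3 images each (hypothetical ids)
rows = [(f"IP_{p:03d}", f"ISIC_{p:03d}_{k}") for p in range(10) for k in range(3)]
train_rows, val_rows = split_by_patient(rows)
overlap = {pid for pid, _ in train_rows} & {pid for pid, _ in val_rows}
print(len(train_rows), len(val_rows), len(overlap))
```

The trade-off is that a patient-level split no longer guarantees the same class ratio in both sets, so the ratio should be checked after splitting.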


In [None]:
print("Max number of images from one patient in the train set:", np.max(train.patient_id.value_counts()))
print("Max number of images from one patient in the test set:", np.max(test.patient_id.value_counts()))

# get the unique patient ids from the test and training set
ids_train = set(train.patient_id.values)
ids_test = set(test.patient_id.values)

print("There are", len(ids_train), "unique patients in the training set")
print("There are", len(ids_test), "unique patients in the test set")

# Identify patient overlap by looking at the intersection between the sets
patient_overlap = list(ids_train.intersection(ids_test))
n_overlap = len(patient_overlap)
print("There are", n_overlap, "patients in both the training and test sets")

In [None]:
def create_splits(df, test_size, classToPredict):
    """ Helper function to create a train and a validation dataset

    Parameters:
        df (dataframe): The dataframe to split
        test_size (float): Fraction of the rows to use for the validation set
        classToPredict (string): The target column, used for stratification

    Returns:
        train_data (dataframe)
        val_data (dataframe)
    """
    train_data, val_data = train_test_split(df, test_size=test_size, random_state=SEED, stratify=df[classToPredict])
    return train_data, val_data

def save_history(history, timestamp):
    """ Helper function to plot the history of a tensorflow model

    Parameters:
        history (dict): The history dictionary from a tf model (history.history)
        timestamp (string): The timestamp of the function execution (currently unused)

    Returns:
        None
    """
    f = plt.figure()
    f.set_figwidth(15)

    f.add_subplot(1, 2, 1)
    plt.plot(history['val_loss'], label='val loss')
    plt.plot(history['loss'], label='train loss')
    plt.legend()
    plt.title("Model Loss")

    f.add_subplot(1, 2, 2)
    plt.plot(history['val_accuracy'], label='val accuracy')
    plt.plot(history['accuracy'], label='train accuracy')
    plt.legend()
    plt.title("Model Accuracy")

    if SAVE_OUTPUT:
        length = len(history["loss"])-1
        metrics = ["loss", "accuracy","auc","val_loss", "val_accuracy","val_auc"]
        # write the metrics of the last epoch; the context manager closes the file
        with open(base_output_path + "2finalResults.txt", "a") as results_file:
            for metric in metrics:
                metricValue = round(history[metric][length],4)
                results_file.write(metric + ":" + str(metricValue) + ("\n\n" if metric == "auc" else "\n"))
        plt.savefig(base_output_path + "2history.png")
        with open(base_output_path + "2history.json", 'w') as f:
            json.dump(history, f)


def plot_auc(t_y, p_y):
    """ Helper function to plot the auc curve

    Parameters:
        t_y (array): True binary labels
        p_y (array): Target scores

    Returns:
        None
    """
    fpr, tpr, thresholds = roc_curve(t_y, p_y, pos_label=1)
    fig, c_ax = plt.subplots(1,1, figsize = (8, 8))
    c_ax.plot(fpr, tpr, label = '%s (AUC:%0.2f)'  % ('Target', auc(fpr, tpr)))
    c_ax.plot([0, 1], [0, 1], color='navy', lw=1, linestyle='--')
    c_ax.legend()
    c_ax.set_xlabel('False Positive Rate')
    c_ax.set_ylabel('True Positive Rate')
    plt.savefig(base_output_path + "5auc.png")

## Data augmentation

In [None]:
def customPreprocess(image):
    if (PREPRO_ROTATION):
        image = np.rot90(image, np.random.choice([-1, 0, 1, 2]))
    if (PREPRO_BLUR and (bool(random.getrandbits(1)))):
        image = cv2.blur(image,(3,3))
    return image


 
# datagen = ImageDataGenerator(preprocessing_function= blur)
def get_training_gen(df):
    # We dropped channel shift; try reducing the augmentations to shorten training time.
    train_idg = ImageDataGenerator(
        rescale = 1 / 255.0,
        horizontal_flip = True,
        vertical_flip = True,
        brightness_range = [PREPRO_BRIGHTNESS_LOW,1],
        zoom_range = [PREPRO_ZOOM_LOW,1],
        fill_mode='nearest',
        preprocessing_function=customPreprocess
    )

    train_gen = train_idg.flow_from_dataframe(
        seed=SEED,
        dataframe=df,
        directory=None,
        x_col='image_path',
        y_col='target',
        class_mode=CLASS_MODE,
        shuffle=True,
        target_size=IMG_SIZE,
        batch_size=BATCH_SIZE,
        validate_filenames = False
    )

    return train_gen

def get_validation_gen(df):
    """ Factory function to create a validation image data generator

    Parameters:
        df (dataframe): Validation dataframe

    Returns:
        Iterator that yields batches of (images, labels)
    """
    ## prepare images for validation
    val_idg = ImageDataGenerator(rescale=1. / 255.0)
    val_gen = val_idg.flow_from_dataframe(
        seed=SEED,
        dataframe=df,
        directory=None,
        x_col='image_path',
        y_col='target',
        class_mode=CLASS_MODE,
        shuffle=False,
        target_size=IMG_SIZE,
        batch_size=BATCH_SIZE,
        validate_filenames = False
    )

    return val_gen

### Images returned from the ImageDataGenerator

In [None]:
if SAVE_OUTPUT:
    train_gen = get_training_gen(train)
    t_x, t_y = next(train_gen)
    fig, m_axs = plt.subplots(4, 4, figsize = (16, 16))
    for (c_x, c_y, c_ax) in zip(t_x, t_y, m_axs.flatten()):
        c_ax.imshow(c_x, cmap = 'bone')
        if c_y == 1:  # with class_mode "binary" the generator yields numeric labels, not strings
            c_ax.set_title(str(c_y) + "-MALIGNANT")
        else:
            c_ax.set_title(str(c_y) + "-BENIGN")
        c_ax.axis('off')


    plt.savefig(base_output_path + "1dataAug.png")

The Image Data Generator function returns these transformed images.

The Keras ImageDataGenerator class works by:
- Accepting a batch of images used for training.
- Taking this batch and applying a series of random transformations to each image in the batch (including random rotation, resizing, shearing, etc.).
- Replacing the original batch with the new, randomly transformed batch and returning it.

Source: https://www.pyimagesearch.com/2019/07/08/keras-imagedatagenerator-and-data-augmentation/
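
The idea can be sketched without Keras. The snippet below is an illustration with NumPy only, under the assumption that a batch is an array of shape (batch, height, width, channels) with square images. `augment_batch` is a hypothetical helper, not part of the Keras API. It applies a random 90-degree rotation and random flips to each image and returns a new batch of the same shape.

```python
import numpy as np

def augment_batch(batch, rng):
    """Return a randomly rotated and flipped copy of an (N, H, W, C) batch with H == W."""
    out = np.empty_like(batch)
    for i, image in enumerate(batch):
        image = np.rot90(image, k=int(rng.integers(0, 4)))  # 0, 90, 180 or 270 degrees
        if rng.random() < 0.5:
            image = np.flip(image, axis=1)  # horizontal flip
        if rng.random() < 0.5:
            image = np.flip(image, axis=0)  # vertical flip
        out[i] = image
    return out

rng = np.random.default_rng(1)
batch = rng.random((4, 8, 8, 3)).astype(np.float32)  # toy "images"
augmented = augment_batch(batch, rng)
print(batch.shape, augmented.shape)
```

Rotations and flips only rearrange pixels, so they suit skin lesions, which have no canonical orientation. Brightness and zoom changes, as used in this notebook, additionally alter pixel values.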

## Transfer Learning

Conventional machine learning and deep learning algorithms have traditionally been designed to work in isolation: they are trained to solve specific tasks, and the models have to be rebuilt from scratch once the feature-space distribution changes. Transfer learning overcomes this isolated learning paradigm by using knowledge acquired for one task to solve related ones.


![Transfer Learning](https://github.com/SaschaMet/melanoma-classification/blob/master/images/transfer-learning.png?raw=1)
 

Traditional learning is isolated: separate models are trained on specific tasks and datasets, and no knowledge is retained that could be transferred from one model to another. In transfer learning, you can leverage knowledge (features, weights, etc.) from previously trained models to train newer models and even tackle problems like having less data for the newer task.

**Fine Tuning Off-the-shelf Pre-trained Models**

This is a more involved technique, where we do not just replace the final layer (for classification/regression), but we also selectively retrain some of the previous layers. 


![Transfer Learning](https://miro.medium.com/max/700/1*BBZGHtI_vhDBeqsIbgMj1w.png)
 



Source: https://towardsdatascience.com/a-comprehensive-hands-on-guide-to-transfer-learning-with-real-world-applications-in-deep-learning-212bf3b2f27a
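
To make the idea concrete, here is a toy NumPy sketch of the simplest form of transfer learning, feature extraction: a "pre-trained" layer stays frozen and only a new classification head is trained. Everything in it is an assumption for illustration (random weights stand in for pre-trained ones, and the data is synthetic). Fine-tuning would additionally update some of the frozen weights, usually with a small learning rate.

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic data: 200 samples with 5 inputs; the label depends on the first input
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(float)

# "pre-trained" feature extractor: random weights stand in for learned ones
W_base = rng.normal(size=(5, 8))
W_base_before = W_base.copy()

def features(X):
    return np.tanh(X @ W_base)  # frozen: W_base is never updated

def loss_and_grad(w, b, F, y):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))  # sigmoid output of the head
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad_w = F.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    return loss, grad_w, grad_b

F = features(X)
w, b = np.zeros(8), 0.0  # new, trainable head
initial_loss, _, _ = loss_and_grad(w, b, F, y)
for _ in range(200):
    _, grad_w, grad_b = loss_and_grad(w, b, F, y)
    w -= 0.1 * grad_w  # only the head is updated
    b -= 0.1 * grad_b
final_loss, _, _ = loss_and_grad(w, b, F, y)
print(round(float(initial_loss), 4), round(float(final_loss), 4))
```

In Keras the same split is expressed through each layer's `trainable` flag, which is what the next cell sets on the base model.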

In [None]:
def load_pretrained_model():

    # Set the trainable flag of the first 15 layers of the base model (frozen when BASE_MODEL_TRAINABLE is False).
    # All other layers keep their default and stay trainable.
    for layer in base_model.layers[0:15]:
        layer.trainable = BASE_MODEL_TRAINABLE

    for idx, layer in enumerate(base_model.layers):
        print("layer", idx + 1, ":", layer.name, "is trainable:", layer.trainable)

    return base_model

def create_model():
    print("create model")
    model = Sequential()
    model.add(load_pretrained_model())  
    # Add a flatten layer to prepare the output of the CNN layers for the next layers
    model.add(layers.Flatten())
    # Add a dense (aka. fully-connected) layer. 
    model.add(layers.Dense(128, activation='relu'))
    # Add a dropout-layer which may prevent overfitting and improve generalization ability to unseen data.
    model.add(layers.Dropout(0.3))

    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dropout(0.3))

    model.add(layers.Dense(32, activation='relu'))

    # Use the Sigmoid activation function for binary predictions, softmax for n-classes
    model.add(layers.Dense(OUTPUT_NEURONS, activation='sigmoid'))
    return model

model = create_model()
model.summary()

In [None]:
callback_list = []

# if the model does not improve for 10 epochs, stop the training
stop_early = EarlyStopping(monitor='val_loss', mode='auto', patience=10)
callback_list.append(stop_early)

# if the output of the model should be saved, create a checkpoint callback function
if SAVE_OUTPUT:
    # set the weight path for saving the model
    weight_path = base_output_path + "3model.hdf5"
    # create the model checkpoint callback to save the model weights to a file
    # (with save_best_only=True the file is replaced whenever val_loss improves)
    checkpoint = ModelCheckpoint(
        weight_path,
        save_weights_only=True,
        verbose=VERBOSE_LEVEL,
        save_best_only=True,
        monitor='val_loss',
        mode='auto',
    )
    # append the checkpoint callback to the callback list
    callback_list.append(checkpoint)

## Model training

In [None]:
# create a training and validation dataset from the train df
train_df, val_df = create_splits(train, 0.2, 'target')

print("rows in train_df", train_df.shape[0])
print("rows in val_df", val_df.shape[0])

print(train_df.dtypes)
print(val_df.dtypes)

# because we do not need the target column anymore we can drop it
# train_df.drop(['target'], axis=1, inplace=True)
# val_df.drop(['target'], axis=1, inplace=True)
# print(train_df.dtypes)
# print(val_df.dtypes)

# call the generator functions
train_gen = get_training_gen(train_df)
val_gen = get_validation_gen(val_df)
valX, valY = next(val_gen)  # note: this is only the first validation batch (BATCH_SIZE images)

In [None]:

class_weights = np.array([1.0,1.0])
# Try undersampling, class weights, and so on.

if not DO_UNDERSAMPLING:
    print("Calculating class weights (no undersampling)")
    # maybe increase the epochs, but that is hard with 1000
    testY = np.array(train_df['target'])
    class_weights = class_weight.compute_class_weight(class_weight='balanced', classes=np.unique(testY), y=testY)
else:
    print("No class weights (previously undersampled)")

class_weights = {i : class_weights[i] for i in range(2)}

print(class_weights)
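
For reference, the "balanced" heuristic in scikit-learn weights each class by n_samples / (n_classes * class_count). The sketch below recomputes it by hand on toy labels (the counts are hypothetical, and `balanced_class_weights` is an illustrative helper) to show how the minority class receives the larger weight.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}

# toy labels: 980 benign (0) and 20 malignant (1), hypothetical counts
labels = [0] * 980 + [1] * 20
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, a misclassified malignant image contributes far more to the loss than a misclassified benign one, which counteracts the imbalance without discarding data.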

In [None]:
# The focus is on possible preprocessing options, not so much on finding the ideal model with a perfect learning rate and so on
print("CLASSMODE:", CLASS_MODE)
print("DO_UNDERSAMPLING", DO_UNDERSAMPLING)
print("BATCH_SIZE: ", BATCH_SIZE)
print("EPOCHS: ", EPOCHS)
print("PREPRO_ROTATION", PREPRO_ROTATION)
print("PREPRO_BLUR", PREPRO_BLUR)
print("LEARNING_RATE", LEARNING_RATE)


OPTIMIZER = Adam(learning_rate=LEARNING_RATE) # Keep this optimizer; it is the one that works best
LOSS = 'binary_crossentropy'
METRICS = [
    'accuracy', 
    'AUC'
] 

model.compile(
    loss=LOSS,
    metrics=METRICS,
    optimizer=OPTIMIZER,
)
if FAST_RUN:
    EPOCHS = 3
# when on a cpu, do not save the model data
if DEVICE == 'CPU':
    print("fit model on cpu")
    history = model.fit(
        train_gen, 
        epochs=EPOCHS, 
        class_weight = class_weights,
        verbose=VERBOSE_LEVEL,
        validation_data=(valX, valY)
    )
else:
    print("fit model on gpu")
    history = model.fit(
        train_gen, 
        epochs=EPOCHS, 
        class_weight = class_weights,
        verbose=VERBOSE_LEVEL,
        callbacks=callback_list, 
        validation_data=(valX, valY),
    )

## Model Evaluation

In [None]:
# plot model history
save_history(history.history, timestamp)

From the accuracy plot, we can see that the model stops improving after epoch 22. The loss plot shows comparable performance on the training and validation datasets, so the model has not yet overfit the training data. The model achieved its lowest loss at the 19th epoch.

However, the validation curves fluctuate widely at times, which suggests that the model still has difficulty generalizing.
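
The stopping rule behind the early-stopping callback (patience on the validation loss, set to 10 in this notebook) can be sketched in plain Python. This is a simplified model of the behavior that ignores options such as a minimum improvement. `simulate_early_stopping` and the loss values are made up for illustration.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (best_epoch, stopped_epoch), both 1-based, for a patience rule on val_loss."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # improvement: remember it and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)

val_losses = [0.90, 0.70, 0.60, 0.65, 0.62, 0.61, 0.63]
best_epoch, stopped_epoch = simulate_early_stopping(val_losses, patience=3)
print(best_epoch, stopped_epoch)
```

Because training continues for `patience` epochs after the best one, the checkpoint callback with `save_best_only=True` is what preserves the best weights.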

In [None]:
# plot the auc
y_t = [] # true labels
y_p = [] # predicted classes (thresholded at 0.5)
y_s = [] # predicted scores (probabilities), needed for the ROC curve

# iterate over the validation df and make a prediction for each image
# for i in tqdm(range(val_df.shape[0])):
rangeValue = val_df.shape[0] if not FAST_RUN else 50
for i in range(rangeValue):
    y_real = val_df.iloc[i].target
    y_real_int = int(y_real)
    image_path = val_df.iloc[i].image_path

    img = keras.preprocessing.image.load_img(image_path, target_size=IMG_SIZE)
    img = keras.preprocessing.image.img_to_array(img)
    img = img / 255
    img_array = tf.expand_dims(img, 0)
    y_pred = model.predict(img_array)
    y_pred_num = float(y_pred[0][0])
    y_pred_class = int(y_pred_num >= 0.5)
    #print("Real: ", y_real, "-> pred: ", y_pred_num, "class", y_pred_class)
    y_t.append(y_real_int)
    y_p.append(y_pred_class)
    y_s.append(y_pred_num)

# the ROC curve needs the continuous scores; hard class labels would give a single operating point
plot_auc(y_t, y_s)

The AUC-ROC curve is a performance measurement for classification problems at various threshold settings. ROC is a probability curve, and AUC represents the degree of separability: it tells how well the model can distinguish between classes. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s. In our setting, the higher the AUC, the better the model distinguishes between patients with and without the disease. The ROC curve plots the true positive rate (TPR) on the y-axis against the false positive rate (FPR) on the x-axis.

Source: https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
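
AUC can also be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting half. The sketch below computes it that way in plain Python. `pairwise_auc` and the toy scores are assumptions for illustration. It also shows why the curve should be built from predicted probabilities: thresholding the scores first discards ranking information.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.3, 0.4, 0.8]                   # toy probabilities
hard = [1 if s >= 0.5 else 0 for s in scores]   # thresholded at 0.5

print(pairwise_auc(y_true, scores))  # 1.0: every positive outranks every negative
print(pairwise_auc(y_true, hard))    # 0.75: the 0.4 positive now ties with both negatives
```

The pairwise form is quadratic in the number of samples, so it is meant for understanding; for real data, `roc_curve` and `auc` from scikit-learn, as used in this notebook, are the practical choice.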

### F1 Score Calculation

The F1 score is the harmonic mean of precision and recall. In the statistical analysis of binary classification, the F-score is a measure of a test's accuracy. It is calculated from the precision and recall of the test. Precision is the number of correctly identified positive results divided by the number of all results predicted as positive, including those predicted incorrectly. Recall is the number of correctly identified positive results divided by the number of all samples that should have been identified as positive.

The highest possible value of an F-score is 1, indicating perfect precision and recall, and the lowest potential value is 0 if either the precision or the recall is zero.

Source: https://en.wikipedia.org/wiki/F-score
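
As a worked example, the snippet below derives precision, recall and F1 from raw counts. The counts and helper names are hypothetical. The last line plugs in the precision and recall quoted in the discussion section of this notebook.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# hypothetical counts: 80 true positives, 20 false positives, 10 false negatives
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(round(p, 4), round(r, 4), round(f1, 4))

# precision and recall quoted in the discussion section
print(round(f1_from(0.8136, 0.8205), 3))
```

Because it is a harmonic mean, F1 is pulled toward the smaller of the two values, so a model cannot score well by maximizing only precision or only recall.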

In [None]:
def calc_f1(prec, recall):
    """ Helper function to calculate the F1 Score

    Parameters:
        prec (float): precision
        recall (float): recall

    Returns:
        f1 score (float)
    """
    return 2*(prec*recall)/(prec+recall) if recall and prec else 0

# calculate the precision, recall and the thresholds
precision, recall, thresholds = precision_recall_curve(y_t, y_p)

# calculate the f1 score
f1score = [calc_f1(precision[i],recall[i]) for i in range(len(thresholds))]

# get the index from the highest f1 score
idx = np.argmax(f1score)

# get the precision, recall, threshold and the f1score
precision = round(precision[idx], 4)
recall = round(recall[idx], 4)
# threshold = round(thresholds[idx], 4)
f1score = round(f1score[idx], 4)

print('Precision:', precision)
print('Recall:', recall)
# print('Threshold:', threshold)
print('F1 Score:', f1score)

In [None]:
# create a confusion matrix
cm =  confusion_matrix(y_t, y_p)
cm

In [None]:
def plot_confusion_matrix(cm, labels):
    """ Helper function to plot a confusion matrix

    Parameters:
        cm (array): The confusion matrix
        labels (list): Class names used for the axis ticks

    Returns:
        None
    """
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
    plt.title('Confusion Matrix')
    plt.colorbar()
    tick_marks = np.arange(len(labels))
    plt.xticks(tick_marks, labels, rotation=55)
    plt.yticks(tick_marks, labels)

    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], 'd'), horizontalalignment="center", color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()
    if SAVE_OUTPUT:
        plt.savefig(base_output_path + "4confMatrix.png")

cm_plot_label =['benign', 'malignant']
plot_confusion_matrix(cm, cm_plot_label)

The model predicted 191 images correctly, but failed on 43.

## Inference

In [None]:
# Show a prediction for a random image
image_path = test.sample().iloc[0].image_path
img = keras.preprocessing.image.load_img(image_path, target_size=IMG_SIZE)
img = keras.preprocessing.image.img_to_array(img)
img = img / 255
img_array = tf.expand_dims(img, 0)

y_pred = model.predict(img_array)
# the sigmoid output is a probability in [0, 1]
prediction = float(y_pred[0][0])
# convert the probability to a percentage before printing it with a percent sign
print("Chance of being malignant: {:.2f} %".format(prediction * 100))

finding = "Diagnosis: BENIGN"
if prediction >= 0.5:
    finding = "Diagnosis: MALIGNANT"

x = plt.figure(figsize=(5,5))
x = plt.imshow(img)
x = plt.title(finding)
x = plt.axis("off")

## Discussion

We achieved an F1-Score of 0.817. According to the study "Keratinocytic Skin Cancer Detection on the Face Using Region-Based Convolutional Neural Network" by Han SS, Moon IJ, Lim W, et al., the F1-score of a professional dermatologist is 0.835. Compared to this result, our neural network performed slightly worse. Note, however, that the study considered only facial skin cancer. The F1-Score of professional dermatologists on a more realistic dataset like this one could therefore differ.


Source: https://pubmed.ncbi.nlm.nih.gov/31799995/

<img src="https://github.com/SaschaMet/melanoma-classification/blob/master/images/clinical_relevance.png?raw=1" alt="Clinical Relevance" style="width: 600px;"/>

Source: https://www.udacity.com/course/ai-for-healthcare-nanodegree--nd320

Precision and Recall are of particular interest to the clinical applicability of the model. A model with high precision has increased confidence in a positive result. It is, therefore, better suited in confirming a diagnosis. A model with high Recall, on the other hand, is most confident when the test is negative. Such a model is better used for prioritization tasks (e.g., which lesions should be looked at first).

The precision of this model is 0.8136 and the recall is 0.8205. Because recall is slightly higher than precision, this neural network should be better suited for prioritization tasks than for confirming a diagnosis. However, because the F1-Score is comparable to that of professional dermatologists and precision and recall are almost the same, the model could also be useful for, e.g., confirming a dermatologist's diagnosis.

### Cut-Off Thresholds

The last layer of our CNN outputs the probability that an image belongs to the malignant class (target 1), and the image is classified as malignant when this probability reaches the cut-off threshold. Changing the threshold changes the true positive, false positive, false negative, and true negative rates. This, in turn, changes the precision and recall of our model. We could, for example, raise the threshold so that our precision increases. This, however, also changes our recall and will probably decrease it: one metric is optimized at the expense of another. Because of this, we used the F1-Score as the final metric, since it combines both precision and recall. (It is the harmonic mean of precision and recall.)
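
The trade-off can be seen on a handful of made-up scores. The sketch below sweeps three thresholds and reports precision and recall at each. The data and the helper `precision_recall_at` are assumptions for illustration.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.45, 0.6, 0.7, 0.9]
for threshold in (0.3, 0.5, 0.65):
    precision, recall = precision_recall_at(y_true, scores, threshold)
    print(threshold, round(precision, 2), round(recall, 2))
```

On this toy data, raising the threshold from 0.3 to 0.65 lifts precision from 0.6 to 1.0 while recall drops from 1.0 to about 0.67.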

### Kaggle Leaderboard

When predicting the images from the provided test set, the model achieves a private score of 0.8041 and a public score of 0.8247. This corresponds to a place on the leaderboard at around 2,700.

The best model on the private leaderboard of the Kaggle competition achieved a score of 0.9490.

All submissions are evaluated on the area under the ROC curve between the predicted probability and the observed target.

## How to further improve the model

Based on the placement on the leaderboard, you can already see that there is still a lot of potential for optimization. The following possibilities could be addressed, for example:

- Try different pre-trained models: https://keras.io/api/applications
- Hyperparameter tuning: https://blog.tensorflow.org/2020/01/hyperparameter-tuning-with-keras-tuner.html
- Get more training data: https://www.kaggle.com/wanderdust/skin-lesion-analysis-toward-melanoma-detection
- Experiment with different loss functions: https://www.tensorflow.org/addons/api_docs/python/tfa/losses/SigmoidFocalCrossEntropy
- Improve the data augmentation, e.g. by removing body hair: https://www.kaggle.com/vatsalparsaniya/melanoma-hair-remove

However, even with these options, it will be challenging to get a top ranking on the leaderboard. For example, the winning team consists of three Kaggle Grandmasters, all of whom work at NVIDIA. In an interview, they also mentioned that one of their most significant advantages was the abundant resources (e.g., GPUs) they received from NVIDIA.

Source: https://www.youtube.com/watch?v=L1QKTPb6V_I

In [None]:
if SAVE_OUTPUT:
    # save the model to a json file
    model_json = model.to_json()
    with open(base_output_path + "3model.json", "w") as json_file:
        json_file.write(model_json)

    # create the submission.csv file
    data=[]
    
    rangeValue = test.shape[0] if not FAST_RUN else 50
    for i in tqdm(range(rangeValue)):
        image_path = test.iloc[i].image_path
        image_name = test.iloc[i].image_name
        img = keras.preprocessing.image.load_img(image_path, target_size=IMG_SIZE)
        img = keras.preprocessing.image.img_to_array(img)
        img = img / 255
        img_array = tf.expand_dims(img, 0)
        
        
        y_pred = model.predict(img_array)
        # the competition is scored on AUC, so submit the predicted probability rather than a hard class
        y_pred_prob = float(y_pred[0][0])
        data.append([image_name, y_pred_prob])

    sub_df = pd.DataFrame(data, columns = ['image_name', 'target']) 
    sub_df.to_csv(base_output + "submission.csv", index=False)

    sub_df.head()

In [None]:
shutil.make_archive("/kaggle/working/" + experiment_id, 'zip', "/kaggle/working/" + experiment_id)

shutil.rmtree('/kaggle/working/' + experiment_id)