# SIIM-ISIC Melanoma Classification
Skin cancer is the most prevalent type of cancer. Melanoma, specifically, is responsible for 75% of skin cancer deaths, despite being the least common skin cancer. The American Cancer Society estimates over 100,000 new melanoma cases will be diagnosed in 2020. It's also expected that almost 7,000 people will die from the disease. As with other cancers, early and accurate detection—potentially aided by data science—can make treatment more effective.

Currently, dermatologists evaluate every one of a patient's moles to identify outlier lesions or “ugly ducklings” that are most likely to be melanoma. Existing AI approaches have not adequately considered this clinical frame of reference. Dermatologists could enhance their diagnostic accuracy if detection algorithms take into account “contextual” images within the same patient to determine which images represent a melanoma. If successful, classifiers would be more accurate and could better support dermatological clinic work.

As the leading healthcare organization for informatics in medical imaging, the Society for Imaging Informatics in Medicine (SIIM)'s mission is to advance medical imaging informatics through education, research, and innovation in a multi-disciplinary community. SIIM is joined by the International Skin Imaging Collaboration (ISIC), an international effort to improve melanoma diagnosis. The ISIC Archive contains the largest publicly available collection of quality-controlled dermoscopic images of skin lesions.

In this competition, you’ll identify melanoma in images of skin lesions. In particular, you’ll use images within the same patient and determine which are likely to represent a melanoma. Using patient-level contextual information may help the development of image analysis tools, which could better support clinical dermatologists.

Melanoma is a deadly disease, but if caught early, most melanomas can be cured with minor surgery. Image analysis tools that automate the diagnosis of melanoma will improve dermatologists' diagnostic accuracy. Better detection of melanoma has the opportunity to positively impact millions of people.

# Import the necessary libraries

In [None]:
# General libraries

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split 
 
# Modelling libraries 
import tensorflow as tf 
from tensorflow.keras.preprocessing.image import ImageDataGenerator 
from tensorflow.keras import Model, Sequential 
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, GlobalAveragePooling2D 
from tensorflow.keras.applications import InceptionV3, VGG16, MobileNet

In [None]:
# Variables 
 
TARGET_SIZE = 256 
TEST_SIZE = 0.5 
BATCH_SIZE = 64 
RANDOM_STATE = 42 
EPOCHS = 5 
LR = 0.0001 


# Data Loading

In [None]:
train = pd.read_csv('../input/siim-isic-melanoma-classification/train.csv') 
test = pd.read_csv('../input/siim-isic-melanoma-classification/test.csv') 
submission = pd.read_csv('../input/siim-isic-melanoma-classification/sample_submission.csv')

# Data Preprocessing

**General Exploration**

In [None]:
# Train csv 
 
train.head()

In [None]:
# See distribution of target 
plt.figure(figsize=(12, 8)) 
sns.countplot(x='target', data=train, palette='husl');

In [None]:
# Confirm the number 
print(train.target.value_counts()) 
print('-'*20) 
train.target.value_counts(normalize=True)

In [None]:
# Add jpg extension to images 
train['images'] = train['image_name'] + '.jpg' 
 
train.head()

The classes are heavily imbalanced: only 584 of the 33,126 training images (about 1.8%) are melanoma-positive. Training has to account for this.

**Dealing with imbalance**

In [None]:
# Initialize weights 
WEIGHTS = { 
    0:0.51, 
    1:28.36 
} 
 
# Initialize bias 
bias = tf.keras.initializers.Constant(np.log([584/32542]))
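
The hard-coded weights and the bias above can be reproduced from the class counts. The sketch below assumes the common "balanced" formula `total / (n_classes * class_count)`; the notebook does not state how 0.51 and 28.36 were obtained, but this formula yields the same rounded values. The variable names are illustrative only.

```python
import math

# Class counts taken from the bias initializer above (584 positives, 32,542 negatives).
N_POS, N_NEG = 584, 32542
N_TOTAL = N_POS + N_NEG

# Assumed formula: total / (n_classes * class_count).
weight_neg = N_TOTAL / (2 * N_NEG)
weight_pos = N_TOTAL / (2 * N_POS)

# Initial output bias: the log-odds of the positive class.
b0 = math.log(N_POS / N_NEG)

# Passing b0 through a sigmoid recovers the positive-class base rate, so an
# untrained model starts out predicting roughly that rate instead of 0.5.
initial_prob = 1 / (1 + math.exp(-b0))

print(round(weight_neg, 2), round(weight_pos, 2))
print(b0, initial_prob)
```

Starting the output layer at the base rate usually spares the first epochs from merely learning that positives are rare.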

**View the data**

In [None]:
train_path = '../input/siim-isic-melanoma-classification/jpeg/train'
test_path = '../input/siim-isic-melanoma-classification/jpeg/test'

In [None]:
def show_images(label, data, path):
    # Select the file names of all images with the requested label
    df = data.loc[data['target'] == label]
    images = df['images'].values

    # Draw 9 distinct random images
    random_images = np.random.choice(images, size=9, replace=False)

    # Set the figure size
    plt.figure(figsize=(16,12))

    # Plot the images in a 3x3 grid
    for i in range(9):
        plt.subplot(3, 3, i + 1)
        try:
            # Reading the file is the step that can raise FileNotFoundError
            img = plt.imread(os.path.join(path, random_images[i]))
        except FileNotFoundError:
            continue
        plt.imshow(img)
        plt.axis('off')

    # Adjust subplot parameters to give specified padding
    plt.tight_layout()

In [None]:
# Melanoma positive 
show_images(1, train, train_path)

In [None]:
# Free from cancer 
show_images(0, train, train_path)

**Create Generators**

In [None]:
# Convert target to string 
train['target'] = train['target'].astype(str)

In [None]:
# Split data 
 
train_set, val_set = train_test_split(train, 
                                      test_size=0.1, 
                                      random_state=RANDOM_STATE, 
                                      stratify=train['target']) 
 
train_set = train_set.reset_index(drop=True) 
val_set = val_set.reset_index(drop=True)
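
With so few positives, it is worth confirming that `stratify` keeps the positive rate the same in both splits. The sketch below uses a small synthetic frame (1,000 rows with 2% positives, numbers chosen only for illustration) rather than the competition data; `positive_rate` is a hypothetical helper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for train.csv; string targets mirror the notebook's conversion.
toy = pd.DataFrame({
    "image_name": [f"img_{i:04d}" for i in range(1000)],
    "target": ["1"] * 20 + ["0"] * 980,
})

toy_train, toy_val = train_test_split(
    toy, test_size=0.1, random_state=42, stratify=toy["target"]
)

def positive_rate(df):
    """Fraction of rows labelled as positive."""
    return (df["target"] == "1").mean()

print("full :", positive_rate(toy))
print("train:", positive_rate(toy_train))
print("val  :", positive_rate(toy_val))
```

The same check can be run on `train_set` and `val_set` directly.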

In [None]:
train_datagen = ImageDataGenerator( 
    brightness_range = [0.8, 1.5],  
    horizontal_flip = True, 
    vertical_flip = True, 
    preprocessing_function=tf.keras.applications.mobilenet.preprocess_input 
    ) 
 
val_datagen = ImageDataGenerator( 
    preprocessing_function=tf.keras.applications.mobilenet.preprocess_input 
    )
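
Both generators pass every image through `tf.keras.applications.mobilenet.preprocess_input`, which rescales pixel values from the 0 to 255 range to the -1 to 1 range. The NumPy sketch below mirrors that scaling and its inverse. The function names are hypothetical, and the formula `x / 127.5 - 1` comes from the Keras MobileNet preprocessing convention, not from this notebook.

```python
import numpy as np

def mobilenet_style_preprocess(pixels):
    """Scale 0..255 pixel values to -1..1, i.e. x / 127.5 - 1."""
    return np.asarray(pixels, dtype=np.float32) / 127.5 - 1.0

def to_displayable(preprocessed):
    """Map -1..1 values back to 0..1 so that plt.imshow renders them faithfully."""
    restored = (np.asarray(preprocessed, dtype=np.float32) + 1.0) / 2.0
    return np.clip(restored, 0.0, 1.0)

raw = np.array([0.0, 127.5, 255.0], dtype=np.float32)
scaled = mobilenet_style_preprocess(raw)
restored = to_displayable(scaled)
print(scaled, restored)
```

This matters when plotting batches: `plt.imshow` clips float arrays to the 0 to 1 range, so preprocessed images look dark and washed out unless they are mapped back first.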

In [None]:
# Instantiate flows
train_flow = train_datagen.flow_from_dataframe( 
    train_set, 
    train_path, 
    x_col = 'images', 
    y_col = 'target', 
    target_size = (TARGET_SIZE, TARGET_SIZE), 
    class_mode = 'binary', 
    batch_size = BATCH_SIZE 
 
) 
 
 
val_flow = val_datagen.flow_from_dataframe( 
    val_set, 
    train_path, 
    x_col = 'images', 
    y_col = 'target', 
    target_size = (TARGET_SIZE, TARGET_SIZE), 
    class_mode = 'binary', 
    batch_size = BATCH_SIZE 
 
)

In [None]:
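# Visualize a few images from one batch
x_batch, y_batch = next(train_flow)
for i in range(6):
    # preprocess_input scales pixels to [-1, 1]; map them back to [0, 1] for display
    image = (x_batch[i] + 1) / 2
    plt.imshow(image)
    plt.show()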

# Model Development

In [None]:
def create_model():

    # Build the model on a MobileNet backbone pre-trained on ImageNet
    mobilenet = MobileNet(include_top=False,
                          input_shape=(TARGET_SIZE, TARGET_SIZE, 3),
                          weights='imagenet')

    # Fine-tune every backbone layer
    for layer in mobilenet.layers:
        layer.trainable = True
 
    model = Sequential([ 
                      mobilenet, 
                      GlobalAveragePooling2D(), 
                      #Flatten(), 
                      #Dense(256, activation = 'relu',  
                           #bias_regularizer=tf.keras.regularizers.L1L2(l1=0.01,  
                           #                                            l2=0.001)), 
                      #Dropout(0.5), 
                      Dense(32, activation = 'relu'),
                          #bias_regularizer=tf.keras.regularizers.L1L2(l1=0.01, 
                           #                                            l2=0.001)), 
                      Dropout(0.5), 
                      Dense(1, activation = 'sigmoid', bias_initializer = bias) 
    ]) 
 
    # Instantiate optimizer and metrics

    adam = tf.keras.optimizers.Adam(LR)

    auc = tf.keras.metrics.AUC(num_thresholds=200, curve='ROC',
                               summation_method='interpolation', name='auc')

    precision = tf.keras.metrics.Precision(name='precision')
    recall = tf.keras.metrics.Recall(name='recall')
    metrics = [auc, precision, recall]

    # Compile model
    model.compile(loss = 'binary_crossentropy',
                  optimizer = adam,
                  metrics = metrics)

    return model

In [None]:
model = create_model()

In [None]:
model.summary()

In [None]:
def model_fitter(model): 
 
  # instantiate callbacks 
   
  early_stopper = tf.keras.callbacks.EarlyStopping(monitor='val_auc',
                                                 mode='max',
                                                 patience=5)

  # Reduce the learning rate when validation AUC stops improving
  reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor = 'val_auc',
                                  factor = 0.1,
                                  patience = 2,
                                  min_lr = 1e-6,
                                  mode = 'max',
                                  verbose = 1)
 
  callbacks = [early_stopper, reduce_lr] 
 
  # Train model 
  history = model.fit(train_flow, 
                    epochs=EPOCHS, 
                    steps_per_epoch=int(np.ceil(len(train_set)/BATCH_SIZE)), 
                    callbacks=callbacks, 
                    validation_data=val_flow, 
                    validation_steps=int(np.ceil(len(val_set)/BATCH_SIZE)), 
                    class_weight = WEIGHTS 
                    ) 
   
  return history, model
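
The `class_weight` argument scales each sample's loss by the weight of its class. The NumPy sketch below is a simplified view of that effect on binary cross-entropy, not Keras's internal implementation; `weighted_bce` is a hypothetical helper and the weights are the ones defined in `WEIGHTS`.

```python
import numpy as np

CLASS_WEIGHTS = {0: 0.51, 1: 28.36}

def weighted_bce(y_true, y_prob, class_weights, eps=1e-7):
    """Mean binary cross-entropy with each sample scaled by its class weight."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_prob = np.clip(np.asarray(y_prob, dtype=np.float64), eps, 1 - eps)
    per_sample = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    weights = np.where(y_true == 1, class_weights[1], class_weights[0])
    return float(np.mean(weights * per_sample))

# Two equally confident mistakes: a missed melanoma and a false alarm.
missed_melanoma = weighted_bce([1], [0.1], CLASS_WEIGHTS)
false_alarm = weighted_bce([0], [0.9], CLASS_WEIGHTS)
print(missed_melanoma, false_alarm)
```

Both errors have the same unweighted loss, but the missed melanoma is penalized about 56 times more heavily, which is what pushes the model away from always predicting the majority class.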

In [None]:
history, model = model_fitter(model)
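
The `mode` argument of `ReduceLROnPlateau` decides whether a smaller or a larger monitored value counts as an improvement. For a metric such as AUC, where higher is better, `'max'` is the matching setting. The sketch below is a simplified, hypothetical re-implementation of the plateau logic (it leaves out `min_delta` and `cooldown`) run on made-up validation AUC values, to show what each mode does with a steadily improving metric.

```python
def simulate_plateau(values, mode, lr=1e-4, factor=0.1, patience=2, min_lr=1e-6):
    """Return the learning rate left after feeding `values` epoch by epoch."""
    best = float("inf") if mode == "min" else float("-inf")
    wait = 0
    for value in values:
        improved = value < best if mode == "min" else value > best
        if improved:
            best, wait = value, 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the learning rate, but not below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
    return lr

# Illustrative values only: validation AUC improving every epoch.
val_auc = [0.70, 0.75, 0.80, 0.83, 0.85]
lr_with_min = simulate_plateau(val_auc, mode="min")
lr_with_max = simulate_plateau(val_auc, mode="max")
print(lr_with_min, lr_with_max)
```

With `mode="min"` every rise in AUC is treated as a failure to improve, so the learning rate is cut twice within five epochs even though training is going well. With `mode="max"` it stays at its starting value.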