# **Skin Cancer Classification Using a CNN (ResNet50)**

* This notebook demonstrates automatic skin cancer classification using a Convolutional Neural Network (CNN). It attempts to detect 7 different skin cancer classes using ResNet50 and then analyzes the results to see how the model could be useful in practical scenarios.

* The dataset used in this notebook is HAM10000, which consists of 10015 *dermoscopic* images of size 600x450 px. The dataset is available on Kaggle at: https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000

* The main reference paper is **Skin Lesion Analyser: An Efficient Seven-Way Multi-Class Skin Cancer Classification Using MobileNet**.
Using the MobileNet model, the paper reports a categorical accuracy (Cat Acc) of 83.1%, a Top2 Acc of 91.36%, and a Top3 Acc of 95.34%. The weighted averages of precision, recall, and f1-score are 89%, 83%, and 83%, respectively.


* Parameters taken from the main reference paper:
  * Data split: training set (9077 images) and validation set (938 images)
  * Batch size: 10
  * Optimizer: Adam
  * Loss function: Categorical Crossentropy
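
The Top2 and Top3 accuracies quoted above count a prediction as correct when the true class is among the two or three highest-scoring classes. The following is a minimal pure-Python sketch of that idea, not the Keras implementation; the function name `top_k_accuracy` and the sample predictions are hypothetical and only serve as an illustration.

```python
def top_k_accuracy(y_true, y_prob, k):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    hits = 0
    for true_idx, probs in zip(y_true, y_prob):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
        hits += true_idx in top_k
    return hits / len(y_true)


# Hypothetical predictions for four samples over three classes.
y_true = [0, 1, 2, 2]
y_prob = [
    [0.7, 0.2, 0.1],  # true class ranked first
    [0.5, 0.4, 0.1],  # true class ranked second
    [0.6, 0.3, 0.1],  # true class ranked third
    [0.1, 0.2, 0.7],  # true class ranked first
]

top1 = top_k_accuracy(y_true, y_prob, 1)
top2 = top_k_accuracy(y_true, y_prob, 2)
top3 = top_k_accuracy(y_true, y_prob, 3)
print(top1, top2, top3)
```

Top-k accuracy can only grow as k increases, which matches the ordering Cat Acc ≤ Top2 Acc ≤ Top3 Acc in the reported figures.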

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

💡RESULTS💡
Using the pretrained ResNet50 model, this notebook obtains a Cat Acc of 88.7%, a Top2 Acc of 94.24%, and a Top3 Acc of 98.08%. The weighted averages of precision, recall, and f1-score are 86%, 88%, and 87%, respectively.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

💡REVISED RESULTS💡
Using the pretrained ResNet50 model, the revised run obtains a Cat Acc of 90.62%, a Top2 Acc of 96.37%, and a Top3 Acc of 98.93%. The weighted averages of precision, recall, and f1-score are 89%, 90%, and 89%, respectively.
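
The weighted averages of precision, recall, and f1-score weight each class's score by its support (the number of true samples in that class). The sketch below shows this computation in plain Python on a small hypothetical label set; `weighted_scores` is an illustrative helper, not the scikit-learn function used later in the notebook.

```python
def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall and F1 over all classes in y_true."""
    n = len(y_true)
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)   # samples predicted as class c
        support = sum(t == c for t in y_true)     # samples that truly are class c
        precision = tp / predicted if predicted else 0.0
        recall = tp / support
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = support / n
        totals["precision"] += weight * precision
        totals["recall"] += weight * recall
        totals["f1"] += weight * f1
    return totals


# Hypothetical labels: four samples of class 0, two of class 1.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
scores = weighted_scores(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(scores, accuracy)
```

A useful sanity check is that support-weighted recall always equals plain accuracy, which is consistent with the reported recall values sitting close to the corresponding Cat Acc values.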

## Import Libraries

In [None]:
import os
import shutil

import cv2
import gc
import keras
import numpy as np
import pandas as pd
from keras.applications.resnet import ResNet50
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from keras.layers import (BatchNormalization, Dense, Dropout, Flatten)
from keras.metrics import categorical_accuracy, top_k_categorical_accuracy
from keras.models import Sequential
from keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from matplotlib import pyplot as plt
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import os
os.environ['KAGGLE_CONFIG_DIR'] = "/content/drive/MyDrive/ML"

In [None]:
%cd /content/drive/MyDrive/ML

/content/drive/MyDrive/ML


In [None]:
!kaggle datasets download -d kmader/skin-cancer-mnist-ham10000

Downloading skin-cancer-mnist-ham10000.zip to /content/drive/MyDrive/ML
100% 5.19G/5.20G [00:26<00:00, 202MB/s]
100% 5.20G/5.20G [00:26<00:00, 211MB/s]


In [None]:
ls

kaggle.json  skin-cancer-mnist-ham10000.zip


In [None]:
!unzip \*.zip &> /dev/null && rm *.zip

In [None]:
df_data = pd.read_csv("/content/drive/MyDrive/ML/HAM10000_metadata.csv")
df_data.head()

Unnamed: 0,lesion_id,image_id,dx,dx_type,age,sex,localization
0,HAM_0000118,ISIC_0027419,bkl,histo,80.0,male,scalp
1,HAM_0000118,ISIC_0025030,bkl,histo,80.0,male,scalp
2,HAM_0002730,ISIC_0026769,bkl,histo,80.0,male,scalp
3,HAM_0002730,ISIC_0025661,bkl,histo,80.0,male,scalp
4,HAM_0001466,ISIC_0031633,bkl,histo,75.0,male,ear


In [None]:
# Classes present in the dataset
lesion_type_dict = {
    'nv': 'Melanocytic nevi',
    'mel': 'Melanoma',
    'bkl': 'Benign keratosis ',
    'bcc': 'Basal cell carcinoma',
    'akiec': 'Actinic keratoses',
    'vasc': 'Vascular lesions',
    'df': 'Dermatofibroma'
}
df_data['lesion']= df_data.dx.map(lesion_type_dict)

In [None]:
df_data["dx"].value_counts()

nv       6705
mel      1113
bkl      1099
bcc       514
akiec     327
vasc      142
df        115
Name: dx, dtype: int64

In [None]:
df_data["dx"].value_counts() / df_data.shape[0]

nv       0.669496
mel      0.111133
bkl      0.109735
bcc      0.051323
akiec    0.032651
vasc     0.014179
df       0.011483
Name: dx, dtype: float64

In [None]:
image_sample = cv2.imread("/content/drive/MyDrive/ML/ham10000_images_part_1/ISIC_0024306.jpg")
print(image_sample.shape)

(450, 600, 3)


## Exploratory Data Analysis

In [None]:
df_data.isna().sum()

lesion_id        0
image_id         0
dx               0
dx_type          0
age             57
sex              0
localization     0
lesion           0
dtype: int64

In [None]:
umur = df_data['age'].mean()
df_data['age'] = df_data['age'].fillna(umur)
df_data.isna().sum()

lesion_id       0
image_id        0
dx              0
dx_type         0
age             0
sex             0
localization    0
lesion          0
dtype: int64

## Splitting Data

In [None]:
_, data_test = train_test_split(df_data, test_size=0.0936, random_state=101, stratify=df_data['dx'])

data_test.shape[0]

938

In [None]:
# Build the set of test image IDs once so each membership check is fast.
test_ids = set(data_test['image_id'])

def identify_test_rows(x):
    """Label an image ID as 'test' if it is in the held-out split, else 'train'."""
    return 'test' if str(x) in test_ids else 'train'

df_data['train_or_test'] = df_data['image_id'].apply(identify_test_rows)
   
data_train = df_data[df_data['train_or_test'] == 'train']

data_train.shape[0]

9077

In [None]:
print("Data Train : ", len(data_train))
data_train['dx'].value_counts()

Data Train :  9077


nv       6077
mel      1009
bkl       996
bcc       466
akiec     296
vasc      129
df        104
Name: dx, dtype: int64

In [None]:
print("Data Test : ", len(data_test))
data_test['dx'].value_counts()

Data Test :  938


nv       628
mel      104
bkl      103
bcc       48
akiec     31
vasc      13
df        11
Name: dx, dtype: int64
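
The split sizes and the effect of `stratify` can be checked with simple arithmetic. The sketch below assumes that scikit-learn rounds a fractional `test_size` up to the next whole sample, and uses the class counts printed in the outputs of this notebook; it only compares proportions and does not reproduce the split itself.

```python
import math

# Class counts copied from the value_counts() outputs in this notebook.
full_counts = {"nv": 6705, "mel": 1113, "bkl": 1099, "bcc": 514,
               "akiec": 327, "vasc": 142, "df": 115}
test_counts = {"nv": 628, "mel": 104, "bkl": 103, "bcc": 48,
               "akiec": 31, "vasc": 13, "df": 11}

n_total = sum(full_counts.values())
n_test = math.ceil(0.0936 * n_total)   # assumed rounding rule for a float test_size
n_train = n_total - n_test
print(n_total, n_test, n_train)

# With stratification, each class keeps roughly the same share in the test split.
max_gap = 0.0
for label, count in full_counts.items():
    full_share = count / n_total
    test_share = test_counts[label] / n_test
    max_gap = max(max_gap, abs(full_share - test_share))
    print(f"{label:6s} full={full_share:.4f} test={test_share:.4f}")
```

The value `0.0936` therefore appears to have been chosen so that the split reproduces the 9077/938 partition listed among the reference parameters.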

In [None]:
df_data.set_index('image_id', inplace=True)

In [None]:
base_dir = "base_dir"
os.mkdir(base_dir)

train_dir = os.path.join(base_dir, "image_train")
os.mkdir(train_dir)

test_dir = os.path.join(base_dir, "image_test")
os.mkdir(test_dir)

labels = list(df_data["dx"].unique())

for label in labels:
    label_path_train = os.path.join(train_dir, label)
    os.mkdir(label_path_train)
    label_path_test = os.path.join(test_dir, label)
    os.mkdir(label_path_test)

In [None]:
folder_1 = os.listdir('/content/drive/MyDrive/ML/ham10000_images_part_1')
folder_2 = os.listdir('/content/drive/MyDrive/ML/ham10000_images_part_2')

train_images = list(data_train['image_id'])
test_images = list(data_test['image_id'])

for image in train_images:
    
    fname = image + '.jpg'
    label = df_data.loc[image,'dx']
    
    if fname in folder_1:
        source_path = os.path.join('/content/drive/MyDrive/ML/ham10000_images_part_1', fname)
        destination_path = os.path.join(train_dir, label, fname)
        shutil.copyfile(source_path, destination_path)

    if fname in folder_2:
        source_path = os.path.join('/content/drive/MyDrive/ML/ham10000_images_part_2', fname)
        destination_path = os.path.join(train_dir, label, fname)
        shutil.copyfile(source_path, destination_path)


for image in test_images:
    
    fname = image + '.jpg'
    label = df_data.loc[image,'dx']
    
    if fname in folder_1:
        source_path = os.path.join('/content/drive/MyDrive/ML/ham10000_images_part_1', fname)
        destination_path = os.path.join(test_dir, label, fname)
        shutil.copyfile(source_path, destination_path)

    if fname in folder_2:
        source_path = os.path.join('/content/drive/MyDrive/ML/ham10000_images_part_2', fname)
        destination_path = os.path.join(test_dir, label, fname)
        shutil.copyfile(source_path, destination_path)

In [None]:
for label in labels:
    print(label + " train: " + str(len(os.listdir(os.path.join(train_dir, label)))))
print("\n")
for label in labels:
    print(label + " test: " + str(len(os.listdir(os.path.join(test_dir, label)))))

bkl train: 996
nv train: 6077
df train: 104
mel train: 1009
vasc train: 129
bcc train: 466
akiec train: 296


bkl test: 103
nv test: 628
df test: 11
mel test: 104
vasc test: 13
bcc test: 48
akiec test: 31


## Data Augmentation

In [None]:
data_gen_param = {
    "rotation_range": 180,
    "width_shift_range": 0.1,
    "height_shift_range": 0.1,
    "zoom_range": 0.1,
    "horizontal_flip": True,
    "vertical_flip": True
}
data_generator = ImageDataGenerator(**data_gen_param)
num_images_each_label = 6000

aug_dir = os.path.join(base_dir, "aug_dir")
os.mkdir(aug_dir)

for label in labels:
    
    img_dir = os.path.join(aug_dir, "aug_img")
    os.mkdir(img_dir)
    
    src_dir_label = os.path.join(train_dir, label)
    for image_name in os.listdir(src_dir_label):
        shutil.copy(os.path.join(src_dir_label, image_name), os.path.join(img_dir, image_name))
    
    batch_size = 10
    data_flow_param = {
        "directory": aug_dir,
        "color_mode": "rgb",
        "batch_size": batch_size,
        "shuffle": True,
        "save_to_dir": os.path.join(train_dir, label),
        "save_format": "jpg"
    }

    aug_data_gen = data_generator.flow_from_directory(**data_flow_param)
    
    num_img_aug = num_images_each_label - len(os.listdir(os.path.join(train_dir, label)))
    num_batch = int(num_img_aug / batch_size)
    
    for i in range(0, num_batch):
        next(aug_data_gen)
    
    shutil.rmtree(img_dir)

Found 996 images belonging to 1 classes.
Found 6077 images belonging to 1 classes.
Found 104 images belonging to 1 classes.
Found 1009 images belonging to 1 classes.
Found 129 images belonging to 1 classes.
Found 466 images belonging to 1 classes.
Found 296 images belonging to 1 classes.


In [None]:
IMAGE_SHAPE = (224, 224, 3)

train_flow_param = {
    "directory": train_dir,
    "batch_size": batch_size,
    "target_size": IMAGE_SHAPE[:2],
    "shuffle": True
}
print('Data Train : ')
train_flow = data_generator.flow_from_directory(**train_flow_param)

test_flow_param = {
    "directory": test_dir,
    "batch_size": batch_size,
    "target_size": IMAGE_SHAPE[:2],
    "shuffle": False
}
print('Data Test :')
test_flow = data_generator.flow_from_directory(**test_flow_param)

Data Train : 
Found 41550 images belonging to 7 classes.
Data Test :
Found 938 images belonging to 7 classes.


In [None]:
# Show the number of training and test images per class
for label in labels:
    print(label + " train: " + str(len(os.listdir(os.path.join(train_dir, label)))))
print("\n")
for label in labels:
    print(label + " test: " + str(len(os.listdir(os.path.join(test_dir, label)))))

bkl train: 5976
nv train: 6077
df train: 5676
mel train: 5995
vasc train: 5954
bcc train: 5952
akiec train: 5920


bkl test: 103
nv test: 628
df test: 11
mel test: 104
vasc test: 13
bcc test: 48
akiec test: 31
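
The per-class training counts end up slightly below the 6000 target because the number of augmentation batches is rounded down and the last batch of each pass over a class folder is smaller than `batch_size`. The sketch below replays that bookkeeping in plain Python. It assumes every generated file gets a unique name (no overwrites) and that the generator emits one short final batch per pass; `images_after_augmentation` is a hypothetical helper written for this illustration.

```python
def images_after_augmentation(n_original, target=6000, batch_size=10):
    """Replay the augmentation loop's bookkeeping for one class folder."""
    n_missing = target - n_original
    n_batches = int(n_missing / batch_size)   # same rounding as the notebook
    saved = 0
    remaining_in_pass = n_original
    for _ in range(n_batches):                # empty when n_batches <= 0
        batch = min(batch_size, remaining_in_pass)  # last batch of a pass may be short
        saved += batch
        remaining_in_pass -= batch
        if remaining_in_pass == 0:            # start a new pass over the folder
            remaining_in_pass = n_original
    return n_original + saved


# Original per-class training counts from the split above.
train_counts = {"bkl": 996, "nv": 6077, "df": 104, "mel": 1009,
                "vasc": 129, "bcc": 466, "akiec": 296}
augmented = {label: images_after_augmentation(n) for label, n in train_counts.items()}
total = sum(augmented.values())
print(augmented, total)
```

Under these assumptions the sketch reproduces the folder counts listed above, including the unchanged `nv` class (already above the target) and the total of 41550 training images.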


## Train the Model ResNet50

In [None]:
dropout_dense = 0.1

resnet_model = ResNet50(input_shape=IMAGE_SHAPE, include_top=False, pooling="max")

model = Sequential()
model.add(resnet_model)
model.add(Dropout(dropout_dense))
model.add(BatchNormalization())
model.add(Dense(256, activation="relu"))
model.add(Dropout(dropout_dense))
model.add(BatchNormalization())
model.add(Dense(7, activation="softmax"))

def top_2_acc(y_true, y_pred):
    return top_k_categorical_accuracy(y_true, y_pred, k=2)

def top_3_acc(y_true, y_pred):
    return top_k_categorical_accuracy(y_true, y_pred, k=3)

model.compile(Adam(0.0001), loss="categorical_crossentropy", metrics=[categorical_accuracy, top_2_acc, top_3_acc])

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5


In [None]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 resnet50 (Functional)       (None, 2048)              23587712  
                                                                 
 dropout (Dropout)           (None, 2048)              0         
                                                                 
 batch_normalization (BatchN  (None, 2048)             8192      
 ormalization)                                                   
                                                                 
 dense (Dense)               (None, 256)               524544    
                                                                 
 dropout_1 (Dropout)         (None, 256)               0         
                                                                 
 batch_normalization_1 (Batc  (None, 256)              1024      
 hNormalization)                                        
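
The parameter counts of the layers added on top of ResNet50 can be verified by hand. A dense layer has one weight per input-output pair plus one bias per output, and a batch normalization layer keeps four vectors per feature (scale, offset, moving mean, moving variance). The helper names below are illustrative only.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batch_norm_params(n_features):
    """Scale, offset, moving mean and moving variance per feature."""
    return 4 * n_features


bn_2048 = batch_norm_params(2048)
dense_256 = dense_params(2048, 256)
bn_256 = batch_norm_params(256)
print(bn_2048, dense_256, bn_256)
```

These values agree with the `Param #` column shown for `batch_normalization`, `dense`, and `batch_normalization_1`.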

In [None]:
len(model.layers)

7

In [None]:
filepath = "model.h5"

checkpoint_param = {
    "filepath": filepath,
    "monitor": "val_categorical_accuracy",
    "verbose": 1,
    "save_best_only": True,
    "mode": "max"
}
checkpoint = ModelCheckpoint(**checkpoint_param)

lr_decay_params = {
    "monitor": "val_loss",
    "factor": 0.5,
    "patience": 2,
    "min_lr": 1e-5
}
lr_decay = ReduceLROnPlateau(**lr_decay_params)
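
To see what these settings imply, the sketch below simulates a simplified version of the plateau rule: the learning rate is halved whenever `val_loss` fails to improve for `patience` consecutive epochs, and never drops below `min_lr`. It is an approximation of the callback's behavior (no cooldown handling), the `min_delta` default is an assumption, and the validation losses are hypothetical.

```python
def simulate_reduce_lr(val_losses, lr=1e-4, factor=0.5, patience=2,
                       min_lr=1e-5, min_delta=1e-4):
    """Return the learning rate in effect during each epoch."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        history.append(lr)                 # rate used while training this epoch
        if loss < best - min_delta:        # meaningful improvement resets the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)
                wait = 0
    return history


# Hypothetical run in which val_loss never improves after the first epoch.
lr_history = simulate_reduce_lr([1.0] * 11)
print(lr_history)
```

Starting from the optimizer's initial rate of 1e-4, four reductions are enough to reach the 1e-5 floor, after which the rate stays constant.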


In [None]:
fit_params = {
    "x": train_flow,
    "steps_per_epoch": data_train.shape[0] // batch_size,
    "epochs": 100,
    "verbose": 1,
    "validation_data": test_flow,
    "validation_steps": data_test.shape[0] // batch_size,
    "callbacks": [checkpoint, lr_decay]
}
print("Training the model...")

# Model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.x.
history = model.fit(**fit_params)
print("Done!")

Training the model...




Epoch 1/100
Epoch 1: val_categorical_accuracy improved from -inf to 0.64624, saving model to model.h5
Epoch 2/100
Epoch 2: val_categorical_accuracy improved from 0.64624 to 0.67312, saving model to model.h5
Epoch 3/100
Epoch 3: val_categorical_accuracy improved from 0.67312 to 0.75161, saving model to model.h5
Epoch 4/100
Epoch 4: val_categorical_accuracy did not improve from 0.75161
Epoch 5/100
Epoch 5: val_categorical_accuracy improved from 0.75161 to 0.77849, saving model to model.h5
Epoch 6/100
Epoch 6: val_categorical_accuracy did not improve from 0.77849
Epoch 7/100
Epoch 7: val_categorical_accuracy did not improve from 0.77849
Epoch 8/100
Epoch 8: val_categorical_accuracy improved from 0.77849 to 0.80645, saving model to model.h5
Epoch 9/100
Epoch 9: val_categorical_accuracy improved from 0.80645 to 0.82366, saving model to model.h5
Epoch 10/100
Epoch 10: val_categorical_accuracy did not improve from 0.82366
Epoch 11/100
Epoch 11: val_categorical_accuracy did not improve from 0.
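
The step counts passed to training are derived from the original split sizes, not from the augmented folder, so it helps to spell out how many images each epoch actually covers. The arithmetic below uses only numbers printed earlier in this notebook; the correct-prediction counts in the final check are inferred values chosen to match the log, not logged quantities.

```python
import math

batch_size = 10
n_train_original = 9077     # rows in data_train
n_train_augmented = 41550   # images found by train_flow
n_test = 938                # images found by test_flow

steps_per_epoch = n_train_original // batch_size                # batches drawn per epoch
full_pass_steps = math.ceil(n_train_augmented / batch_size)     # batches in one full pass
images_per_epoch = steps_per_epoch * batch_size

validation_steps = n_test // batch_size
validation_images = validation_steps * batch_size

print(steps_per_epoch, full_pass_steps, images_per_epoch)
print(validation_steps, validation_images)

# Inferred counts of correct predictions that would yield the logged accuracies.
inferred = [round(correct / validation_images, 5) for correct in (601, 626, 699)]
print(inferred)
```

Each epoch therefore draws roughly a fifth of the augmented training set, and the per-epoch validation accuracy in the log is consistent with being computed on 930 of the 938 test images.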

In [None]:
val_loss, val_cat_acc, val_top_2_acc, val_top_3_acc = \
model.evaluate(test_flow,
               steps=len(test_flow))

print('val_loss:', val_loss)
print('val_cat_acc:', val_cat_acc)
print('val_top_2_acc:', val_top_2_acc)
print('val_top_3_acc:', val_top_3_acc)

## Evaluate the Model

In [None]:
y_test_true = test_flow.classes
y_test_pred = np.argmax(model.predict(test_flow, steps=len(test_flow)), axis=1)

In [None]:
loss_train = history.history["loss"]
acc_train = history.history["categorical_accuracy"]
loss_val = history.history["val_loss"]
acc_val = history.history["val_categorical_accuracy"]
epochs = np.arange(1, len(loss_train) + 1)

In [None]:
# Keys follow the alphabetical class order assigned by flow_from_directory.
classes = {0: ('akiec', 'actinic keratoses and intraepithelial carcinoma'),
           1: ('bcc', 'basal cell carcinoma'),
           2: ('bkl', 'benign keratosis-like lesions'),
           3: ('df', 'dermatofibroma'),
           4: ('mel', 'melanoma'),
           5: ('nv', 'melanocytic nevi'),
           6: ('vasc', 'pyogenic granulomas and hemorrhage')}
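
Keras `flow_from_directory` assigns class indices in alphanumeric order of the subdirectory names, so any index-to-name mapping is worth checking against that order. The sketch below derives the expected indices from the folder names alone; in the notebook itself the same information should be available as `test_flow.class_indices`.

```python
# Folder names in the order returned by df_data["dx"].unique() in this notebook.
labels = ["bkl", "nv", "df", "mel", "vasc", "bcc", "akiec"]

# Indices follow the sorted folder names, not the order in `labels`.
class_indices = {name: idx for idx, name in enumerate(sorted(labels))}
index_to_name = {idx: name for name, idx in class_indices.items()}
print(class_indices)
```

Under this ordering, index 4 is `mel`, index 5 is `nv`, and index 6 is `vasc`.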

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

### Confusion Matrix

In [None]:
from sklearn.metrics import classification_report

target_names = [f"{classes[i]}" for i in range(7)]
print(classification_report(y_test_true , y_test_pred , target_names =target_names ))

In [None]:
import seaborn as sns

cm = confusion_matrix(y_test_true, y_test_pred)
cm = pd.DataFrame(cm , index = [i for i in range(7)] , columns = [i for i in range(7)])
plt.figure(figsize = (10,10))
sns.heatmap(cm,cmap= "Blues", linecolor = 'black' , linewidth = 1 , annot = True, fmt='')
plt.savefig("confusion_matrix.svg")

In [None]:
conf_mat = confusion_matrix(y_test_true, y_test_pred)
plt.imshow(conf_mat, cmap=plt.cm.Blues)
plt.title("Confusion matrix")
plt.colorbar()
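# Confusion-matrix rows and columns follow the generator's class indices, not the order of `labels`.
class_names = sorted(test_flow.class_indices, key=test_flow.class_indices.get)
tick_marks = np.arange(len(class_names))
plt.xticks(tick_marks, class_names, rotation=45)
plt.yticks(tick_marks, class_names)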
plt.ylabel("y_true")
plt.xlabel("y_pred")
plt.tight_layout()

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

### Plot the Training Curves

In [None]:
import matplotlib.pyplot as plt

acc = history.history['categorical_accuracy']
val_acc = history.history['val_categorical_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
train_top2_acc = history.history['top_2_acc']
val_top2_acc = history.history['val_top_2_acc']
train_top3_acc = history.history['top_3_acc']
val_top3_acc = history.history['val_top_3_acc']
epochs = range(1, len(acc) + 1)

In [None]:
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.savefig("loss.svg")
plt.show()

In [None]:
plt.plot(epochs, acc, 'bo', label='Training cat acc')
plt.plot(epochs, val_acc, 'b', label='Validation cat acc')
plt.title('Training and validation cat accuracy')
plt.legend()
plt.savefig("cat_acc.svg")
plt.show()

In [None]:
plt.plot(epochs, train_top2_acc, 'bo', label='Training top2 acc')
plt.plot(epochs, val_top2_acc, 'b', label='Validation top2 acc')
plt.title('Training and validation top2 accuracy')
plt.legend()
plt.savefig("top2.svg")
plt.show()

In [None]:
plt.plot(epochs, train_top3_acc, 'bo', label='Training top3 acc')
plt.plot(epochs, val_top3_acc, 'b', label='Validation top3 acc')
plt.title('Training and validation top3 accuracy')
plt.legend()
plt.savefig("top3.svg")
plt.show()

In [None]:
model.save('model-ResNet50.h5')

In [None]:
model_json = model.to_json()
with open("model-ResNet50.json", "w") as json_file:
    json_file.write(model_json)