<a href="https://colab.research.google.com/github/Triginarsa/skin-cancer/blob/fandi-branch/Skin_Cancer_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# IMPORT DATASET SKIN CANCER MNIST HAM 10K


Input Data 

The input data are dermoscopic lesion images in JPEG format.

All lesion images are named using the scheme ISIC_<image_id>.jpg, where <image_id> is a 7-digit unique identifier. EXIF tags in the images have been removed; any remaining EXIF tags should not be relied upon to provide accurate metadata.
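
The naming scheme above can be checked mechanically. The following is a minimal sketch, assuming only the `ISIC_<7 digits>.jpg` pattern described in the text; the helper name `parse_image_id` and the sample file names are hypothetical and not part of the dataset tooling.

```python
import re

# "ISIC_" followed by exactly seven digits and the ".jpg" extension.
ISIC_NAME = re.compile(r"ISIC_(\d{7})\.jpg")

def parse_image_id(filename):
    """Return the 7-digit identifier as a string, or None if the name does not match."""
    match = ISIC_NAME.fullmatch(filename)
    return match.group(1) if match else None

print(parse_image_id("ISIC_0024306.jpg"))  # 0024306
print(parse_image_id("ISIC_24306.jpg"))    # None
```

The identifier is kept as a string so that leading zeros survive.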

The lesion images were acquired with a variety of dermatoscope types, from all anatomic sites (excluding mucosa and nails), from a historical sample of patients presented for skin cancer screening, from several different institutions. Every lesion image contains exactly one primary lesion; other fiducial markers, smaller secondary lesions, or other pigmented regions may be neglected.

The distribution of disease states represents a modified “real world” setting whereby there are more benign lesions than malignant lesions, but an over-representation of malignancies.

The images in the dataset are separated into the following seven types of skin lesion:

**Actinic keratosis** **(akiec)** is considered a noncancerous (benign) type of skin lesion. However, if left untreated, it can develop into squamous cell carcinoma (which is cancerous).

**Basal cell carcinoma** **(bcc)**, unlike actinic keratosis, is a cancerous type of skin lesion that develops in the basal cell layer located in the lower part of the epidermis. It is the most common type of skin cancer, accounting for 80% of all cases.

**Benign keratosis** **(bkl)** is a noncancerous and slow-growing type of skin lesion. It can be left untreated as it is typically harmless.

**Dermatofibromas** **(df)** are also noncancerous and usually harmless, so no treatment is required. They are commonly pinkish in color and appear as round bumps.

**Melanoma** **(mel)** is a malignant type of skin cancer that originates from melanocytes, the cells responsible for the pigment of your skin.

**Melanocytic nevi** **(nv)** are a benign type of melanocytic tumor. Patients with melanocytic nevi are considered to be at a higher risk of melanoma.

**Vascular lesions** **(vasc)** comprise a wide range of skin lesions, including cherry angiomas, angiokeratomas, and pyogenic granulomas. They are typically red or purple in color and often appear as a raised bump.
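
For reference, the seven abbreviations can be kept in a small lookup table. This is a sketch; the names `LESION_TYPES` and `class_indices` are illustrative. It relies on the fact that Keras' `flow_from_directory` assigns class indices to sub-folders in alphanumeric order, which is the same order used by the class weights later in this notebook.

```python
# Abbreviation -> full name, as listed in the description above.
LESION_TYPES = {
    "akiec": "Actinic keratosis",
    "bcc": "Basal cell carcinoma",
    "bkl": "Benign keratosis",
    "df": "Dermatofibroma",
    "mel": "Melanoma",
    "nv": "Melanocytic nevi",
    "vasc": "Vascular lesions",
}

# Class folders sorted alphanumerically give the index of each class.
class_indices = {code: index for index, code in enumerate(sorted(LESION_TYPES))}

for code, index in class_indices.items():
    print(index, code, LESION_TYPES[code])
```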

In [None]:
! pip install -q kaggle

In [None]:
# upload your token from kaggle
from google.colab import files
files.upload()

In [None]:
! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json

In [None]:
!kaggle datasets download -d kmader/skin-cancer-mnist-ham10000
!unzip skin-cancer-mnist-ham10000

# PREPROCESSING DATASET

In [None]:
import os
# Checking content folder
os.listdir('../content')

In [None]:
# create base dir
base_dir = 'base_dir'
os.mkdir(base_dir)

In [None]:
# create train_dir
train_dir = os.path.join(base_dir, 'train_dir')
os.mkdir(train_dir)

# create val_dir
val_dir = os.path.join(base_dir, 'val_dir')
os.mkdir(val_dir)

In [None]:
# create train subdir 7 categories
nv = os.path.join(train_dir, 'nv')
os.mkdir(nv)
mel = os.path.join(train_dir, 'mel')
os.mkdir(mel)
bkl = os.path.join(train_dir, 'bkl')
os.mkdir(bkl)
bcc = os.path.join(train_dir, 'bcc')
os.mkdir(bcc)
akiec = os.path.join(train_dir, 'akiec')
os.mkdir(akiec)
vasc = os.path.join(train_dir, 'vasc')
os.mkdir(vasc)
df = os.path.join(train_dir, 'df')
os.mkdir(df)

In [None]:
# create val subdir 7 categories 
nv = os.path.join(val_dir, 'nv')
os.mkdir(nv)
mel = os.path.join(val_dir, 'mel')
os.mkdir(mel)
bkl = os.path.join(val_dir, 'bkl')
os.mkdir(bkl)
bcc = os.path.join(val_dir, 'bcc')
os.mkdir(bcc)
akiec = os.path.join(val_dir, 'akiec')
os.mkdir(akiec)
vasc = os.path.join(val_dir, 'vasc')
os.mkdir(vasc)
df = os.path.join(val_dir, 'df')
os.mkdir(df)

In [None]:
import pandas as pd
# read csv file
df_data = pd.read_csv('../content/HAM10000_metadata.csv')
df_data.head()

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
# check the class distribution of the data
fig, ax1 = plt.subplots(1, 1, figsize= (10, 5))
df_data['dx'].value_counts().plot(kind='bar', ax=ax1)

In [None]:
# how many images are associated with each lesion_id
df = df_data.groupby('lesion_id').count()
# have only one image associated with it
df = df[df['image_id'] == 1]
df.reset_index(inplace=True)
df.head()

In [None]:
# lesion_ids that have exactly one image; build the set once
# instead of rebuilding a list on every call
unique_lesion_ids = set(df['lesion_id'])

def identify_duplicates(x):
    if x in unique_lesion_ids:
        return 'no_duplicates'
    else:
        return 'has_duplicates'

# create a new column that is a copy of the lesion_id column
df_data['duplicates'] = df_data['lesion_id']
# apply the function to this new column
df_data['duplicates'] = df_data['duplicates'].apply(identify_duplicates)

df_data.head()

In [None]:
# count has duplicates and no duplicates
df_data['duplicates'].value_counts()

In [None]:
# filter out images that don't have duplicates
df = df_data[df_data['duplicates'] == 'no_duplicates']
df.shape
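
The duplicate check in the cells above boils down to counting images per lesion and keeping lesions with a count of one. The sketch below shows the same idea with only the standard library; the rows and identifiers are made up for illustration.

```python
from collections import Counter

# Toy metadata rows: (lesion_id, image_id). These identifiers are invented.
rows = [
    ("HAM_0000001", "ISIC_0000001"),
    ("HAM_0000001", "ISIC_0000002"),
    ("HAM_0000002", "ISIC_0000003"),
    ("HAM_0000003", "ISIC_0000004"),
]

# Number of images recorded for each lesion.
images_per_lesion = Counter(lesion_id for lesion_id, _ in rows)

def duplicate_status(lesion_id):
    """Label a lesion by whether it has exactly one image."""
    return "no_duplicates" if images_per_lesion[lesion_id] == 1 else "has_duplicates"

# Images that belong to a lesion with a single image only.
single_image_rows = [
    image_id for lesion_id, image_id in rows
    if duplicate_status(lesion_id) == "no_duplicates"
]
print(single_image_rows)  # ['ISIC_0000003', 'ISIC_0000004']
```

Only such single-image lesions are safe candidates for a validation set, since no other photo of the same lesion can end up in training.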

In [None]:
from sklearn.model_selection import train_test_split
# create a val set using df because we are sure that none of these images
# have augmented duplicates in the train set
y = df['dx']
_, df_val = train_test_split(df, test_size=0.17, random_state=101, stratify=y)
df_val.shape

In [None]:
# check df_val
df_val['dx'].value_counts()

In [None]:
# identify val or train
# create a set of all the image_ids in the val set (built once)
val_image_ids = set(df_val['image_id'])

def identify_val_rows(x):
    if str(x) in val_image_ids:
        return 'val'
    else:
        return 'train'

In [None]:
# create a new column that is a copy of the image_id column
df_data['train_or_val'] = df_data['image_id']
# apply the function to this new column
df_data['train_or_val'] = df_data['train_or_val'].apply(identify_val_rows)   
# filter out train rows
df_train = df_data[df_data['train_or_val'] == 'train']

In [None]:
df_data.head()

In [None]:
len(df_train)

In [None]:
len(df_val)

In [None]:
# check df_train
df_train['dx'].value_counts() 

In [None]:
# Set the image_id as the index in df_data
df_data.set_index('image_id', inplace=True)

In [None]:
# folder datasets
part_1 = os.listdir('../content/ham10000_images_part_1')
part_2 = os.listdir('../content/ham10000_images_part_2')

# list of train and val dataset
train_list = list(df_train['image_id'])
val_list = list(df_val['image_id'])

In [None]:
import shutil
# copy image train
for image in train_list:
    # jpg file
    fname = image + '.jpg'
    # label
    label = df_data.loc[image,'dx']
    if fname in part_1:
        # dir jpg file 1
        src = os.path.join('../content/ham10000_images_part_1', fname)
        # destination train jpg file with 7 categories
        dst = os.path.join(train_dir, label, fname)
        # copy src to dst
        shutil.copyfile(src, dst)

    if fname in part_2:
        # dir jpg file 2
        src = os.path.join('../content/ham10000_images_part_2', fname)
        # destination train jpg file with 7 categories
        dst = os.path.join(train_dir, label, fname)
        # copy src to dst
        shutil.copyfile(src, dst)

In [None]:
# copy image val
for image in val_list:
    fname = image + '.jpg'
    label = df_data.loc[image,'dx']
    
    if fname in part_1:
        src = os.path.join('../content/ham10000_images_part_1', fname)
        dst = os.path.join(val_dir, label, fname)
        shutil.copyfile(src, dst)

    if fname in part_2:
        src = os.path.join('../content/ham10000_images_part_2', fname)
        dst = os.path.join(val_dir, label, fname)
        shutil.copyfile(src, dst)

In [None]:
# check image train_dir
print(len(os.listdir('base_dir/train_dir/nv')))
print(len(os.listdir('base_dir/train_dir/mel')))
print(len(os.listdir('base_dir/train_dir/bkl')))
print(len(os.listdir('base_dir/train_dir/bcc')))
print(len(os.listdir('base_dir/train_dir/akiec')))
print(len(os.listdir('base_dir/train_dir/vasc')))
print(len(os.listdir('base_dir/train_dir/df')))

In [None]:
# check image val_dir
print(len(os.listdir('base_dir/val_dir/nv')))
print(len(os.listdir('base_dir/val_dir/mel')))
print(len(os.listdir('base_dir/val_dir/bkl')))
print(len(os.listdir('base_dir/val_dir/bcc')))
print(len(os.listdir('base_dir/val_dir/akiec')))
print(len(os.listdir('base_dir/val_dir/vasc')))
print(len(os.listdir('base_dir/val_dir/df')))

In [None]:
# Generate augmented images and save them locally
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np  # np.ceil is used in the augmentation loop below

In [None]:
# the "nv" class is not augmented because it already has plenty of images
label_list = ['mel',
              'bkl',
              'bcc',
              'akiec',
              'vasc',
              'df']

In [None]:
for item in label_list:
    # We are creating temporary directories here because we delete these directories later
    # create a base dir
    aug_dir = 'aug_dir'
    os.mkdir(aug_dir)
    # create a dir within the base dir to store images of the same class
    img_dir = os.path.join(aug_dir, 'img_dir')
    os.mkdir(img_dir)

    # Choose a class
    img_class = item

    # list all images in that directory
    img_list = os.listdir('base_dir/train_dir/' + img_class)

    # Copy images from the class train dir to the img_dir e.g. class 'mel'
    for fname in img_list:
            # source path to image
            src = os.path.join('base_dir/train_dir/' + img_class, fname)
            # destination path to image
            dst = os.path.join(img_dir, fname)
            # copy the image from the source to the destination
            shutil.copyfile(src, dst)

    # point to a dir containing the images and not to the images themselves
    path = aug_dir
    save_path = 'base_dir/train_dir/' + img_class

    # Create a data generator
    datagen = ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.2,
        height_shift_range=0.2,
        zoom_range=0.2,
        shear_range = 0.2,
        horizontal_flip=True,
        vertical_flip=True,
        fill_mode='nearest')

    batch_size = 50

    aug_datagen = datagen.flow_from_directory(path,
                                           save_to_dir=save_path,
                                           save_format='jpg',
                                           target_size=(224,224),
                                           batch_size=batch_size)
    
    # Generate the augmented images and add them to the training folders
    num_aug_images_wanted = 6000 # total number of images we want to have in each class
    
    num_files = len(os.listdir(img_dir))
    num_batches = int(np.ceil((num_aug_images_wanted-num_files)/batch_size))

    # run the generator and create about 6000 augmented images
    for i in range(0,num_batches):
        imgs, labels = next(aug_datagen)
        
    # delete temporary directory with the raw image files
    shutil.rmtree('aug_dir')
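
The number of generator batches in the loop above follows from simple arithmetic: the images still missing, divided by the batch size and rounded up. The sketch below restates that calculation; the function name `batches_needed` and the example counts are hypothetical.

```python
import math

def batches_needed(num_existing, target=6000, batch_size=50):
    """Batches of augmented images needed to bring a class up to roughly `target`."""
    missing = max(target - num_existing, 0)
    return math.ceil(missing / batch_size)

print(batches_needed(1000))  # 100
print(batches_needed(115))   # 118
```

The final count is only approximately 6000, because the division is rounded up and a generator batch is not guaranteed to contain exactly `batch_size` images.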

In [None]:
# These counts include the original images plus the augmented images.
# The classes in train_dir are now roughly balanced.
# The distribution of val_dir hasn't changed.
print(len(os.listdir('base_dir/train_dir/nv')))
print(len(os.listdir('base_dir/train_dir/mel')))
print(len(os.listdir('base_dir/train_dir/bkl')))
print(len(os.listdir('base_dir/train_dir/bcc')))
print(len(os.listdir('base_dir/train_dir/akiec')))
print(len(os.listdir('base_dir/train_dir/vasc')))
print(len(os.listdir('base_dir/train_dir/df')))

In [None]:
import matplotlib.pyplot as plt
import numpy as np

In [None]:
# Visualize images
def plots(ims, figsize=(15,15), rows=7, interp=False, titles=None): 
    if type(ims[0]) is np.ndarray:
        ims = np.array(ims).astype(np.uint8)
        if (ims.shape[-1] != 3):
            ims = ims.transpose((0,2,3,1))
    f = plt.figure(figsize=figsize)
    # enough columns to fit every image into `rows` rows
    cols = int(np.ceil(len(ims) / rows))
    for i in range(len(ims)):
        sp = f.add_subplot(rows, cols, i+1)
        sp.axis('Off')
        if titles is not None:
            sp.set_title(titles[i], fontsize=16)
        sp.imshow(ims[i], interpolation=None if interp else 'none')
        
plots(imgs, titles=None)

# BUILD MODEL

In [None]:
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras.applications import mobilenet_v2
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Dropout, Reshape, GlobalAveragePooling2D
from tensorflow.keras.layers import Input, Flatten, Activation
from tensorflow.keras.optimizers import Adam, SGD, RMSprop
from tensorflow.keras.callbacks import ModelCheckpoint, Callback, EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.metrics import top_k_categorical_accuracy, categorical_accuracy
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import backend as K
import tensorflow as tf
from tensorflow.keras import layers
import itertools
print(tf.__version__)

In [None]:
# some constants
IMG_HEIGHT = 224
IMG_WIDTH = 224
IMG_CHANNEL = 3
TRAINING_IMAGE_LEN = 38569
TESTING_IMAGE_LEN = 938
BATCH_SIZE = 64
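
With these constants, one full pass over N images in batches of B takes ceil(N / B) steps. The sketch below works that out; it only restates the constants above and does not touch Keras.

```python
import math

TRAINING_IMAGE_LEN = 38569
TESTING_IMAGE_LEN = 938
BATCH_SIZE = 64

# Whole number of batches needed to see every image once.
steps_per_epoch = math.ceil(TRAINING_IMAGE_LEN / BATCH_SIZE)
validation_steps = math.ceil(TESTING_IMAGE_LEN / BATCH_SIZE)

print(steps_per_epoch, validation_steps)  # 603 15
```

Step counts passed to Keras should be whole numbers, so the division is rounded explicitly rather than left as a float.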

In [None]:
# Set the directory variables here
train_dir = 'base_dir/train_dir'
validation_dir = 'base_dir/val_dir'

print(len(os.listdir(train_dir)))
print(len(os.listdir(validation_dir)))

In [None]:
# Get a generator
# Note: mobilenet_v2.preprocess_input already scales pixels to [-1, 1],
# so no additional `rescale` factor is applied here.
train_datagen = ImageDataGenerator(
    preprocessing_function=mobilenet_v2.preprocess_input,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
    shear_range = 0.2,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest'
    )

valid_datagen = ImageDataGenerator(
    preprocessing_function=mobilenet_v2.preprocess_input
    )
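
MobileNetV2's `preprocess_input` maps pixel values from [0, 255] to [-1, 1] using `x / 127.5 - 1`. The NumPy sketch below re-implements only that formula (the function name is hypothetical and this is not the Keras code itself) to show what happens when a further 1/255 rescale is combined with it, in either order.

```python
import numpy as np

def mobilenet_v2_style_scale(pixels):
    """NumPy stand-in for the x / 127.5 - 1 scaling used for MobileNetV2 inputs."""
    return pixels.astype("float32") / 127.5 - 1.0

pixels = np.array([0.0, 127.5, 255.0], dtype="float32")

# Scaling alone: the full [-1, 1] range is used.
scaled = mobilenet_v2_style_scale(pixels)

# Scaling followed by a 1/255 rescale: everything collapses near zero.
scaled_then_rescaled = scaled / 255.0

# A 1/255 rescale followed by scaling: everything collapses near -1.
rescaled_then_scaled = mobilenet_v2_style_scale(pixels / 255.0)

print(scaled)
print(scaled_then_rescaled)
print(rescaled_then_scaled)
```

In both combined cases the network would see almost constant inputs, which suggests using one of the two scalings, not both.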

In [None]:
# Make data generator for training and validation data
train_datagenerator = train_datagen.flow_from_directory(
    train_dir,                           
    target_size=(IMG_HEIGHT, IMG_WIDTH), 
    class_mode="categorical",
    batch_size=BATCH_SIZE
    )

valid_datagenerator = valid_datagen.flow_from_directory(
    validation_dir, 
    target_size=(IMG_HEIGHT, IMG_WIDTH), 
    class_mode="categorical",   
    batch_size=BATCH_SIZE
    )

test_datagenerator = valid_datagen.flow_from_directory(
    validation_dir, 
    target_size=(IMG_HEIGHT, IMG_WIDTH), 
    class_mode="categorical",   
    batch_size=1,
    shuffle=False
    )

In [None]:
# imports the MobileNetV2 model and discards the last 1000 neuron layer.
base_model=mobilenet_v2.MobileNetV2(
    input_shape=(
    IMG_HEIGHT,
    IMG_WIDTH,
    IMG_CHANNEL),
    include_top=False,
    weights='imagenet',
    pooling="avg"
    ) 

In [None]:
base_model.summary()

In [None]:
len(base_model.layers)

In [None]:
# Define Top2 and Top3 Accuracy
def top_3_accuracy(y_true, y_pred):
    return top_k_categorical_accuracy(y_true, y_pred, k=3)

def top_2_accuracy(y_true, y_pred):
    return top_k_categorical_accuracy(y_true, y_pred, k=2)
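
To make the metric concrete: top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes. The following NumPy sketch illustrates the idea on made-up scores; it is not the Keras implementation and may treat ties differently.

```python
import numpy as np

def top_k_accuracy(y_true, y_pred, k):
    """Fraction of samples whose true class is among the k highest scores."""
    true_idx = np.argmax(y_true, axis=1)
    # Indices of the k largest scores in each row.
    top_k = np.argsort(y_pred, axis=1)[:, -k:]
    hits = [int(t) in row.tolist() for t, row in zip(true_idx, top_k)]
    return float(np.mean(hits))

# One-hot labels and invented scores for three samples and three classes.
y_true = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.2, 0.5, 0.3], [0.6, 0.3, 0.1], [0.5, 0.1, 0.4]])

for k in (1, 2, 3):
    print(k, top_k_accuracy(y_true, y_pred, k))
```

Accuracy can only stay the same or grow as k increases, which is why top-2 and top-3 accuracy are more forgiving than plain categorical accuracy.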

In [None]:
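# Add the classification layers
# base_model was built with pooling="avg", so its output is already
# the pooled feature vector
x = base_model.output
x = Dropout(0.25)(x) 
preds= Dense(7,activation='softmax')(x) 
model=Model(inputs=base_model.input,outputs=preds) #specify the inputs and outputs
model.summary() 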

In [None]:
model_viz = tf.keras.utils.plot_model(model,
                          to_file='model_mobilenetv2.png',
                          show_shapes=True,
                          show_layer_names=True,
                          rankdir='TB',
                          expand_nested=True,
                          dpi=55)
model_viz

In [None]:
# Freeze the convolutional base created in the previous step and use
# it as a feature extractor; only the classifier added on top of it
# is trained in this first stage.
base_model.trainable = False

# TRAIN MODEL

In [None]:
model.compile(Adam(learning_rate=0.01), loss='categorical_crossentropy', 
              metrics=[categorical_accuracy, top_2_accuracy, top_3_accuracy]) 

In [None]:
# Add weights to try to make the model more sensitive to melanoma
class_weights={
    0: 1.0, # akiec
    1: 1.0, # bcc
    2: 1.0, # bkl
    3: 1.0, # df
    4: 3.0, # mel # Try to make the model more sensitive to Melanoma.
    5: 1.0, # nv
    6: 1.0, # vasc
}
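
The effect of a class weight can be seen on a per-sample loss: the cross-entropy of each sample is multiplied by the weight of its true class. The sketch below illustrates this with NumPy on invented predictions; it is a simplified picture and does not reproduce how Keras reduces the loss over a batch.

```python
import numpy as np

class_weights = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 3.0, 5: 1.0, 6: 1.0}

def weighted_cross_entropy(y_true, y_pred, weights):
    """Per-sample categorical cross-entropy scaled by the true class weight."""
    true_idx = np.argmax(y_true, axis=1)
    per_sample = -np.log(y_pred[np.arange(len(y_pred)), true_idx])
    sample_weights = np.array([weights[int(i)] for i in true_idx])
    return per_sample * sample_weights

# Two samples: one 'nv' (index 5) and one 'mel' (index 4), each predicted
# with probability 0.5 for its true class.
y_true = np.zeros((2, 7))
y_true[0, 5] = 1
y_true[1, 4] = 1
y_pred = np.full((2, 7), 0.5 / 6)
y_pred[0, 5] = 0.5
y_pred[1, 4] = 0.5

losses = weighted_cross_entropy(y_true, y_pred, class_weights)
print(losses)
```

Both samples are equally wrong, yet the melanoma sample contributes three times as much loss, which pushes the model to pay more attention to that class.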

In [None]:
filepath = "model_mobilenetv2.h5"
checkpoint = ModelCheckpoint(filepath, monitor='val_top_3_accuracy', verbose=1, 
                             save_best_only=True, mode='max')

reduce_lr = ReduceLROnPlateau(monitor='val_top_3_accuracy', factor=0.5, patience=2, 
                                   verbose=1, mode='max', min_lr=0.00001)
                                                           
callbacks_list = [checkpoint, reduce_lr]
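
The learning-rate schedule configured above can be pictured as follows: whenever the monitored score fails to improve for `patience` epochs in a row, the learning rate is multiplied by `factor`, but never drops below `min_lr`. The sketch below is a simplified simulation under that assumption; it ignores details of the real callback such as `min_delta` and `cooldown`, and the validation scores are invented.

```python
def simulate_reduce_on_plateau(scores, lr=0.01, factor=0.5, patience=2, min_lr=1e-5):
    """Return the learning rate in effect after each epoch (simplified model)."""
    best = float("-inf")
    wait = 0
    history = []
    for score in scores:
        if score > best:
            # New best score: reset the patience counter.
            best = score
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs: shrink the rate.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

scores = [0.80, 0.82, 0.81, 0.82, 0.85, 0.84, 0.84]
print(simulate_reduce_on_plateau(scores))
```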

In [None]:
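# model.fit accepts generators directly (fit_generator is deprecated in TF 2.1+).
# Step counts must be integers; dividing by 128 with BATCH_SIZE = 64 means
# each epoch covers roughly half of the images.
history = model.fit(train_datagenerator,
                    steps_per_epoch = TRAINING_IMAGE_LEN // 128,
                    epochs = 30 ,
                    validation_data = valid_datagenerator,
                    validation_steps = TESTING_IMAGE_LEN // 128,
                    class_weight=class_weights,
                    callbacks=callbacks_list)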

In [None]:
# display the curves

acc = history.history['categorical_accuracy']
val_acc = history.history['val_categorical_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
train_top2_acc = history.history['top_2_accuracy']
val_top2_acc = history.history['val_top_2_accuracy']
train_top3_acc = history.history['top_3_accuracy']
val_top3_acc = history.history['val_top_3_accuracy']
epochs = range(1, len(acc) + 1)

# curve loss
plt.plot(epochs, loss, 'r', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.figure()

# curve cat accuracy
plt.plot(epochs, acc, 'r', label='Training cat acc')
plt.plot(epochs, val_acc, 'b', label='Validation cat acc')
plt.title('Training and validation cat accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.figure()

# curve top2 accuracy
plt.plot(epochs, train_top2_acc, 'r', label='Training top2 acc')
plt.plot(epochs, val_top2_acc, 'b', label='Validation top2 acc')
plt.title('Training and validation top2 accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.figure()

# curve top3 acc
plt.plot(epochs, train_top3_acc, 'r', label='Training top3 acc')
plt.plot(epochs, val_top3_acc, 'b', label='Validation top3 acc')
plt.title('Training and validation top3 accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.show()

# FINE TUNING MODEL

In the feature extraction experiment, we trained only a few layers on top of a MobileNet V2 base model. The weights of the pre-trained network were not updated during training.

One way to increase performance even further is to train (or "fine-tune") the weights of the top layers of the pre-trained model alongside the classifier we added. The training process forces these weights to move from generic feature maps toward features specific to our dataset.

In [None]:
model.trainable = True

In [None]:
# freeze  some layers for fine tuning
# Fine tune from this layer onwards
fine_tune_at = 133

# Freeze all the layers before the `fine_tune_at` layer
for layer in base_model.layers[:fine_tune_at]: # same with [:-23]
  layer.trainable =  False

In [None]:
model.compile(Adam(learning_rate=0.01), loss='categorical_crossentropy', 
              metrics=[categorical_accuracy, top_2_accuracy, top_3_accuracy]) 

In [None]:
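# model.fit accepts generators directly (fit_generator is deprecated in TF 2.1+);
# step counts are rounded up to whole batches
history_fine = model.fit(train_datagenerator,
                         steps_per_epoch = int(np.ceil(TRAINING_IMAGE_LEN / BATCH_SIZE)),
                         epochs = 30 ,
                         validation_data = valid_datagenerator,
                         validation_steps = int(np.ceil(TESTING_IMAGE_LEN / BATCH_SIZE)),
                         class_weight=class_weights,
                         callbacks=callbacks_list)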

# EVALUATE

In [None]:
from skimage.io import imread, imsave
from skimage.transform import resize
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.utils.class_weight import compute_class_weight
import json
import h5py
import seaborn as sns

In [None]:
# shuffle must be False so that predictions line up with `.classes`
test_datagen = valid_datagen.flow_from_directory(validation_dir,  
                                             target_size=(IMG_HEIGHT, IMG_WIDTH), 
                                             class_mode="categorical",
                                             shuffle=False,  
                                             batch_size=1)

In [None]:
# get the metric names so we can unpack the evaluation results
model.metrics_names

In [None]:
# Here the weights from the last epoch are used.
# model.evaluate accepts generators directly (evaluate_generator is deprecated).
val_loss, val_cat_acc, val_top_2_acc, val_top_3_acc = \
model.evaluate(test_datagen, 
               steps=len(df_val))

print('val_loss:', val_loss)
print('val_cat_acc:', val_cat_acc)
print('val_top_2_acc:', val_top_2_acc)
print('val_top_3_acc:', val_top_3_acc)

In [None]:
# Here the best epoch will be used.

model.load_weights('model_mobilenetv2.h5')

val_loss, val_cat_acc, val_top_2_acc, val_top_3_acc = \
model.evaluate(test_datagen, 
               steps=len(df_val))

print('val_loss:', val_loss)
print('val_cat_acc:', val_cat_acc)
print('val_top_2_acc:', val_top_2_acc)
print('val_top_3_acc:', val_top_3_acc)

In [None]:
# display the curves for the fine-tuning run

acc = history_fine.history['categorical_accuracy']
val_acc = history_fine.history['val_categorical_accuracy']
loss = history_fine.history['loss']
val_loss = history_fine.history['val_loss']
train_top2_acc = history_fine.history['top_2_accuracy']
val_top2_acc = history_fine.history['val_top_2_accuracy']
train_top3_acc = history_fine.history['top_3_accuracy']
val_top3_acc = history_fine.history['val_top_3_accuracy']
epochs = range(1, len(acc) + 1)

# curve loss
plt.plot(epochs, loss, 'r', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.figure()

# curve cat accuracy
plt.plot(epochs, acc, 'r', label='Training cat acc')
plt.plot(epochs, val_acc, 'b', label='Validation cat acc')
plt.title('Training and validation cat accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.figure()

# curve top2 accuracy
plt.plot(epochs, train_top2_acc, 'r', label='Training top2 acc')
plt.plot(epochs, val_top2_acc, 'b', label='Validation top2 acc')
plt.title('Training and validation top2 accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.figure()

# curve top3 acc
plt.plot(epochs, train_top3_acc, 'r', label='Training top3 acc')
plt.plot(epochs, val_top3_acc, 'b', label='Validation top3 acc')
plt.title('Training and validation top3 accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.show()

In [None]:
# Get the labels of the test images
test_labels = test_datagenerator.classes

In [None]:
# We need these to plot the confusion matrix.
test_labels

In [None]:
# Print the label associated with each class
test_datagen.class_indices

In [None]:
# make a prediction (model.predict accepts generators; predict_generator is deprecated)
predictions = model.predict(test_datagen, steps=len(df_val), verbose=1)

In [None]:
predictions.shape

In [None]:
# Source: Scikit Learn website
# http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html#sphx-glr-auto-examples-model-selection-plot-confusion-matrix-py

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()

In [None]:
test_labels.shape

In [None]:
# argmax returns the index of the max value in a row
cm = confusion_matrix(test_labels, predictions.argmax(axis=1))
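
As a small worked example of the step above: `argmax` turns each row of class probabilities into a predicted class index, and the confusion matrix counts (true class, predicted class) pairs with true classes in rows and predictions in columns. The sketch below builds such a matrix by hand with NumPy on invented probabilities; the helper `confusion` is illustrative and is not the scikit-learn function.

```python
import numpy as np

def confusion(y_true, y_pred, num_classes):
    """Count (true, predicted) pairs: rows are true classes, columns predictions."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Invented probabilities for four samples and three classes.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
])
y_true = np.array([0, 1, 0, 2])
y_pred = probs.argmax(axis=1)

cm = confusion(y_true, y_pred, num_classes=3)
# Dividing each row by its sum gives the share of each true class
# that was assigned to each predicted class.
row_normalized = cm / cm.sum(axis=1, keepdims=True)

print(cm)
print(row_normalized)
```

This only works when predictions and labels are in the same sample order, which is why the generator used for prediction must not shuffle.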

In [None]:
test_datagen.class_indices

In [None]:
# Define the labels of the class indices. These need to match the 
# order shown above.
cm_plot_labels = ['akiec', 'bcc', 'bkl', 'df', 'mel','nv', 'vasc']
plot_confusion_matrix(cm, cm_plot_labels, title='Confusion Matrix')

In [None]:
# Get the index of the class with the highest probability score
y_pred = np.argmax(predictions, axis=1)

# Get the labels of the test images.
y_true = test_datagen.classes

In [None]:
# Generate a classification report
report = classification_report(y_true, y_pred, target_names=cm_plot_labels)
print(report)
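
The precision, recall, and F1 columns of the report can be derived from a confusion matrix. The following NumPy sketch shows the arithmetic on a small invented matrix (rows are true classes, columns are predicted classes); it assumes every class is predicted at least once, otherwise the divisions would hit zero.

```python
import numpy as np

# Invented 3-class confusion matrix: rows = true class, columns = predicted class.
cm = np.array([
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
])

true_positives = np.diag(cm)
# Precision: correct predictions of a class / all predictions of that class.
precision = true_positives / cm.sum(axis=0)
# Recall: correct predictions of a class / all true samples of that class.
recall = true_positives / cm.sum(axis=1)
# F1: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(precision)
print(recall)
print(f1)
```

For a screening task, recall on the malignant classes is usually the number to watch, since it measures how many true cases were caught.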

# SAVE MODEL


In [46]:
# save the model for future use
model.save("model_mobilenet_v2.h5")

# CONVERT MODEL


In [48]:
# convert to json
json_string = model.to_json()
# to_json() already returns a JSON string, so write it directly
with open("../content/model_mobilenet_v2.json", "w") as f:
    f.write(json_string)

In [49]:
# convert the model to TFLite
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open('model_mobilenet_v2.tflite', 'wb') as f:
  f.write(tflite_model)

In [None]:
# create labels cancer
labels = '\n'.join(sorted(train_datagenerator.class_indices.keys()))
with open('labels.txt', 'w') as f:
  f.write(labels)

In [None]:
# download the tflite model and the labels file
# (the file name must match the one written by the converter cell)
files.download('model_mobilenet_v2.tflite')
files.download('labels.txt')