
# **Melanoma Skin Cancer Prediction Using ConvNets**
#### Data from: https://www.kaggle.com/datasets/hasnainjaved/melanoma-skin-cancer-dataset-of-10000-images


**Brief Description:** Melanoma is a type of skin cancer that develops when melanocytes (the cells that give the skin its tan or brown color) start to grow out of control.

**Motivation for Project:** Melanoma is much less common than some other types of skin cancers. But melanoma is more dangerous because it’s much more likely to spread to other parts of the body if not caught and treated early.

**Source:** https://www.cancer.org/cancer/types/melanoma-skin-cancer/about/what-is-melanoma.html

## **Downloading the Data**

We are using data from Kaggle in this project, and thus, we will need to use the Kaggle API.

Remember to upload your **'kaggle.json'** file to the current working directory before running the cell below, which moves it to `~/.kaggle/`.

In [None]:
!pip install kaggle
!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download hasnainjaved/melanoma-skin-cancer-dataset-of-10000-images

Downloading melanoma-skin-cancer-dataset-of-10000-images.zip to /content
 92% 91.0M/98.7M [00:01<00:00, 59.1MB/s]
100% 98.7M/98.7M [00:01<00:00, 69.3MB/s]
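
If the download fails with an authentication error, the credentials file is the first thing to check. The sketch below is one way to do that with the standard library only. It assumes the usual `kaggle.json` layout (a JSON object with `username` and `key` fields); the helper name `check_kaggle_credentials` is our own and is not part of the Kaggle API.

```python
import json
import stat
from pathlib import Path


def check_kaggle_credentials(path):
    """Return a list of problems found with a kaggle.json file (empty if none)."""
    path = Path(path)
    if not path.is_file():
        return [f"{path} does not exist"]

    # Assumption: the file is a JSON object with 'username' and 'key' fields.
    try:
        creds = json.loads(path.read_text())
    except json.JSONDecodeError:
        return [f"{path} is not valid JSON"]

    problems = []
    for field in ("username", "key"):
        if not isinstance(creds, dict) or field not in creds:
            problems.append(f"missing field: {field}")

    # chmod 600 leaves no permission bits set for group or others.
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & 0o077:
        problems.append(f"permissions are {oct(mode)}, expected 0o600")
    return problems
```

Calling `check_kaggle_credentials(Path.home() / ".kaggle" / "kaggle.json")` should return an empty list once the cell above has run successfully.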


Make sure the zip file has been downloaded. If so, you can unzip it with the command below. This should give you the following directory structure:


* melanoma_cancer_dataset
  * test
    * benign
    * malignant
  * train
    * benign
    * malignant

Each **benign** and **malignant** directory contains the images belonging to that label.

**Benign**: False, not cancerous

**Malignant**: True, Melanoma is present

In [None]:
!unzip melanoma-skin-cancer-dataset-of-10000-images

Streaming output truncated to the last 5000 lines.
  inflating: melanoma_cancer_dataset/train/benign/melanoma_643.jpg  
  inflating: melanoma_cancer_dataset/train/benign/melanoma_644.jpg  
  inflating: melanoma_cancer_dataset/train/benign/melanoma_645.jpg  
  inflating: melanoma_cancer_dataset/train/benign/melanoma_646.jpg  
  inflating: melanoma_cancer_dataset/train/benign/melanoma_647.jpg  
  inflating: melanoma_cancer_dataset/train/benign/melanoma_648.jpg  
  inflating: melanoma_cancer_dataset/train/benign/melanoma_649.jpg  
  inflating: melanoma_cancer_dataset/train/benign/melanoma_65.jpg  
  inflating: melanoma_cancer_dataset/train/benign/melanoma_650.jpg  
  inflating: melanoma_cancer_dataset/train/benign/melanoma_651.jpg  
  inflating: melanoma_cancer_dataset/train/benign/melanoma_652.jpg  
  inflating: melanoma_cancer_dataset/train/benign/melanoma_653.jpg  
  inflating: melanoma_cancer_dataset/train/benign/melanoma_654.jpg  
  inflating: melanoma_cancer_dataset/tr
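
Once the archive is extracted, it can be worth confirming that the directory layout matches the structure described above. The following is a minimal sketch using only the standard library; the helper name `count_images` and the set of accepted file extensions are our own assumptions.

```python
from pathlib import Path

# Assumption: images use one of these extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Map 'split/class' to the number of image files found under root."""
    counts = {}
    root = Path(root)
    for split_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            n = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
            counts[f"{split_dir.name}/{class_dir.name}"] = n
    return counts
```

Running `count_images("melanoma_cancer_dataset")` should list four entries, one per split and class.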

## **Using TensorFlow Image Generators**

Because the directory hierarchy matches what the Image Generator expects, we will point it at the train and test (validation) directories. The name of each sub-directory (benign and malignant) is used as the label, and the images are loaded and labelled accordingly.

In [None]:
import tensorflow as tf

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory('melanoma_cancer_dataset/train', target_size=(300,300), batch_size=128, class_mode='binary')
val_datagen = ImageDataGenerator(rescale=1./255)
val_generator = val_datagen.flow_from_directory('melanoma_cancer_dataset/test', target_size=(300,300), batch_size=128, class_mode='binary')

Found 9605 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
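
To make the labelling rule concrete: `flow_from_directory` assigns integer labels to the class sub-directories in sorted order, so here benign should map to 0 and malignant to 1, and the sigmoid output of a binary model estimates the probability of the class labelled 1. The sketch below mimics that mapping with the standard library; `class_indices` and `decode_prediction` are hypothetical helper names, not Keras functions.

```python
from pathlib import Path


def class_indices(directory):
    """Map each class sub-directory name to an integer label, in sorted order."""
    names = sorted(p.name for p in Path(directory).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(names)}


def decode_prediction(probability, indices, threshold=0.5):
    """Turn a sigmoid output into a class name using the label mapping."""
    label = int(probability > threshold)
    names_by_index = {index: name for name, index in indices.items()}
    return names_by_index[label]
```

On the real generator, `train_generator.class_indices` shows the mapping that Keras actually used, which is the one to trust.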


## **Building the Model and Training**

The generators resize every image to 300x300, so we pass an input shape of (300, 300, 3), where the 3 represents the color channels Red, Green, and Blue (RGB) in the image.

In [None]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(300,300,3)),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam(), metrics=['accuracy'])
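
To see where the model's size comes from, we can trace the tensor shapes by hand. This is a rough sketch that assumes the Keras defaults used above: `Conv2D` with 'valid' padding and stride 1, and `MaxPooling2D(2,2)` halving each spatial dimension (rounding down). The helper names are our own.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after max pooling with equal pool size and stride."""
    return size // pool


size = 300
size = pool_out(conv_out(size))  # after the first Conv2D + MaxPooling2D
size = pool_out(conv_out(size))  # after the second Conv2D + MaxPooling2D
flat = size * size * 32          # length of the vector produced by Flatten

params = {
    "conv1": 3 * 3 * 3 * 16 + 16,    # kernel weights + biases
    "conv2": 3 * 3 * 16 * 32 + 32,
    "dense1": flat * 512 + 512,
    "dense2": 512 * 1 + 1,
}
total = sum(params.values())

print(size, flat)
print(params, total)
```

Under these assumptions almost all of the parameters sit in the first Dense layer, because Flatten hands it a very long vector. `model.summary()` reports the actual figures.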

In [None]:
history = model.fit(train_generator, steps_per_epoch=8, epochs=15, validation_data=val_generator, validation_steps=8, verbose=1)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
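
It helps to work out how much data these settings actually cover. The arithmetic below uses only the numbers shown above (9605 training images, 1000 validation images, a batch size of 128, 8 steps, 15 epochs).

```python
import math

TRAIN_IMAGES = 9605
VAL_IMAGES = 1000
BATCH_SIZE = 128
STEPS_PER_EPOCH = 8
VALIDATION_STEPS = 8
EPOCHS = 15

# Batches needed to see every image once.
full_train_steps = math.ceil(TRAIN_IMAGES / BATCH_SIZE)
full_val_steps = math.ceil(VAL_IMAGES / BATCH_SIZE)

# Images actually drawn per epoch and over the whole run.
images_per_epoch = STEPS_PER_EPOCH * BATCH_SIZE
images_total = images_per_epoch * EPOCHS
passes_over_train_set = images_total / TRAIN_IMAGES

print(full_train_steps, full_val_steps)
print(images_per_epoch, images_total, round(passes_over_train_set, 2))
```

So 8 validation steps are enough to cover all 1000 validation images, while each training "epoch" draws only 1024 of the 9605 training images. The 15 epochs together amount to roughly 1.6 passes over the training set.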


In [None]:
model.evaluate(val_generator)



[0.28417110443115234, 0.8859999775886536]
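
The two numbers returned by `model.evaluate` are the loss followed by the metrics passed to `compile`, so here they are the binary cross-entropy and the accuracy. As a rough illustration of what they measure, here is a pure-Python sketch on a handful of made-up predictions. The function names and toy values are hypothetical, and Keras's own implementation differs in details.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    correct = sum(1 for y, p in zip(y_true, y_prob) if (p > threshold) == bool(y))
    return correct / len(y_true)


# Toy labels (1 = malignant) and made-up sigmoid outputs.
labels = [1, 0, 1, 0]
probabilities = [0.9, 0.2, 0.4, 0.6]
print(binary_crossentropy(labels, probabilities), accuracy(labels, probabilities))
```

Loss rewards confident correct predictions and penalizes confident wrong ones, so two models with the same accuracy can still differ in loss.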

The results are decent. Let's now try different methods to increase the accuracy further.

## **Image Augmentation**

We will still be using the TensorFlow Image Generator. When augmenting, the generator modifies images on the fly, using transformations such as rotations, shifts, shears, zooms, and flips to simulate different views of the images and effectively extend the dataset.

In [None]:
train_datagen = ImageDataGenerator(rescale=1./255, rotation_range=40, width_shift_range=0.2, height_shift_range=0.2, shear_range=0.2, zoom_range=0.2, horizontal_flip=True, fill_mode='nearest')
train_generator = train_datagen.flow_from_directory('melanoma_cancer_dataset/train', target_size=(300,300), batch_size=128, class_mode='binary')

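# Note: the validation generator below applies the same random augmentation as the training one.
# Validation images are usually only rescaled, so the score reported after training is measured on distorted images.
val_datagen = ImageDataGenerator(rescale=1./255, rotation_range=40, width_shift_range=0.2, height_shift_range=0.2, shear_range=0.2, zoom_range=0.2, horizontal_flip=True, fill_mode='nearest')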
val_generator = val_datagen.flow_from_directory('melanoma_cancer_dataset/test', target_size=(300,300), batch_size=128, class_mode='binary')

Found 9605 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
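
To give a feel for what two of these transformations do, here is a simplified sketch on a tiny "image" stored as nested lists. It only handles whole-pixel shifts, whereas the Keras generator applies random affine transforms with interpolation, so treat it as an illustration of `horizontal_flip` and `fill_mode='nearest'` rather than a re-implementation. The function names are our own.

```python
def horizontal_flip(image):
    """Mirror every row of the image left to right."""
    return [row[::-1] for row in image]


def shift_right_nearest(image, pixels):
    """Shift each row right, filling the gap with the nearest edge pixel."""
    shifted = []
    for row in image:
        n = min(pixels, len(row))
        shifted.append([row[0]] * n + row[: len(row) - n])
    return shifted


tiny_image = [[1, 2, 3],
              [4, 5, 6]]
print(horizontal_flip(tiny_image))
print(shift_right_nearest(tiny_image, 1))
```

With 'nearest' filling, the pixels exposed by a shift are copies of the closest edge pixel, which is why shifted lesion images can show smeared borders.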


We will build and train the same architecture to see whether augmentation helps or hurts:

In [None]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(300,300,3)),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam(), metrics=['accuracy'])

In [None]:
history = model.fit(train_generator, steps_per_epoch=8, epochs=15, validation_data=val_generator, validation_steps=8, verbose=1)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


In [None]:
model.evaluate(val_generator)



[0.5279585123062134, 0.7229999899864197]

Clearly, this is much worse than training on the unaugmented dataset: accuracy drops from about 0.886 to about 0.723. Here are some of the main reasons augmentation did not work well:

**Data Scarcity/Abundance:** Augmenting is most useful when there is too little data to train on. Here we have roughly 9,600 training images, which is certainly not a small dataset.

**Overfitting:** Augmenting is also used when the model is overfitting (training accuracy much higher than validation accuracy), so that the model generalizes better to new and different images. However, in our case, both accuracies were very close to each other.

**Skewed/Imbalanced Datasets:** If the classes are imbalanced, such that one class has many more samples than another, augmenting can help increase the number of samples and balance the classes. However, our data is not imbalanced: the two classes are present in almost equal proportions.

**Interpretability:** Medical images are usually captured in a standardized way, and augmentation can distort the medical information or even create unrealistic features, which can lead to misinterpretation and lower model accuracy.

These are a few reasons as to why augmentation did not help with this data.
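
The overfitting and class-balance points can be checked with a couple of lines. This sketch assumes `history.history` contains the keys `accuracy` and `val_accuracy` (as it does when the model is compiled with `metrics=['accuracy']`) and takes per-class image counts as a plain dictionary. The helper names and all example numbers are hypothetical and are not measurements from this notebook.

```python
def generalization_gap(history_dict):
    """Final-epoch training accuracy minus validation accuracy."""
    return history_dict["accuracy"][-1] - history_dict["val_accuracy"][-1]


def imbalance_ratio(class_counts):
    """Size of the largest class divided by the size of the smallest."""
    counts = list(class_counts.values())
    return max(counts) / min(counts)


# Made-up values, for illustration only.
example_history = {"accuracy": [0.80, 0.88], "val_accuracy": [0.79, 0.87]}
example_counts = {"benign": 100, "malignant": 90}

print(round(generalization_gap(example_history), 3))
print(round(imbalance_ratio(example_counts), 3))
```

A gap near zero and a ratio near one would support the reasoning above; a large gap or ratio would suggest that augmentation is worth revisiting.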

## **Transfer Learning**

Transfer learning means taking a model that was trained for much longer, on far more data, with a much more complex architecture, and applying it to our data.

Such models come with convolutional layers whose learned features are already in place and can be reused.

In our case, we will be using the **Inception V3 Model**.

More about Inception-v3: https://paperswithcode.com/method/inception-v3

In [None]:
from tensorflow.keras import layers, Model
from tensorflow.keras.applications.inception_v3 import InceptionV3

# Download the pre-trained InceptionV3 weights (without the top classification layers)
!wget --no-check-certificate \
    https://storage.googleapis.com/mledu-datasets/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5 \
    -O /tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5

# Create an instance of the Inception model and load the local pre-trained weights
local_weights_file = '/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5'

pretrained_model = InceptionV3(input_shape=(150, 150, 3), include_top=False, weights=None)
pretrained_model.load_weights(local_weights_file)

# Freeze the pre-trained layers so that only the layers we add are trained
for layer in pretrained_model.layers:
    layer.trainable = False

--2023-11-06 21:33:30--  https://storage.googleapis.com/mledu-datasets/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.2.207, 2607:f8b0:4023:c0d::cf, 2607:f8b0:4023:c03::cf, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.2.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 87910968 (84M) [application/x-hdf]
Saving to: ‘/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5’


2023-11-06 21:33:31 (89.6 MB/s) - ‘/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5’ saved [87910968/87910968]




Warning: Huge Architecture!

In [None]:
pretrained_model.summary()

Model: "inception_v3"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_1 (InputLayer)        [(None, 150, 150, 3)]        0         []                            
                                                                                                  
 conv2d_6 (Conv2D)           (None, 74, 74, 32)           864       ['input_1[0][0]']             
                                                                                                  
 batch_normalization (Batch  (None, 74, 74, 32)           96        ['conv2d_6[0][0]']            
 Normalization)                                                                                   
                                                                                                  
 activation (Activation)     (None, 74, 74, 32)           0         ['batch_normalizati

We take the output of the mixed7 layer and pass it to our own small network. Combining the two lets us reuse the pre-trained features while making sure the new layers train well on our data.

In [None]:
last_layer = pretrained_model.get_layer('mixed7')
last_output = last_layer.output

In [46]:
x = layers.GlobalAveragePooling2D()(last_output)
x = layers.Dense(1024, activation='relu')(x)
x = layers.Dropout(0.1)(x)
x = layers.Dense(1, activation='sigmoid')(x)

model = Model(pretrained_model.input, x)
model.compile(optimizer=tf.keras.optimizers.Adam(), loss='binary_crossentropy', metrics=['accuracy'])

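# Re-instantiate the initial generators, which were changed for augmentation.
# Note: target_size=(300, 300) differs from the input_shape=(150, 150, 3) declared for InceptionV3 above; the two should normally agree.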
train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory('melanoma_cancer_dataset/train', target_size=(300,300), batch_size=128, class_mode='binary')
val_datagen = ImageDataGenerator(rescale=1./255)
val_generator = val_datagen.flow_from_directory('melanoma_cancer_dataset/test', target_size=(300,300), batch_size=128, class_mode='binary')

Found 9605 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
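
Since the Inception layers are frozen, only the layers we added on top are trained. Their size can be worked out by hand, as sketched below. The sketch assumes that the mixed7 layer emits 768 channels, as in the standard Keras InceptionV3; check `last_layer.output.shape` to confirm this in your environment.

```python
# Assumption: 'mixed7' in Keras InceptionV3 outputs 768 feature channels.
MIXED7_CHANNELS = 768

# GlobalAveragePooling2D averages over height and width: one value per channel.
pooled_features = MIXED7_CHANNELS

head_params = {
    "dense_1024": pooled_features * 1024 + 1024,  # weights + biases
    "dropout": 0,                                 # Dropout has no parameters
    "dense_1": 1024 * 1 + 1,
}
trainable_params = sum(head_params.values())

print(head_params, trainable_params)
```

Because global average pooling removes the spatial dimensions, this count does not depend on the image size fed to the network.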


In [47]:
history = model.fit(train_generator, steps_per_epoch=8, epochs=15, validation_data=val_generator, validation_steps=8, verbose=1)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


In [48]:
model.evaluate(val_generator)



[0.22437991201877594, 0.9039999842643738]

Better results! Inception combined with our GlobalAveragePooling2D and Dense layers raises accuracy on the held-out test set from about 88.6% to about 90.4% and lowers the loss from 0.284 to 0.224.
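
To put the three evaluation results side by side, the short calculation below converts each reported accuracy into an approximate number of misclassified images out of the 1000 test images. The accuracies are copied from the `model.evaluate` outputs above; the dictionary keys are just our own labels for the three runs.

```python
VAL_IMAGES = 1000

# Accuracies reported by model.evaluate in the three experiments above.
accuracies = {
    "baseline CNN": 0.8859999775886536,
    "CNN with augmentation": 0.7229999899864197,
    "InceptionV3 transfer learning": 0.9039999842643738,
}

# Approximate number of misclassified test images per experiment.
errors = {name: round((1 - acc) * VAL_IMAGES) for name, acc in accuracies.items()}

baseline = errors["baseline CNN"]
transfer = errors["InceptionV3 transfer learning"]
relative_error_reduction = (baseline - transfer) / baseline

print(errors)
print(round(relative_error_reduction, 3))
```

Seen this way, transfer learning removes roughly one in six of the baseline model's mistakes on this test set.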

In [49]:
model.save("melanoma_classifier.h5")



## **Conclusion**

We have successfully built a deep learning neural network, based in part on the Inception-v3 architecture, that correctly classifies about 90% of the held-out melanoma test images. This is a meaningful result, since it is quite difficult to characterize minuscule features and predict whether a given tumor is malignant or benign.

Once again, melanoma is a deadly skin cancer, and early detection and treatment can save many lives. This project, though small in scale, is a useful stepping stone, and continued research can help save many more.

**Disclaimer**: This melanoma prediction model, while achieving high accuracy, should not be relied upon for real-time medical diagnosis or decision-making. Always consult with a qualified healthcare professional for a comprehensive evaluation and diagnosis of any skin condition. AI models are a tool to aid medical professionals but should not replace their expertise and clinical judgment.