First, let’s import the necessary libraries and see what kind of data we’re dealing with.

In [None]:
import tensorflow as tf
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import pandas as pd
import seaborn as sns
import numpy as np

print(tf.__version__)

In [None]:
# set the directory path

base_dir = '/kaggle/input/skin-cancer-malignant-vs-benign'
train_dir = base_dir + '/train'
test_dir = base_dir +'/test'

%matplotlib inline
image = mpimg.imread(test_dir+'/benign/1003.jpg')
plt.imshow(image)
plt.show()

image2 = mpimg.imread(test_dir+'/malignant/1007.jpg')
plt.imshow(image2)
plt.show()


The dataset is approximately 180 MB in size. Loading all of it into memory at once could become a problem, so we will use the ImageDataGenerator, which reads images from disk in batches.
It behaves much like a Python generator, which is a neat way of passing data to a model one batch at a time. A generator function is a special kind of function that returns a lazy iterator: an object you can loop over like a list. Unlike a list, however, a lazy iterator does not store all of its contents in memory.
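
To make the lazy behaviour concrete, here is a minimal sketch of a plain Python generator that yields batches on demand. The `batches` function and the file names are hypothetical and only for illustration; the Keras iterator returned by `flow_from_directory` is a more elaborate object, but the idea of producing one batch at a time is the same.

```python
def batches(items, batch_size):
    """Yield successive batches lazily instead of building them all up front."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical file names, standing in for images on disk
file_names = [f"img_{i}.jpg" for i in range(10)]

gen = batches(file_names, batch_size=4)
first = next(gen)        # only the first batch has been produced so far
remaining = list(gen)    # consuming the generator produces the rest

print(first)
print(len(remaining))
```

Nothing is computed until `next` is called or the generator is looped over, which is why only one batch needs to live in memory at a time.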


Data augmentation is a set of techniques used to increase the amount of data by adding slightly modified copies of existing data, or synthetic data newly created from existing data. It acts as a regularizer and helps reduce overfitting when training a machine learning model. For full documentation, check the ImageDataGenerator docs on TensorFlow. There is no need to augment the test data, since it is used only for evaluation.
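
As a rough illustration of what "slightly modified copies" means, the sketch below applies a random vertical shift to a tiny image stored as nested lists. This is a toy under stated assumptions, not how ImageDataGenerator is implemented: the helper names are hypothetical, and empty rows are filled with a constant value for simplicity.

```python
import random

def shift_rows(image, rows, fill=0):
    """Return a copy of `image` shifted down (rows > 0) or up (rows < 0)."""
    height, width = len(image), len(image[0])
    blank = lambda n: [[fill] * width for _ in range(n)]
    if rows > 0:   # shift down, pad the top
        return blank(rows) + [r[:] for r in image[:height - rows]]
    if rows < 0:   # shift up, pad the bottom
        return [r[:] for r in image[-rows:]] + blank(-rows)
    return [r[:] for r in image]

def random_height_shift(image, max_fraction, rng):
    """Shift by a random number of rows, up to `max_fraction` of the height."""
    max_rows = int(max_fraction * len(image))
    return shift_rows(image, rng.randint(-max_rows, max_rows))

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9],
         [10, 11, 12]]

rng = random.Random(0)  # seeded so the run is repeatable
copies = [random_height_shift(image, 0.5, rng) for _ in range(3)]
```

Each call returns a new image of the same size and leaves the original untouched, so the model can see a different variant of the same picture in every epoch.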


In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(rescale=1/255, rotation_range=30, shear_range=0.2, zoom_range=0.3, height_shift_range=0.4)
test_gen = ImageDataGenerator(rescale=1/255)

train_dataset = train_gen.flow_from_directory(train_dir, target_size=(150,150),class_mode='binary', batch_size=64)
test_dataset = test_gen.flow_from_directory(test_dir, target_size=(150,150),class_mode='binary', batch_size=64)

Since we set the target size to 150x150 in the generator, the first layer of our network takes an input of shape (150, 150, 3): 150x150 pixels with three color channels.


The model consists of three convolutional layers, each followed by a max-pooling layer, then a Flatten layer and three dense layers. The last dense layer is a single unit with a sigmoid activation, which outputs a probability for the binary classification.


In [None]:
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten

model = tf.keras.Sequential([
    Conv2D(64,(3,3), input_shape=(150,150,3), activation='relu', padding='SAME'),
    MaxPooling2D((2,2)),
    Conv2D(32,(3,3), activation='relu', padding='SAME'),
    MaxPooling2D((2,2)),
    Conv2D(16,(3,3), activation='relu', padding='SAME'),
    MaxPooling2D((2,2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(32, activation='relu'),
    Dense(1,activation='sigmoid')
])

opt = tf.keras.optimizers.Adam(learning_rate=0.005)
model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])

model.summary()
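
To help read the summary, the sketch below traces the output shape and parameter count through the layers by hand. It assumes the Keras defaults used above (stride 1 for the convolutions, and max pooling with stride equal to the pool size and no padding); the helper functions are hypothetical and only do the arithmetic.

```python
def conv_same(h, w, c_in, filters, kernel=3):
    """'SAME' padding with stride 1 keeps height and width unchanged."""
    params = kernel * kernel * c_in * filters + filters  # weights + biases
    return (h, w, filters), params

def max_pool(h, w, c, pool=2):
    """2x2 pooling halves height and width, rounding down."""
    return (h // pool, w // pool, c)

shape = (150, 150, 3)
total_params = 0
for filters in (64, 32, 16):
    shape, params = conv_same(*shape, filters)
    total_params += params
    shape = max_pool(*shape)

flat = shape[0] * shape[1] * shape[2]  # size of the Flatten output

units_in = flat
for units in (128, 32, 1):
    total_params += units_in * units + units  # weights + biases
    units_in = units

print(shape, flat, total_params)  # (18, 18, 16) 5184 692721
```

Most of the parameters sit in the first dense layer (5184 x 128 weights), which is typical when a Flatten layer feeds directly into a wide dense layer.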


Let’s take a look at what our model predicts for the pictures we plotted earlier. I have defined a simple function to do that.


In [None]:
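from tensorflow.keras.preprocessing.image import img_to_array

def predict(image):
    # Resize and rescale to match the generators (150x150, values in [0, 1])
    np_image = tf.image.resize(img_to_array(image), (150, 150)).numpy() / 255.0
    np_image = np.expand_dims(np_image, axis=0)  # add the batch dimension
    return model.predict(np_image)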
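
The function returns the sigmoid output, a number between 0 and 1. A minimal sketch of turning that into a label follows. It assumes a 0.5 threshold and that class indices follow alphabetical folder order (benign = 0, malignant = 1), which is worth confirming with `train_dataset.class_indices`; the helper name is hypothetical.

```python
def label_from_probability(prob, threshold=0.5,
                           class_names=("benign", "malignant")):
    """Map a sigmoid output to a class name (assumed index order: 0, 1)."""
    return class_names[1] if prob >= threshold else class_names[0]

print(label_from_probability(0.9))
print(label_from_probability(0.1))
```

With the model above, this could be used as `label_from_probability(float(predict(image)[0][0]))`. The threshold is a design choice: lowering it makes the classifier flag more images as malignant, at the cost of more false alarms.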