<a href="https://colab.research.google.com/github/AliciaFalconCaro/MedicalImageClassificationExample/blob/main/MedicalimageClassificationCNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook walks through some basic medical image classification techniques.
The purpose is to compare traditional techniques with deep learning (DL) techniques.

For this mini-project, we will use the public dataset available here: https://www.kaggle.com/datasets/nodoubttome/skin-cancer9-classesisic/data

The dataset is already separated into two folders: train and test. It contains multiple images of different skin cancers. In total, there are 9 classes in the dataset, one sub-folder per class.
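
Before loading anything, it can help to check how many images each class folder holds, since the classes may be imbalanced. The snippet below is a minimal sketch: `count_images_per_class` and `IMAGE_SUFFIXES` are hypothetical names introduced here, and the root path assumes the dataset was extracted to `./Skin_Cancer_ISIC_Data`.

```python
from pathlib import Path

# Assumed set of image file extensions; adjust it to match the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def count_images_per_class(split_dir):
    """Return {class_folder_name: number_of_image_files} for one split folder."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


# Print the counts for each split, skipping folders that are not present.
for split in ("train", "test"):
    split_dir = Path("./Skin_Cancer_ISIC_Data") / split
    if split_dir.is_dir():
        print(split, count_images_per_class(split_dir))
```

Large differences between the counts suggest looking at per-class metrics, not only overall accuracy.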

Let's start by loading the data and taking a quick look at it.

In [None]:
import tensorflow as tf

# Define paths
train_dir = './Skin_Cancer_ISIC_Data/train'
test_dir = './Skin_Cancer_ISIC_Data/test'

# Load training and testing datasets
imageSize = (64, 64)  # Resize to 64x64

train_dataset = tf.keras.utils.image_dataset_from_directory(train_dir,
    labels='inferred',  # Automatically infer labels from folder names
    label_mode='int',   # Labels will be integers
    image_size=imageSize,
    batch_size=32,      # Batch size for training
    shuffle=True        # Shuffle the dataset
)

test_dataset = tf.keras.utils.image_dataset_from_directory(test_dir,
    labels='inferred',
    label_mode='int',
    image_size=imageSize,
    batch_size=32,
    shuffle=False       # No need to shuffle test data
)

# Normalize pixel values to [0, 1]
normalization_layer = tf.keras.layers.Rescaling(1./255)
train_dataset = train_dataset.map(lambda x, y: (normalization_layer(x), y))
test_dataset = test_dataset.map(lambda x, y: (normalization_layer(x), y))

# Since we are using deep learning with a CNN, no further preprocessing steps are strictly necessary.
# However, data augmentation or feature extraction techniques could improve the accuracy/performance of the model.
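
As an illustration of the data augmentation idea mentioned above, here is a minimal NumPy sketch that randomly flips and rotates each image in a batch. It is a stand-in for illustration, not part of the original pipeline: `augment_batch` is a hypothetical helper, it assumes square images in `(N, H, W, C)` layout, and it assumes that flips and 90-degree rotations do not change the class of a skin lesion image.

```python
import numpy as np


def augment_batch(images, rng):
    """Randomly flip and rotate every image in a (N, H, W, C) batch of square images."""
    out = np.empty_like(images)
    for i, img in enumerate(images):
        if rng.random() < 0.5:
            img = img[:, ::-1, :]      # horizontal flip (reverse the width axis)
        if rng.random() < 0.5:
            img = img[::-1, :, :]      # vertical flip (reverse the height axis)
        k = int(rng.integers(0, 4))    # number of 90-degree rotations: 0, 1, 2 or 3
        out[i] = np.rot90(img, k=k, axes=(0, 1))
    return out


# Usage on a random batch that mimics 64x64 RGB images scaled to [0, 1].
rng = np.random.default_rng(0)
batch = rng.random((4, 64, 64, 3)).astype(np.float32)
augmented = augment_batch(batch, rng)
print(augmented.shape)
```

These transformations only rearrange pixels, so the value range produced by the normalization step is unchanged.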


In [None]:
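# Split train dataset into train and validation.
# Caveat: train_dataset was built with shuffle=True, so it is typically reshuffled on every pass
# and take/skip may not select the same images each epoch. Passing validation_split, subset and
# seed to image_dataset_from_directory is one way to get a fixed split.
val_dataset = train_dataset.take(20)  # Take 20 batches for validation
train_dataset = train_dataset.skip(20)  # Rest is training data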

# Prefetch with an autotuned buffer so that data loading overlaps with training (similar to parallel processing)
AUTOTUNE = tf.data.AUTOTUNE
train_dataset = train_dataset.prefetch(buffer_size=AUTOTUNE)
val_dataset = val_dataset.prefetch(buffer_size=AUTOTUNE)
test_dataset = test_dataset.prefetch(buffer_size=AUTOTUNE)


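# View class names. The mapped/prefetched dataset no longer exposes `class_names`,
# so rebuild the list from the sub-folder names (labels are assigned in sorted folder order).
import os
class_names = sorted(d for d in os.listdir(train_dir)
                     if os.path.isdir(os.path.join(train_dir, d)) and not d.startswith('.'))
print("Class names:", class_names)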

In [None]:
from tensorflow.keras import layers, models

# Define CNN model
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(len(class_names), activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # Use sparse since labels are integers
              metrics=['accuracy'])

# View model summary
model.summary()
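
To make the output of `model.summary()` easier to check, the feature-map sizes and parameter counts of the architecture above can be worked out by hand. The sketch below assumes the Keras defaults for these layers (stride 1 and `'valid'` padding for `Conv2D`, non-overlapping 2x2 windows for `MaxPooling2D`) and assumes nine output classes; `conv_out`, `pool_out` and `cnn_summary` are hypothetical helper names.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool


def cnn_summary(input_size=64, in_channels=3, filters=(32, 64, 128),
                dense_units=128, num_classes=9):
    """Return (rows, flattened_size, total_params) for the conv/pool/dense stack."""
    size, channels, total = input_size, in_channels, 0
    rows = []
    for f in filters:
        params = 3 * 3 * channels * f + f      # 3x3 kernel weights plus one bias per filter
        size = pool_out(conv_out(size))        # conv shrinks by 2, pooling halves
        channels = f
        total += params
        rows.append((f"conv{f}+pool", (size, size, channels), params))
    flat = size * size * channels              # length of the Flatten output
    dense1 = flat * dense_units + dense_units
    dense2 = dense_units * num_classes + num_classes
    total += dense1 + dense2
    rows.append(("dense", (dense_units,), dense1))
    rows.append(("output", (num_classes,), dense2))
    return rows, flat, total


rows, flat, total = cnn_summary()
for name, shape, params in rows:
    print(f"{name:12s} output={shape} params={params}")
print("flattened size:", flat, "total params:", total)
```

Under these assumptions the spatial size goes 64 → 31 → 14 → 6, the flattened vector has 4608 entries, and the total is 684,361 parameters, most of them in the first dense layer.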


In [None]:
# Train the model
history = model.fit(train_dataset, validation_data=val_dataset, epochs=10)

In [None]:
# Save the trained model in HDF5 format
model.save('model.h5')  # It can also be saved in the TensorFlow format instead

In [None]:
# Evaluate model on test set
test_loss, test_accuracy = model.evaluate(test_dataset)
print(f"Test Accuracy: {test_accuracy:.2f}")


In [None]:
import matplotlib.pyplot as plt

# Plot accuracy
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Training and Validation Accuracy')
plt.show()

# Plot loss
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Training and Validation Loss')
plt.show()


We visualise the results of the test data with a confusion matrix.

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns
import numpy as np

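# Get predicted class probabilities on the test set
y_pred = model.predict(test_dataset)

# Labels are already integers (label_mode='int'), so concatenate the batches
# instead of applying argmax to them
y_true_classes = np.concatenate([y.numpy() for _, y in test_dataset])

# Convert predicted probabilities to class labels
y_pred_classes = np.argmax(y_pred, axis=1)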

# Compute confusion matrix
cm = confusion_matrix(y_true_classes, y_pred_classes)

# Plot confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=class_names, yticklabels=class_names)
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.title('Confusion Matrix')
plt.show()
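
To clarify what the heatmap shows, here is a small NumPy sketch that builds a confusion matrix by hand, using the same convention as scikit-learn (rows are true labels, columns are predicted labels). The labels are toy values chosen for illustration, and `confusion_counts` is a hypothetical helper name.

```python
import numpy as np


def confusion_counts(y_true, y_pred, num_classes):
    """Count (true, predicted) label pairs; rows are true classes, columns are predictions."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # add 1 at each (true, predicted) index pair
    return cm


# Toy labels for three classes (not results from the model above).
toy_true = np.array([0, 0, 1, 1, 2, 2])
toy_pred = np.array([0, 1, 1, 1, 2, 0])

toy_cm = confusion_counts(toy_true, toy_pred, num_classes=3)
print(toy_cm)

# Row-normalising turns counts into per-class fractions, which is easier
# to read when the classes have very different numbers of test images.
row_fractions = toy_cm / toy_cm.sum(axis=1, keepdims=True)
print(row_fractions)
```

The diagonal holds the correctly classified samples, so off-diagonal cells show which classes get confused with each other.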



If we wanted a more detailed evaluation of the model performance, we could obtain and print the classification report:

In [None]:
from sklearn.metrics import classification_report

print(classification_report(y_true_classes, y_pred_classes, target_names=class_names))
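
The precision, recall and F1 columns of the report can all be derived from a confusion matrix. The following sketch shows the arithmetic on a toy 3x3 matrix (made-up values, not results from this model); `per_class_metrics` is a hypothetical helper, and returning 0 for a class with no predictions or no samples is a choice made here.

```python
import numpy as np


def per_class_metrics(cm):
    """Return (precision, recall, f1, support) per class from a confusion matrix.

    Rows of `cm` are true classes and columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)              # correctly classified samples per class
    predicted = cm.sum(axis=0)    # how often each class was predicted
    actual = cm.sum(axis=1)       # how many samples each class really has (support)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1, actual.astype(int)


# Toy confusion matrix for three classes.
example_cm = [[1, 1, 0],
              [0, 2, 0],
              [1, 0, 1]]
precision, recall, f1, support = per_class_metrics(example_cm)
for i in range(len(support)):
    print(f"class {i}: precision={precision[i]:.2f} recall={recall[i]:.2f} "
          f"f1={f1[i]:.2f} support={support[i]}")
```

Precision answers "of the images predicted as this class, how many were right?", while recall answers "of the images that truly belong to this class, how many were found?".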
