# **Hybrid Model for Skin Cancer Analysis**


PHASE 1:
1. Download data
2. Organize data into train and test categories
3. Process images
4. Load data into TensorFlow for CNN training
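
Step 2 of the list above can be sketched as follows. This is a minimal sketch, not the notebook's own code: the function name `split_into_train_test`, the 80/20 split, and the seed are assumptions for illustration, and the `Train/<class>` and `Test/<class>` layout simply mirrors the folder paths used later in this notebook. Datasets that already ship with a train/test split would not need it.

```python
import random
import shutil
from pathlib import Path


def split_into_train_test(source_dir, dest_dir, test_fraction=0.2, seed=0):
    """Copy each class folder's .jpg files into dest_dir/Train and dest_dir/Test.

    Hypothetical helper: expects source_dir/<class name>/*.jpg and returns
    {class name: (n_train, n_test)}.
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for class_dir in sorted(p for p in source_dir.iterdir() if p.is_dir()):
        images = sorted(class_dir.glob("*.jpg"))
        rng.shuffle(images)
        n_test = int(len(images) * test_fraction)
        splits = {"Test": images[:n_test], "Train": images[n_test:]}
        for split_name, files in splits.items():
            target = dest_dir / split_name / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for image_path in files:
                shutil.copy2(image_path, target / image_path.name)
        counts[class_dir.name] = (len(images) - n_test, n_test)
    return counts


# Example call (paths are placeholders):
# split_into_train_test("raw_images", "Dataset", test_fraction=0.2)
```

Splitting per class keeps every class represented in both folders, which matters when some classes have very few images.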



**Step 1**:
Downloading data sets and extracting data



The data were downloaded from Kaggle, where the images are already separated and categorized by disease. The links are:

**Malignant and Benign:**

https://www.kaggle.com/code/fanconic/starter-skin-cancer-malignant-vs-benign

**9 classes of skin cancer:**

https://www.kaggle.com/datasets/nodoubttome/skin-cancer9-classesisic

**Ham10000:**

https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000?select=HAM10000_images_part_1



**Dataset and Purpose in the Project**

HAM10000
*   Multi-class classification, progression modeling

Skin Cancer ISIC
*   Multi-class classification

Malignant vs. Benign
*   Binary classification (use for the CNN first)

PAD-UFES-20
*   Real-world clinical images for validation

Lesion Segmentation
*   Optional, for improving CNN feature extraction



In [None]:
from google.colab import drive
drive.mount('/content/drive')

PROJECT_PATH = "/content/drive/My Drive/BME230A_Project"


Mounted at /content/drive


Loading data into TensorFlow

In [None]:
import os

TRAIN_DIR = "/content/drive/My Drive/BME230A_Project/Dataset/Train"
TEST_DIR = "/content/drive/My Drive/BME230A_Project/Dataset/Test"

def list_detected_classes(directory):
    class_folders = sorted([folder for folder in os.listdir(directory) if os.path.isdir(os.path.join(directory, folder))])
    print(f"📂 {directory} contains {len(class_folders)} classes:")
    print(class_folders)

print("✅ Checking Detected Train Classes:")
list_detected_classes(TRAIN_DIR)

print("\n✅ Checking Detected Test Classes:")
list_detected_classes(TEST_DIR)

def count_images_per_class(directory):
    print(f"📂 Checking images in: {directory}\n")
    for class_folder in sorted(os.listdir(directory)):  # Sort for cleaner output
        class_path = os.path.join(directory, class_folder)
        if os.path.isdir(class_path):  # Only check folders (ignore hidden files)
            num_images = len([img for img in os.listdir(class_path) if img.endswith(".jpg")])
            print(f"📂 {class_folder}: {num_images} images")

print("✅ Checking Training Data:")
count_images_per_class(TRAIN_DIR)

print("\n✅ Checking Test Data:")
count_images_per_class(TEST_DIR)

✅ Checking Detected Train Classes:
📂 /content/drive/My Drive/BME230A_Project/Dataset/Train contains 9 classes:
['actinic keratosis', 'basal cell carcinoma', 'dermatofibroma', 'melanoma', 'nevus', 'pigmented benign keratosis', 'seborrheic keratosis', 'squamous cell carcinoma', 'vascular lesion']

✅ Checking Detected Test Classes:
📂 /content/drive/My Drive/BME230A_Project/Dataset/Test contains 9 classes:
['actinic keratosis', 'basal cell carcinoma', 'dermatofibroma', 'melanoma', 'nevus', 'pigmented benign keratosis', 'seborrheic keratosis', 'squamous cell carcinoma', 'vascular lesion']
✅ Checking Training Data:
📂 Checking images in: /content/drive/My Drive/BME230A_Project/Dataset/Train

📂 actinic keratosis: 304 images
📂 basal cell carcinoma: 492 images
📂 dermatofibroma: 113 images
📂 melanoma: 955 images
📂 nevus: 5633 images
📂 pigmented benign keratosis: 994 images
📂 seborrheic keratosis: 77 images
📂 squamous cell carcinoma: 181 images
📂 vascular lesion: 141 images

✅ Checking Test Data:
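
The training counts above are heavily skewed toward nevus. As a rough sketch that is not part of the original notebook, the snippet below computes the majority-class share and inverse-frequency ("balanced") class weights from those counts. Weights of this kind could be passed to Keras `model.fit` through its `class_weight` argument, which expects class indices as keys, so the names would first have to be mapped through `train_generator.class_indices`. Whether weighting helps here is an open question.

```python
# Training counts copied from the directory check above
train_counts = {
    "actinic keratosis": 304,
    "basal cell carcinoma": 492,
    "dermatofibroma": 113,
    "melanoma": 955,
    "nevus": 5633,
    "pigmented benign keratosis": 994,
    "seborrheic keratosis": 77,
    "squamous cell carcinoma": 181,
    "vascular lesion": 141,
}

total = sum(train_counts.values())
n_classes = len(train_counts)

# Share of the largest class: the training accuracy a model would reach
# by always predicting that class
majority_class = max(train_counts, key=train_counts.get)
majority_fraction = train_counts[majority_class] / total

# Balanced weights: total / (n_classes * count), so rare classes weigh more
class_weights = {
    name: total / (n_classes * count) for name, count in train_counts.items()
}

print(f"Total training images: {total}")
print(f"Majority class: {majority_class} ({majority_fraction:.1%})")
for name, weight in class_weights.items():
    print(f"{name}: weight {weight:.2f}")
```

The majority share is a useful baseline: an accuracy close to it suggests the model may be predicting mostly the dominant class.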


In [None]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define ImageDataGenerator for data augmentation
train_datagen = ImageDataGenerator(
    rescale=1./255,          # Normalize pixel values
    rotation_range=20,       # Rotate images randomly
    width_shift_range=0.2,   # Random horizontal shift
    height_shift_range=0.2,  # Random vertical shift
    shear_range=0.2,         # Shear transformation
    zoom_range=0.2,          # Zoom augmentation
    horizontal_flip=True,    # Flip images horizontally
    fill_mode='nearest'
)

test_datagen = ImageDataGenerator(rescale=1./255)  # Only rescale for test

# Load the images from Train and Test folders
train_generator = train_datagen.flow_from_directory(
    TRAIN_DIR,
    target_size=(224, 224),   # Resize images to 224x224 for CNN
    batch_size=32,
    class_mode='categorical'  # Multi-class classification
)

test_generator = test_datagen.flow_from_directory(
    TEST_DIR,
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)

print("✅ Data successfully loaded into TensorFlow ImageDataGenerators!")

Found 8890 images belonging to 9 classes.
Found 1647 images belonging to 9 classes.
✅ Data successfully loaded into TensorFlow ImageDataGenerators!


Define the CNN Model (Using ResNet50)

In [None]:
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Load Pretrained ResNet50 (without top layers)
base_model = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze the base model layers
for layer in base_model.layers:
    layer.trainable = False

# Add custom classification layers
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation="relu")(x)
x = Dropout(0.3)(x)  # Regularization
x = Dense(256, activation="relu")(x)
x = Dropout(0.3)(x)
predictions = Dense(train_generator.num_classes, activation="softmax")(x)  # Output layer

# Create the final model
model = Model(inputs=base_model.input, outputs=predictions)

# Compile the model
model.compile(optimizer=Adam(learning_rate=0.0001), loss="categorical_crossentropy", metrics=["accuracy"])

# Model summary
model.summary()
print("✅ CNN Model (ResNet50) is ready for training!")

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
94765736/94765736 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step


✅ CNN Model (ResNet50) is ready for training!
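
Because the ResNet50 base is frozen, only the custom head is trained. As a worked check, the sketch below counts the head's parameters by hand. It assumes the standard ResNet50 feature depth of 2048 channels after the final convolutional block and the 9 classes detected earlier; `dense_params` is a helper introduced only for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


# GlobalAveragePooling2D turns the 2048-channel feature map into a
# 2048-element vector; pooling and dropout add no parameters
layer_sizes = [2048, 512, 256, 9]

head_params = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(head_params)       # per-layer parameter counts
print(sum(head_params))  # total trainable parameters in the head
```

The total should agree with the "Trainable params" figure that `model.summary()` reports for this architecture.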


In [None]:
# Restrict TensorFlow to the first GPU, if one is available.
# Visible devices can only be changed before TensorFlow initializes its
# runtime, so this cell should run before the model is built.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_visible_devices(gpus[0], 'GPU')

# Report whether a GPU was detected
print("🔧 Using GPU:", bool(gpus))

🔧 Using GPU: True


Train the CNN model

In [None]:
# Train the model
history = model.fit(
    train_generator,
    epochs=10,  # You can increase this if needed
    validation_data=test_generator
)

print("🎉 Model training completed!")



Epoch 1/10
556/556 ━━━━━━━━━━━━━━━━━━━━ 2785s 5s/step - accuracy: 0.5995 - loss: 1.4443 - val_accuracy: 0.6509 - val_loss: 1.2280
Epoch 2/10
556/556 ━━━━━━━━━━━━━━━━━━━━ 2215s 4s/step - accuracy: 0.6368 - loss: 1.3174 - val_accuracy: 0.6509 - val_loss: 1.2298
Epoch 3/10
556/556 ━━━━━━━━━━━━━━━━━━━━ 2267s 4s/step - accuracy: 0.6272 - loss: 1.3336 - val_accuracy: 0.6509 - val_loss: 1.2217
Epoch 4/10
512/556 ━━━━━━━━━━━━━━━━━━━━ 2:28 3s/step - accuracy: 0.6337 - loss: 1.3169
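
The first number in each progress line is the count of batches per epoch, which for a directory generator is the number of images divided by the batch size, rounded up. The short sketch below works this out for the 8890 training images reported when the data were loaded; `steps_per_epoch` is a helper defined only for this illustration.

```python
import math


def steps_per_epoch(n_images, batch_size):
    """Batches needed to see every image once; the last batch may be smaller."""
    return math.ceil(n_images / batch_size)


n_train = 8890  # "Found 8890 images belonging to 9 classes."

print(steps_per_epoch(n_train, 32))  # batch size set in the loading cell
print(steps_per_epoch(n_train, 16))  # half that batch size
```

A batch size of 32 gives 278 steps, while the 556 steps in the log match a batch size of 16. This is only an inference from the numbers, but it suggests the logged run may have used a different batch size than the loading cell shows.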