1. Pretrained Model Choice

- Select a pretrained CNN model such as VGG16, MobileNetV2 or ResNet50.
- Load the model with ImageNet weights.
- Freeze the base layers to use it as a feature extractor.

*Pretrained Model Choice*

MobileNetV2 is used as the pretrained base model because it is lightweight,
fast to train and a common choice for transfer learning, including on medical images.
The base layers are frozen first so that the network acts as a fixed feature extractor.

2. Data Generators
- Use the same data generators as the baseline model.
- Make sure the image size matches the pretrained model requirements.

In [1]:
# Step 1: Imports and Load Generators

import os
import numpy as np
import matplotlib.pyplot as plt

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

train_dir = 'E:/CY Tech/Big Data project/Project 2 DermNet Skin Disease Classification/Data/Raw/train'
test_dir = 'E:/CY Tech/Big Data project/Project 2 DermNet Skin Disease Classification/Data/Raw/test'

img_height = 224
img_width = 224
batch_size = 32

# Augment the training images (rotation, zoom, shifts, flips) to reduce overfitting
train_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    validation_split=0.2,
    rotation_range=20,
    zoom_range=0.2,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True
)

# Training subset
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    subset='training'
)

# Validation subset (note: it shares train_datagen, so these images are augmented too)
val_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    subset='validation'
)

# Test generator
test_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False
)

num_classes = train_generator.num_classes
print("Number of classes:", num_classes)

Found 12453 images belonging to 23 classes.
Found 3104 images belonging to 23 classes.
Found 4002 images belonging to 23 classes.
Number of classes: 23
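
To make the effect of `preprocess_input` more concrete: for MobileNetV2 it rescales pixel values from [0, 255] to [-1, 1]. The NumPy function below is a hypothetical stand-in (`scale_like_mobilenet_v2` is not part of Keras) that mirrors this arithmetic on a random fake image. It is a minimal sketch, not the library's implementation.

```python
import numpy as np

def scale_like_mobilenet_v2(pixels):
    """Hypothetical stand-in mirroring MobileNetV2 preprocessing: [0, 255] -> [-1, 1]."""
    pixels = np.asarray(pixels, dtype=np.float32)
    return pixels / 127.5 - 1.0

# A fake 224x224 RGB image with random 8-bit pixel values
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(224, 224, 3))
scaled = scale_like_mobilenet_v2(image)

# The shape is unchanged and every value now lies in [-1, 1]
print(scaled.shape, scaled.min() >= -1.0, scaled.max() <= 1.0)
```

Because the generators apply `preprocess_input` themselves, no separate `rescale=1./255` argument should be added on top of it.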


3. Model Architecture

- Add custom layers on top of the pretrained base:
  - GlobalAveragePooling2D
  - Dense layers
  - Final softmax layer matching number of classes
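
The size of such a head can be estimated by hand. The sketch below assumes the shapes used in this notebook: a 7x7x1280 feature map from MobileNetV2 without its top at 224x224 input, a 128-unit hidden layer and 23 output classes. The helper name `dense_params` is made up for illustration.

```python
import numpy as np

# Assumed shapes: MobileNetV2 without its top outputs a 7x7x1280 feature map
# for 224x224 inputs; 23 is the class count reported by the generators.
feature_map = np.ones((7, 7, 1280), dtype=np.float32)

# GlobalAveragePooling2D averages over the two spatial axes
pooled = feature_map.mean(axis=(0, 1))

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

hidden = dense_params(1280, 128)   # Dense(128) on the pooled vector
output = dense_params(128, 23)     # softmax layer over 23 classes
print(pooled.shape, hidden, output, hidden + output)
```

With the base frozen, only these roughly 167k head parameters are updated during training.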

In [2]:
# Step 2: Build the Pretrained Model Architecture

# Load the pretrained MobileNetV2 base
base_model = MobileNetV2(
    input_shape=(img_height, img_width, 3),
    include_top=False,
    weights='imagenet'
)

# Freeze base layers (important for transfer learning)
for layer in base_model.layers:
    layer.trainable = False

# Build the classifier on top
model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes, activation='softmax')
])

model.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224_no_top.h5
9406464/9406464 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step


4. Training

- Train only the top layers first.
- Optionally unfreeze some deeper layers for fine tuning.
- Track accuracy and loss during training.
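
The optional fine-tuning step is not run in this notebook. One possible way to do it is sketched below, using a hypothetical helper `unfreeze_last_layers` that only touches the `trainable` attribute. Keeping batch-normalization layers frozen is a common convention rather than something the notebook prescribes, and the layer count of 30 in the comments is an arbitrary assumption.

```python
from types import SimpleNamespace

def unfreeze_last_layers(layers, n_last):
    """Mark the last n_last layers trainable, keeping batch-norm layers frozen."""
    n_frozen = max(len(layers) - n_last, 0)
    for index, layer in enumerate(layers):
        is_batch_norm = type(layer).__name__ == "BatchNormalization"
        layer.trainable = index >= n_frozen and not is_batch_norm
    # Return how many layers ended up trainable
    return sum(1 for layer in layers if layer.trainable)

# Stand-in objects so the sketch runs without TensorFlow
fake_layers = [SimpleNamespace(trainable=False) for _ in range(10)]
print(unfreeze_last_layers(fake_layers, 3))

# With the notebook's objects one might call (30 is an arbitrary choice):
#   unfreeze_last_layers(base_model.layers, 30)
#   model.compile(optimizer=Adam(learning_rate=1e-5),
#                 loss='categorical_crossentropy', metrics=['accuracy'])
```

The model has to be recompiled after changing `trainable` for the change to take effect, and a much lower learning rate than in the first phase helps avoid destroying the pretrained weights.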

In [3]:
# Step 3: Training (Top Layers Only)

model.compile(
    optimizer=Adam(learning_rate=0.0005),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

epochs = 5  # a short run to keep training time manageable

history_pretrained = model.fit(
    train_generator,
    epochs=epochs,
    validation_data=val_generator
)

Epoch 1/5
390/390 ━━━━━━━━━━━━━━━━━━━━ 239s 604ms/step - accuracy: 0.2533 - loss: 2.5377 - val_accuracy: 0.2220 - val_loss: 2.6825
Epoch 2/5
390/390 ━━━━━━━━━━━━━━━━━━━━ 232s 594ms/step - accuracy: 0.3239 - loss: 2.2635 - val_accuracy: 0.2310 - val_loss: 2.6580
Epoch 3/5
390/390 ━━━━━━━━━━━━━━━━━━━━ 230s 591ms/step - accuracy: 0.3573 - loss: 2.1298 - val_accuracy: 0.2465 - val_loss: 2.6208
Epoch 4/5
390/390 ━━━━━━━━━━━━━━━━━━━━ 232s 594ms/step - accuracy: 0.3938 - loss: 2.0326 - val_accuracy: 0.2497 - val_loss: 2.6340
Epoch 5/5
390/390 ━━━━━━━━━━━━━━━━━━━━ 230s 589ms/step - accuracy: 0.4035 - loss: 1.9682 - val_accuracy: 0.2490 - val_loss: 2.6503

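To track accuracy across epochs without reading the log line by line, the per-epoch values can be summarized programmatically. The sketch below copies the accuracy figures from the training log into a plain dictionary. In the notebook, `history_pretrained.history` would supply the same keys. The helper `summarize_history` is hypothetical.

```python
# Accuracy values copied from the training log of the 5-epoch run
history = {
    "accuracy": [0.2533, 0.3239, 0.3573, 0.3938, 0.4035],
    "val_accuracy": [0.2220, 0.2310, 0.2465, 0.2497, 0.2490],
}

def summarize_history(history):
    """Return (best epoch, best validation accuracy, final train/val gap)."""
    val_acc = history["val_accuracy"]
    best_index = max(range(len(val_acc)), key=val_acc.__getitem__)
    final_gap = history["accuracy"][-1] - val_acc[-1]
    return best_index + 1, val_acc[best_index], round(final_gap, 4)

best_epoch, best_val_acc, gap = summarize_history(history)
print(best_epoch, best_val_acc, gap)
```

For this run the best validation accuracy occurs at epoch 4, and the gap between training and validation accuracy after the last epoch is about 0.15.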

5. Evaluation

- Evaluate the pretrained model on the test set.
- Compare performance with the baseline CNN.
- Generate accuracy, confusion matrix and classification report.
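
The figures in a classification report can be derived directly from the confusion matrix. The sketch below uses a small hypothetical 3-class matrix (not data from this project) and follows the scikit-learn convention of true classes in rows and predicted classes in columns.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows = true class, columns = predicted)
cm = np.array([
    [5, 1, 0],
    [2, 6, 2],
    [1, 1, 7],
])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # column sums = predicted counts
recall = true_positives / cm.sum(axis=1)     # row sums = true counts
accuracy = true_positives.sum() / cm.sum()

print(np.round(precision, 3), np.round(recall, 3), round(float(accuracy), 3))
```

A class that is never predicted has a zero column sum, so its precision is undefined here and would need special handling in a real pipeline.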

In [4]:
# Step 4: Evaluate Pretrained Model on Test Set

from sklearn.metrics import confusion_matrix, classification_report
import numpy as np

# evaluate test accuracy
test_loss, test_acc = model.evaluate(test_generator)
print("Test accuracy (pretrained model):", test_acc)

# get predictions
pred_probs = model.predict(test_generator)
pred_classes = np.argmax(pred_probs, axis=1)

# true labels
true_classes = test_generator.classes
class_labels = list(test_generator.class_indices.keys())

# confusion matrix
cm = confusion_matrix(true_classes, pred_classes)

print("Confusion Matrix:")
print(cm)

# classification report
print("Classification Report:")
print(classification_report(true_classes, pred_classes, target_names=class_labels))

126/126 ━━━━━━━━━━━━━━━━━━━━ 30s 234ms/step - accuracy: 0.3476 - loss: 2.2470
Test accuracy (pretrained model): 0.34757620096206665
126/126 ━━━━━━━━━━━━━━━━━━━━ 31s 241ms/step
Confusion Matrix:
[[200   8   3   0   0   5   2   5   7   1   0   3   0   2  11   0  36   1
   18   1   0   0   9]
 [ 26  68   5   1   1   9   1   1   7   0   1   5   1   0  15   0 106   9
   12   1   1   4  14]
 [ 14   1  28   0   0  18   7   2   1   2   1   0   0   0   7   0   7   8
    9   3   3   8   4]
 [  8  14   3   3   0   6   0   2   6   0   2   1   1   0  18   0  14   8
    9   1   1   6  10]
 [ 11   2   3   0   2   0   5   2   3   0   2   0   1   0   2   0  11   6
   12   1   0   4   6]
 [ 14   6   4   2   0 129   3   3   5   2   0   2   7   0  23   1  30   8
   42   0   1  17  10]
 [  3   0   6   2   0   7  32   1   2   1   1   0   0   0  15   0   3   6
   10   2   0   6   4]
 [  3   0   1   0   0   0   1  32   1   0   0   0   0   0  

6. Comparison With Baseline

- Summarize how much improvement the pretrained model brings.
- Explain why it performs better or worse.

- The pretrained MobileNetV2 model performed better than the baseline CNN, reaching
a test accuracy of about 0.35 after 5 epochs with only the top layers trained.
- The baseline CNN was simple and had limited capacity, so it struggled to pick up
the complex texture patterns that appear in medical skin images.
- MobileNetV2 had already learned rich features from ImageNet, so even with only the top
layers trained, it generalized better.

- The pretrained model improved the overall accuracy and gave stronger class-level
precision and recall. Its stronger feature extractor likely also helped with the
class imbalance. The baseline model underfit, while the pretrained one showed more
stable validation performance. Validation accuracy still plateaued near 0.25 while
training accuracy rose to 0.40, so fine tuning may be needed to close that gap.
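
Class imbalance can also be addressed explicitly rather than left to the feature extractor. One option, not used in the run above, is to weight each class inversely to its frequency. The helper `balanced_class_weights` below is hypothetical and assumes every class index appears at least once in the labels.

```python
import numpy as np

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = np.bincount(labels)
    weights = len(labels) / (len(counts) * counts)
    return {int(i): float(w) for i, w in enumerate(weights)}

# Hypothetical integer labels: class 0 is four times more frequent than class 2
labels = np.array([0] * 8 + [1] * 4 + [2] * 2)
weights = balanced_class_weights(labels)
print(weights)

# In the notebook this could be passed as (assumption, not done in the original run):
#   model.fit(train_generator, epochs=epochs, validation_data=val_generator,
#             class_weight=balanced_class_weights(train_generator.classes))
```

Rare classes receive weights above 1 and frequent classes weights below 1, so each class contributes about equally to the loss.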


7. Final Model Selection

- Choose the best performing model.
- Save it for later use.

- The pretrained MobileNetV2 model was selected as the final model because it gave
better performance than the baseline CNN. It reached higher accuracy on the test set
(about 0.35) and produced stronger class-level scores in the classification report.
The model generalized better thanks to the pretrained feature extractor and the data augmentation.

- This model is saved for later use so it can be reused in the evaluation notebook or 
in a small application if needed.

In [5]:
# Save final model (legacy HDF5 format)
model.save('E:/CY Tech/Big Data project/Project 2 DermNet Skin Disease Classification/Models/final_mobilenet_model.h5')
print("Model saved successfully.")



Model saved successfully.


In [6]:
model.save('E:/CY Tech/Big Data project/Project 2 DermNet Skin Disease Classification/Models/final_mobilenet_model.keras')
print("Model saved successfully in Keras format.")

Model saved successfully in Keras format.
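
To reuse the saved model in the evaluation notebook, it can be loaded back with Keras. The wrapper below is a minimal sketch: `load_saved_model` is a hypothetical helper that checks the path first, and the file name in the usage comment is a placeholder for the path passed to `model.save`.

```python
from pathlib import Path

def load_saved_model(path):
    """Load a saved Keras model, failing early if the file is missing."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"No saved model at {path}")
    # Imported lazily so the path check works even without TensorFlow installed
    from tensorflow.keras.models import load_model
    return load_model(str(path))

# Hypothetical usage; substitute the path passed to model.save:
#   reloaded = load_saved_model('final_mobilenet_model.keras')
#   reloaded.evaluate(test_generator)
```

The `.keras` file is generally the better one to load, since `.h5` is the legacy format.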
