# Baseline
Small-scale training and debugging on a 1000-image ISIC dev set. Here we ensure that we can:
1. Load images properly by building a custom Dataset class (`ISICDataset`) and wrapping it in a `DataLoader`
2. Preprocess images
3. Load the EfficientNetB3 model
4. Train the EfficientNetB3 model using our pre-processed images
5. Assess accuracy on a validation set
6. Visualize correct and incorrect classifications



In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import sys
project_root = "/content/drive/My Drive/midas"
if project_root not in sys.path:
  sys.path.append(project_root)

In [None]:
!touch "/content/drive/My Drive/midas/utils/__init__.py"

In [None]:
# load packages
import os
import sys
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms, models
from PIL import Image
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.utils.class_weight import compute_class_weight
import numpy as np
from tqdm import tqdm
import traceback

# load my scripts
# project_root = "/content/drive/My Drive/midas"
# sys.path.append(project_root)
from utils.dataloader import ISICDataset
from utils.preprocess import get_image_transform, load_and_split
from models.baseline_model import BaselineModel

## Pre-process + Data Loaders


### Considerations to Revisit:

1. Resizing to the 300x300 input that EfficientNet-B3 expects distorts the aspect ratio of non-square images. Empirically, this is usually not a problem for deeper models, because augmentation, normalization, and large training sets compensate. It becomes a problem when shape matters for classification, the model is shallow, and the dataset is small. TODO: Consider preserving aspect ratio for MIDAS. We can leave ISIC as is, given its much larger number of samples.

2. Image-only training is a common default starting point. TODO: try to incorporate metadata to see if it helps predictions.
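
One way to act on the aspect-ratio TODO in item 1 is to letterbox: scale the longer side to 300 and pad the shorter side up to 300. The sketch below only computes the geometry. `letterbox_geometry` is a hypothetical helper (it is not part of `utils.preprocess`), and the 1024x768 input is an illustrative size. The padding tuple is ordered (left, top, right, bottom), which is the order `torchvision.transforms.Pad` accepts for a 4-element padding.

```python
def letterbox_geometry(width, height, target=300):
    """Return (new_w, new_h, (left, top, right, bottom)) for an aspect-preserving resize.

    Hypothetical helper: the longer side is scaled to `target`, and the shorter
    side is padded symmetrically (any odd pixel goes to the right/bottom).
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_w = target - new_w
    pad_h = target - new_h
    left, top = pad_w // 2, pad_h // 2
    return new_w, new_h, (left, top, pad_w - left, pad_h - top)


# Illustrative landscape image: resized size plus the padding needed to reach 300x300.
new_w, new_h, padding = letterbox_geometry(1024, 768)
print(new_w, new_h, padding)
```

The resized size could be passed to a resize transform and the padding to a pad transform, so the network still receives a 300x300 input without stretching the lesion.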


In [None]:
# required global params
N_SAMPLES = 1000
USE_METADATA = False

In [None]:
# Paths
base_dir = "/content/drive/My Drive/midas/data"
labels_path = os.path.join(base_dir, "ISIC_2019_Training_GroundTruth.csv")
meta_path = os.path.join(base_dir, "ISIC_2019_Training_Metadata.csv")
img_dir = os.path.join(base_dir, f"sample_{N_SAMPLES}")

# Transform
transform = get_image_transform()

# Load merged metadata + labels
train_df, val_df, index_to_label = load_and_split(labels_path, meta_path, n_samples=N_SAMPLES)
print('class distribution for train: ', train_df['label'].value_counts(normalize=True))

# Dataset + loaders
train_dataset = ISICDataset(img_dir, train_df, transform, index_to_label, use_metadata=USE_METADATA)
val_dataset = ISICDataset(img_dir, val_df, transform, index_to_label, use_metadata=USE_METADATA)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=16)


Train split: 800 images, 764 unique lesions
Val split:   200 images, 192 unique lesions
class distribution for train:  label
1    0.48250
0    0.16750
2    0.15875
4    0.11375
3    0.04250
7    0.02000
6    0.00875
5    0.00625
Name: proportion, dtype: float64
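
The split above is heavily skewed: class 1 makes up about 48% of the training images while classes 5 and 6 are under 1% each. A common way to compensate is inverse-frequency class weighting. The sketch below reproduces the "balanced" heuristic, `n_samples / (n_classes * count)`, in plain Python. The counts are the proportions printed above multiplied by the 800 training images; `balanced_class_weights` is a hypothetical helper, and whether `train_model` already applies class weights is not shown in this notebook.

```python
# Training counts per class index, derived from the printed proportions x 800 images.
train_counts = {0: 134, 1: 386, 2: 127, 3: 34, 4: 91, 5: 5, 6: 7, 7: 16}


def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * count) per class."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * c) for label, c in sorted(counts.items())}


weights = balanced_class_weights(train_counts)
for label, w in weights.items():
    print(f"class {label}: weight {w:.3f}")
```

The resulting values, ordered by class index, could be passed as the `weight` tensor of `nn.CrossEntropyLoss` so that errors on rare classes contribute more to the loss.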


## Load EfficientNet-B3

Consideration: We train on all 8 ISIC classes even though our primary classification task is malignant vs. non-malignant, because we expect multi-class training to build more semantically meaningful representations in the early layers than training on the binary label alone. Afterward, we aggregate predictions to binary (malignant vs. not) for evaluation.
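
The binary aggregation step can be sketched by collapsing an 8x8 confusion matrix (rows = true class, columns = predicted class) into malignant vs. not. Which classes count as malignant is an assumption here: the sketch treats MEL, BCC, and SCC as malignant and everything else, including AK, as non-malignant. `to_binary_counts` is a hypothetical helper, and the matrix is the epoch 5 confusion matrix from the training log in this notebook.

```python
CLASS_NAMES = ["MEL", "NV", "BCC", "AK", "BKL", "DF", "VASC", "SCC"]
MALIGNANT = {"MEL", "BCC", "SCC"}  # assumption: adjust to the clinical definition in use


def to_binary_counts(confusion, class_names=CLASS_NAMES, malignant=MALIGNANT):
    """Collapse a multi-class confusion matrix into binary TP/FN/FP/TN counts."""
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for true_name, row in zip(class_names, confusion):
        for pred_name, n in zip(class_names, row):
            if true_name in malignant:
                key = "tp" if pred_name in malignant else "fn"
            else:
                key = "fp" if pred_name in malignant else "tn"
            counts[key] += n
    return counts


# Epoch 5 validation confusion matrix, copied from the training log.
epoch5 = [
    [24, 9, 3, 2, 1, 0, 0, 0],
    [12, 74, 7, 0, 6, 0, 0, 0],
    [2, 1, 17, 2, 1, 0, 0, 0],
    [0, 0, 1, 6, 1, 0, 0, 2],
    [5, 5, 3, 1, 5, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 1],
    [0, 2, 0, 0, 0, 0, 2, 0],
    [0, 0, 0, 2, 1, 0, 0, 1],
]

binary = to_binary_counts(epoch5)
sensitivity = binary["tp"] / (binary["tp"] + binary["fn"])
specificity = binary["tn"] / (binary["tn"] + binary["fp"])
print(binary, round(sensitivity, 3), round(specificity, 3))
```

Confusions between two malignant classes (for example MEL predicted as BCC) count as correct under this aggregation, which is why binary performance can look better than the 8-class accuracy.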

## Baseline Model
Now let's train on 1000 ISIC images


In [None]:
from src.trainer import train_model
from models.baseline_model import BaselineModel
from pathlib import Path

results_root = Path("/content/drive/MyDrive/midas/results")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Initialize model
model = BaselineModel(num_classes=8)

def baseline_forward(model, batch):
    # Only the images are needed for the forward pass; the labels in the batch
    # are not used here (train_model is assumed to handle the loss itself).
    images, _ = batch
    return model(images.to(device))

# Setup optimizer and scheduler
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', patience=2, factor=0.5
)

# Run training
train_model(
    model, train_loader, val_loader, train_df, index_to_label,
    forward_fn=baseline_forward,
    optimizer=optimizer,
    scheduler=scheduler,
    num_epochs=5,
    experiment_name="baseline_lr-1e-4_lr-scheduler",
    results_root=results_root
)


CUDA available: True
Current device: 0


Train: 100%|██████████| 50/50 [11:21<00:00, 13.62s/it]
Val: 100%|██████████| 13/13 [02:46<00:00, 12.78s/it]
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))



Epoch 1: Train Loss = 2.0138
Epoch 1: Val Loss = 1.9661, Val Acc = 0.5450
Classification Report:
              precision    recall  f1-score   support

         MEL       0.42      0.28      0.34        39
          NV       0.70      0.83      0.76        99
         BCC       0.60      0.39      0.47        23
          AK       0.13      0.40      0.20        10
         BKL       0.27      0.16      0.20        19
          DF       0.00      0.00      0.00         2
        VASC       0.00      0.00      0.00         4
         SCC       0.00      0.00      0.00         4

    accuracy                           0.55       200
   macro avg       0.27      0.26      0.25       200
weighted avg       0.53      0.55      0.53       200

Confusion Matrix:
[[11 17  2  8  1  0  0  0]
 [ 6 82  2  6  3  0  0  0]
 [ 2  5  9  7  0  0  0  0]
 [ 1  1  1  4  3  0  0  0]
 [ 5  7  1  3  3  0  0  0]
 [ 1  1  0  0  0  0  0  0]
 [ 0  3  0  1  0  0  0  0]
 [ 0  1  0  2  1  0  0  0]]


Train: 100%|██████████| 50/50 [00:28<00:00,  1.74it/s]
Val: 100%|██████████| 13/13 [00:04<00:00,  3.07it/s]



Epoch 2: Train Loss = 1.7688
Epoch 2: Val Loss = 1.8262, Val Acc = 0.5800
Classification Report:
              precision    recall  f1-score   support

         MEL       0.45      0.38      0.42        39
          NV       0.74      0.83      0.78        99
         BCC       0.42      0.65      0.51        23
          AK       0.13      0.20      0.16        10
         BKL       0.40      0.11      0.17        19
          DF       0.00      0.00      0.00         2
        VASC       0.00      0.00      0.00         4
         SCC       0.00      0.00      0.00         4

    accuracy                           0.58       200
   macro avg       0.27      0.27      0.25       200
weighted avg       0.55      0.58      0.55       200

Confusion Matrix:
[[15 17  4  3  0  0  0  0]
 [ 8 82  4  4  1  0  0  0]
 [ 2  3 15  3  0  0  0  0]
 [ 0  0  7  2  1  0  0  0]
 [ 7  4  5  1  2  0  0  0]
 [ 1  1  0  0  0  0  0  0]
 [ 0  3  0  1  0  0  0  0]
 [ 0  1  1  1  1  0  0  0]]


Train: 100%|██████████| 50/50 [00:29<00:00,  1.70it/s]
Val: 100%|██████████| 13/13 [00:04<00:00,  2.97it/s]



Epoch 3: Train Loss = 1.5408
Epoch 3: Val Loss = 1.7081, Val Acc = 0.6100
Classification Report:
              precision    recall  f1-score   support

         MEL       0.48      0.59      0.53        39
          NV       0.80      0.78      0.79        99
         BCC       0.44      0.70      0.54        23
          AK       0.20      0.20      0.20        10
         BKL       0.33      0.16      0.21        19
          DF       0.00      0.00      0.00         2
        VASC       1.00      0.25      0.40         4
         SCC       0.00      0.00      0.00         4

    accuracy                           0.61       200
   macro avg       0.41      0.33      0.33       200
weighted avg       0.60      0.61      0.59       200

Confusion Matrix:
[[23  8  4  3  1  0  0  0]
 [15 77  4  2  1  0  0  0]
 [ 1  3 16  2  1  0  0  0]
 [ 0  0  6  2  2  0  0  0]
 [ 7  4  5  0  3  0  0  0]
 [ 1  1  0  0  0  0  0  0]
 [ 0  3  0  0  0  0  1  0]
 [ 1  0  1  1  1  0  0  0]]


Train: 100%|██████████| 50/50 [00:29<00:00,  1.69it/s]
Val: 100%|██████████| 13/13 [00:05<00:00,  2.59it/s]



Epoch 4: Train Loss = 1.2606
Epoch 4: Val Loss = 1.6325, Val Acc = 0.6300
Classification Report:
              precision    recall  f1-score   support

         MEL       0.51      0.49      0.50        39
          NV       0.78      0.79      0.78        99
         BCC       0.45      0.83      0.58        23
          AK       0.33      0.30      0.32        10
         BKL       0.57      0.21      0.31        19
          DF       0.00      0.00      0.00         2
        VASC       0.75      0.75      0.75         4
         SCC       0.00      0.00      0.00         4

    accuracy                           0.63       200
   macro avg       0.43      0.42      0.41       200
weighted avg       0.62      0.63      0.61       200

Confusion Matrix:
[[19 12  4  3  1  0  0  0]
 [11 78  7  1  1  0  1  0]
 [ 1  2 19  1  0  0  0  0]
 [ 0  0  6  3  1  0  0  0]
 [ 6  5  4  0  4  0  0  0]
 [ 0  1  0  0  0  0  0  1]
 [ 0  1  0  0  0  0  3  0]
 [ 0  1  2  1  0  0  0  0]]


Train: 100%|██████████| 50/50 [00:28<00:00,  1.73it/s]
Val: 100%|██████████| 13/13 [00:04<00:00,  3.09it/s]



Epoch 5: Train Loss = 0.9982
Epoch 5: Val Loss = 1.5392, Val Acc = 0.6450
Classification Report:
              precision    recall  f1-score   support

         MEL       0.56      0.62      0.59        39
          NV       0.80      0.75      0.77        99
         BCC       0.55      0.74      0.63        23
          AK       0.46      0.60      0.52        10
         BKL       0.33      0.26      0.29        19
          DF       0.00      0.00      0.00         2
        VASC       1.00      0.50      0.67         4
         SCC       0.25      0.25      0.25         4

    accuracy                           0.65       200
   macro avg       0.49      0.46      0.47       200
weighted avg       0.65      0.65      0.64       200

Confusion Matrix:
[[24  9  3  2  1  0  0  0]
 [12 74  7  0  6  0  0  0]
 [ 2  1 17  2  1  0  0  0]
 [ 0  0  1  6  1  0  0  2]
 [ 5  5  3  1  5  0  0  0]
 [ 0  1  0  0  0  0  0  1]
 [ 0  2  0  0  0  0  2  0]
 [ 0  0  0  2  1  0  0  1]]

Training comple

([2.0137988901138306,
  1.7687723302841187,
  1.5407856178283692,
  1.2605955278873444,
  0.9982059621810913],
 [1.9661118078231812,
  1.8261774826049804,
  1.708098521232605,
  1.6325401496887206,
  1.5392481946945191],
 'MyDrive/midas/results/baseline_lr-1e-4_lr-scheduler_20250604_040009/best_model.pt')

### Visualize Correctly Classified vs. Incorrectly Classified Images

In [None]:
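# Build `examples`: (image tensor, predicted index, true index) triples from the validation set.
# Assumes val_loader yields (images, labels) batches, as in baseline_forward above, and uses the
# model as it stands after the final epoch (not the saved best checkpoint).
model.to(device)
model.eval()
examples = []
with torch.no_grad():
    for images, labels in val_loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        examples.extend(zip(images, preds.tolist(), labels.tolist()))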

import matplotlib.pyplot as plt
import torchvision.transforms.functional as F
import torch

# Unnormalization transform
unnormalize = transforms.Normalize(
    mean=[-m / s for m, s in zip([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])],
    std=[1 / s for s in [0.229, 0.224, 0.225]]
)

ISIC_CLASSES = [
    "melanoma", "melanocytic nevus", "BCC", "actinic keratosis",
    "benign keratosis", "dermatofibroma", "vascular lesion", "SCC", "none"
]

def plot_examples(examples, correct=True, n=10):
    filtered = [ex for ex in examples if (ex[1] == ex[2]) == correct]
    selected = filtered[:n]

    fig, axs = plt.subplots(2, 5, figsize=(15, 6))
    axs = axs.flatten()

    for ax, (img, pred, true) in zip(axs, selected):
        img = unnormalize(img)
        img = torch.clamp(img, 0, 1)
        img = F.to_pil_image(img)

        ax.imshow(img)
        ax.set_title(f"Pred: {ISIC_CLASSES[pred]}\nTrue: {ISIC_CLASSES[true]}")
        ax.axis('off')

    for i in range(len(selected), len(axs)):
        axs[i].axis('off')  # hide empty subplots

    plt.tight_layout()
    plt.show()

print("Correctly classified examples:")
plot_examples(examples, correct=True)

print("Misclassified examples:")
plot_examples(examples, correct=False)


# Development Summary: EfficientNet-B3 on ISIC 2019 (Dev Set)

**Author:** Sophia Longo  
**Project:** CS231N Final Project  
**Last updated:** May 2025

---

## 🔍 Objective

To validate the image loading, preprocessing, and training pipeline for skin lesion classification using EfficientNet-B3 on a small ISIC 2019 development subset (n = 1,000). This serves as the foundation for scaling to full ISIC and later fine-tuning with MIDAS.

---

## ✅ Tasks Completed

### 1. Image Loading

- Built a custom `ISICDataset` class (imported from `utils.dataloader`)
- Uses `ISIC_2019_Training_GroundTruth.csv` to associate image filenames with labels
- Reads from `sample_1000/` subset created for fast iteration

### 2. Preprocessing

- Applied torchvision transforms:
  - Resize to **300×300**
  - Normalize with **ImageNet mean/std**
- Verified shape and channel consistency

### 3. Model Setup

- Loaded pretrained `efficientnet_b3` from `torchvision.models`
- Replaced classifier head with `Linear(in_features, 8)` for 8-class ISIC classification

### 4. Training Loop

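- Used `CrossEntropyLoss` and `Adam` optimizer (lr = 1e-4) with a `ReduceLROnPlateau` scheduler
- Ran 5 epochs of 50 training batches each (batch size 16)
- Training loss fell from 2.01 to 1.00 and validation loss from 1.97 to 1.54 over the 5 epochs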

### 5. Validation Accuracy

- Stratified 80/20 train/val split on the dev subset
- Evaluated **multiclass accuracy** on 200 validation images (0.545 after epoch 1, 0.645 after epoch 5)

### 6. Visualization

- Plotted correctly and incorrectly classified images
- Used `matplotlib` 2×5 grid layout
- Applied unnormalization to restore color
- Displayed predicted and true class names from ISIC
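
The unnormalization step works because a normalize transform computes `(x - mean) / std` per channel, so applying the same operation again with `mean = -m / s` and `std = 1 / s` gives `(y + m / s) * s = y * s + m`, which is the original value. A minimal check of that arithmetic on a single RGB pixel, in plain Python (the `normalize` function here is a stand-in for the per-channel operation, not the torchvision transform itself, and the pixel value is illustrative):

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize(pixel, mean, std):
    """Per-channel (x - mean) / std, the same arithmetic a normalize transform applies."""
    return [(x - m) / s for x, m, s in zip(pixel, mean, std)]


# Inverse parameters, matching the unnormalize transform used in the plotting cell.
inv_mean = [-m / s for m, s in zip(IMAGENET_MEAN, IMAGENET_STD)]
inv_std = [1 / s for s in IMAGENET_STD]

pixel = [0.8, 0.4, 0.2]  # illustrative RGB value in [0, 1]
normalized = normalize(pixel, IMAGENET_MEAN, IMAGENET_STD)
restored = normalize(normalized, inv_mean, inv_std)
print(normalized, restored)
```

Floating-point round-off can push restored values slightly outside [0, 1], which is why the plotting code clamps before converting to a PIL image.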

---

## 📁 Files Used

- `data/sample_1000/`: Subset of 1,000 ISIC JPEGs
- `ISIC_2019_Training_GroundTruth.csv`: Used to build dataset
- `dev.ipynb`: Main notebook for this phase

---

## ⏭️ Next Steps

- Train EfficientNet-B3 on a larger ISIC subset (e.g., 5k images)
- Experiment with:
  - Metadata fusion
  - Classifier head modifications
  - Binary vs. multiclass outputs
- Fine-tune on MIDAS data for core evaluation

---