# Multimodal Skin Cancer Classification (Image + Metadata)

This notebook builds and trains a multimodal deep learning model that uses both **skin lesion images** and **patient metadata** (age, sex, lesion location) to classify skin lesions from the HAM10000 dataset. The approach follows research reporting that combining images with metadata can improve classification accuracy over using images alone.

**The process is as follows:**
1.  **Load and Preprocess Data**: Load images and the `HAM10000_metadata.csv` file. Clean and transform the metadata into a numerical format.
2.  **Create a Custom Dataset**: Build a PyTorch `Dataset` that provides an image, its corresponding metadata, and the label for each sample.
3.  **Define the Multimodal Model**: Construct a model with two branches:
    *   An **Image Branch** (a pre-trained CNN like EfficientNet) to learn from pixels.
    *   A **Metadata Branch** (a simple MLP) to learn from tabular data.
    *   A **Fusion Layer** that combines the outputs of both branches before making a final classification.
4.  **Train the Model**: Implement a training loop to train the model on the combined dataset.
5.  **Evaluate Performance**: Assess the model's accuracy on a held-out test set.
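
As a rough illustration of the two-branch design in step 3, the sketch below fuses an image embedding and a metadata embedding by concatenation. It is not the model trained in this notebook: the class name `TinyFusionNet`, the layer sizes, and the metadata feature count of 10 are hypothetical placeholders, and a tiny CNN stands in for the pre-trained backbone.

```python
import torch
import torch.nn as nn


class TinyFusionNet(nn.Module):
    """Illustrative late-fusion model; names and layer sizes are hypothetical."""

    def __init__(self, num_metadata_features: int, num_classes: int):
        super().__init__()
        # Image branch: a tiny CNN standing in for a pre-trained backbone.
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),  # -> (batch, 8)
        )
        # Metadata branch: a small MLP over the tabular features.
        self.metadata_branch = nn.Sequential(
            nn.Linear(num_metadata_features, 16),
            nn.ReLU(),
        )
        # Fusion layer: classify from the concatenated embeddings.
        self.classifier = nn.Linear(8 + 16, num_classes)

    def forward(self, image, metadata):
        image_features = self.image_branch(image)
        metadata_features = self.metadata_branch(metadata)
        fused = torch.cat([image_features, metadata_features], dim=1)
        return self.classifier(fused)


# Random tensors stand in for a batch of 4 images and their metadata vectors.
model = TinyFusionNet(num_metadata_features=10, num_classes=7)
dummy_images = torch.randn(4, 3, 64, 64)
dummy_metadata = torch.randn(4, 10)
logits = model(dummy_images, dummy_metadata)
print(logits.shape)  # torch.Size([4, 7])
```

Concatenation is only one possible fusion strategy; it is used here because it is the simplest way to let a single classifier see both embeddings at once.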
</VSCode.Cell>
<VSCode.Cell language="python">
import os
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms, models
from PIL import Image
import timm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from collections import Counter
import matplotlib.pyplot as plt
from torch.cuda.amp import GradScaler, autocast
</VSCode.Cell>
<VSCode.Cell language="markdown">
## 1. Configuration
Set up the main parameters for the training process.
</VSCode.Cell>
<VSCode.Cell language="python">
# --- Configuration ---
# Device (GPU/CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

if device.type == 'cpu':
    print("Warning: Running on CPU. Training will be very slow. It is highly recommended to use a GPU.")

# Paths
IMG_DIR = "HAM10000_images" 
METADATA_PATH = "HAM10000/HAM10000_metadata.csv"
MODEL_CHOICE = "efficientnet_b3"

# Training parameters
BATCH_SIZE = 32
NUM_EPOCHS = 30
LEARNING_RATE = 1e-4
WEIGHT_DECAY = 1e-4
# --- End Configuration ---
</VSCode.Cell>
<VSCode.Cell language="markdown">
## 2. Load and Preprocess Metadata
Load the CSV file containing patient metadata. We will perform several key preprocessing steps:
- **Handle Missing Values**: The `age` column has missing values. We'll fill these with the mean age.
- **Encode Categorical Features**: Convert `sex` and `localization` into numerical format using one-hot encoding.
- **Scale Numerical Features**: Standardize the `age` column so it has a mean of 0 and a standard deviation of 1.
- **Map Labels**: Map each lesion type string (e.g., 'mel') to an integer index. Indices are assigned in alphabetical order of the `dx` codes.
</VSCode.Cell>
<VSCode.Cell language="python">
# Load metadata
df = pd.read_csv(METADATA_PATH)

# --- Preprocessing Steps ---

# 1. Handle missing age values by imputing the column mean
df['age'] = df['age'].fillna(df['age'].mean())

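# 2. Encode categorical features
# One-hot encode 'sex' and 'localization'.
# Note: the encoder and scaler are fit on the full dataset here; for a strictly
# held-out evaluation, fit them on the training split only.
categorical_features = ['sex', 'localization']
# sparse_output=False returns a dense array (requires scikit-learn >= 1.2)
one_hot_encoder = OneHotEncoder(handle_unknown='ignore', sparse_output=False)
encoded_categoricals = one_hot_encoder.fit_transform(df[categorical_features])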

# 3. Scale numerical features
numerical_features = ['age']
scaler = StandardScaler()
scaled_numericals = scaler.fit_transform(df[numerical_features])

# 4. Combine processed features
processed_metadata = np.hstack((scaled_numericals, encoded_categoricals))
metadata_feature_count = processed_metadata.shape[1]

print("Metadata successfully processed.")
print(f"Number of metadata features after encoding: {metadata_feature_count}")

# 5. Create label mapping
lesion_type_dict = {
    'akiec': 'Actinic Keratoses',
    'bcc': 'Basal Cell Carcinoma',
    'bkl': 'Benign Keratosis',
    'df': 'Dermatofibroma',
    'mel': 'Melanoma',
    'nv': 'Melanocytic Nevi',
    'vasc': 'Vascular Skin Lesions'
}
df['lesion_type'] = df['dx'].map(lesion_type_dict)
# Encode the diagnosis as an integer label (categories are sorted alphabetically)
dx_categorical = pd.Categorical(df['dx'])
df['label'] = dx_categorical.codes

# Store the mapping for later
class_to_idx = {cls: i for i, cls in enumerate(dx_categorical.categories)}
idx_to_class = {i: cls for cls, i in class_to_idx.items()}
num_classes = len(class_to_idx)

print(f"\nClass to index mapping:\n{class_to_idx}")

# Add processed metadata and image paths to the dataframe
df['image_path'] = df['image_id'].apply(lambda x: os.path.join(IMG_DIR, x + '.jpg'))
# Store each row's feature vector as a NumPy array in an object column
df['processed_metadata'] = list(processed_metadata)

df.head()
