In [None]:
%load_ext autoreload
%autoreload 2

# MIDOG 2025 Track 2: Classification of Atypical Mitotic Figures

This notebook shows you how to get started with Track 2 of this year's challenge. For a general overview of Track 2, visit our [Track 2 Overview Page](https://midog2025.deepmicroscopy.org/track-2-atypical-classification/). The task is to develop a classification algorithm that reliably differentiates between normal mitotic figures (NMF) and atypical mitotic figures (AMF). This is very challenging due to high class imbalance, high intra-class variability, and subtle differences between the two classes.

This notebook will guide you through the following steps:
1. How to download the MIDOG 2025 Atypical Dataset
2. How to set up a simple classification pipeline

**Note: This notebook is only meant to give you an idea of how to approach the challenge. You can be creative and set the dataset up differently. You are also encouraged to use different models.**

# Prerequisites

Make sure that you have set up your environment correctly by following the instructions in `README.md` or in the notebook `MIDOG2025_01_Exploratory_Data_Analysis.ipynb`.


In [None]:
import matplotlib.pyplot as plt 
import numpy as np
import os
import pandas as pd 
import plotly.express as px 
import torch
import torch.nn as nn
import torch.optim as optim

from PIL import Image
from pathlib import Path
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import KFold, train_test_split
from torch.utils.data import DataLoader, WeightedRandomSampler, Dataset
from tqdm.notebook import tqdm 

from utils.classification_utils import MitosisClassifier, ClassificationDataset, MitosisTrainer

# 1. Download the MIDOG 2025 Atypical Dataset

Please download the zip archive of images and the CSV file of labels from Google Drive:
https://drive.google.com/drive/folders/1MMRWZdcyEsCaCwu8-6MxlIrQLNMequpq?usp=drive_link 

Unzip the archive and place the extracted image folder and the CSV file in the directory of this notebook. If you set up your data differently, you will have to adjust the paths in this notebook accordingly.
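
Before loading the data, it can help to confirm that both files ended up where the notebook expects them. The following is a minimal sketch; the helper name `find_missing_paths` is hypothetical and not part of the challenge code, and the two paths are simply the defaults used in the next cell.

```python
from pathlib import Path


def find_missing_paths(image_dir, dataset_file):
    """Return the expected dataset paths that do not exist yet."""
    expected = [Path(image_dir), Path(dataset_file)]
    return [path for path in expected if not path.exists()]


# Hypothetical helper call; the names are the defaults of this notebook.
# Adjust them if you stored the data somewhere else.
missing = find_missing_paths(
    'MIDOG25_Binary_Classification_Train_Set',
    'MIDOG25_Atypical_Classification_Train_Set.csv',
)
if missing:
    print('Not found:', ', '.join(str(path) for path in missing))
else:
    print('Dataset files found.')
```

If anything is reported as not found, fix the paths before running the remaining cells.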

In [None]:
# Set the path to your images and the dataset file
image_dir = Path('MIDOG25_Binary_Classification_Train_Set')
dataset_file = 'MIDOG25_Atypical_Classification_Train_Set.csv'

# Let us load the dataset 
dataset = pd.read_csv(dataset_file)
dataset.head()

## Exploratory Analysis

The following plots show how the two labels are distributed in the atypical classification dataset: overall, per tumor type, per scanner, and per species. Pay attention to the class imbalance between normal mitotic figures (NMF) and atypical mitotic figures (AMF).

In [None]:
grouped_per_label = dataset.groupby('majority').size().reset_index(name='count')
fig = px.pie(
    grouped_per_label,
    values='count',
    names='majority',
    color='majority',
    title='Normal Mitotic Figures (NMF) vs Atypical Mitotic Figures (AMF)',
    # Map colors by label so they match the bar charts (AMF red, NMF green)
    color_discrete_map={'AMF': '#e74c3c', 'NMF': '#2ecc71'})
fig.show()

In [None]:
grouped_per_tumor = dataset.groupby(['Tumor', 'majority']).size().reset_index(name='count')
fig = px.bar(grouped_per_tumor, 
             x='Tumor',
             y='count',
             color='majority',
             barmode='group',
             title='Distribution of Atypical and Normal Mitotic Figures per Tumor Type',
             labels={'count': 'Count', 'Tumor': 'Tumor Type', 'majority': ''},
             color_discrete_sequence=['#e74c3c', '#2ecc71'])
fig.update_layout(
    plot_bgcolor='white',
    bargap=0.2,
    bargroupgap=0.1
)
fig.show()

In [None]:
grouped_per_scanner = dataset.groupby(['Scanner', 'majority']).size().reset_index(name='count')
fig = px.bar(grouped_per_scanner, 
             x='Scanner',
             y='count',
             color='majority',
             barmode='group',
             title='Distribution of Atypical and Normal Mitotic Figures per Scanner',
             labels={'count': 'Count', 'Scanner': 'Scanner', 'majority': ''},
             color_discrete_sequence=['#e74c3c', '#2ecc71'])
fig.update_layout(
    plot_bgcolor='white',
    bargap=0.2,
    bargroupgap=0.1
)
fig.show()

In [None]:
grouped_per_species = dataset.groupby(['Species', 'majority']).size().reset_index(name='count')
fig = px.bar(grouped_per_species, 
             x='Species',
             y='count',
             color='majority',
             barmode='group',
             title='Distribution of Atypical and Normal Mitotic Figures per Species',
             labels={'count': 'Count', 'Species': 'Species', 'majority': ''},
             color_discrete_sequence=['#e74c3c', '#2ecc71'])
fig.update_layout(
    plot_bgcolor='white',
    bargap=0.2,
    bargroupgap=0.1
)
fig.show()

## Visual Examples

The following examples show patches from the atypical classification dataset. The patches were extracted around the original annotations to simplify the classification task. The first set of plots shows normal mitotic figures, one plot per tumor type. Pay attention to the high intra-class variability of mitotic figures.

In [None]:
num_samples = 10

for tumortype in dataset['Tumor'].unique():
    patches = []
    tumor_dataset = dataset.query('Tumor == @tumortype and majority == "NMF"')
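    # Cap the sample size so tumor types with fewer patches do not raise an error
    samples = tumor_dataset.sample(n=min(num_samples, len(tumor_dataset)))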

    for idx, sample in samples.iterrows():
        file_path = image_dir / sample['image_id']
        patch = Image.open(file_path)
        patches.append(patch)

    fig = px.imshow(np.array(patches), facet_col=0, facet_col_wrap=5, labels={'facet_col':'NMF'}, title=tumortype)
    fig.show()

The next examples show atypical mitotic figures. Again, pay close attention to the high intra-class variability. Also note that the differences to normal mitotic figures can be subtle, which makes this a really hard problem.

In [None]:
num_samples = 10

for tumortype in dataset['Tumor'].unique():
    patches = []
    tumor_dataset = dataset.query('Tumor == @tumortype and majority == "AMF"')
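    # Cap the sample size so tumor types with fewer patches do not raise an error
    samples = tumor_dataset.sample(n=min(num_samples, len(tumor_dataset)))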

    for idx, sample in samples.iterrows():
        file_path = image_dir / sample['image_id']
        patch = Image.open(file_path)
        patches.append(patch)

    fig = px.imshow(np.array(patches), facet_col=0, facet_col_wrap=5, labels={'facet_col':'AMF'}, title=tumortype)
    fig.show()

# 2. Simple Classification Pipeline

Next, we will set up a very simple classification pipeline. This is only meant to give you an idea of how to approach the challenge. You are encouraged to set up the data differently and use other models to get better results. Have a look at the [MIDOG 2022 Overview Paper]() to get an idea of which techniques were successful in achieving high domain robustness. Check out the latest literature to get some ideas and apply them to this task.
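
One common way to set the data up differently under class imbalance is to oversample the rare class during training. The sketch below computes inverse-frequency sample weights in plain Python. The helper `inverse_frequency_weights` is hypothetical and not part of `utils.classification_utils`, and whether `MitosisTrainer` already balances the classes internally is not shown in this notebook.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Give each sample the weight 1 / (number of samples in its class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# Toy labels using the encoding of this notebook: 0 = AMF, 1 = NMF.
toy_labels = [1, 1, 1, 1, 0]
weights = inverse_frequency_weights(toy_labels)
print(weights)  # [0.25, 0.25, 0.25, 0.25, 1.0]
```

Weights like these could be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)`, which is already imported above, so that both classes are drawn roughly equally often.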

In [None]:
import os 

# Set up data 
def load_data_from_csv(csv_path, images_folder, label_col='majority'):
    """
    Reads a CSV file that contains image filenames and a 'majority' column 
    indicating the label ('AMF' or 'NMF'). 
    Returns:
        images (list of str): Full paths to images
        labels (list of int): Numeric labels (0 for AMF -> Atypical, 1 for NMF -> Normal)
    """
    df = pd.read_csv(csv_path)
    
    # Map string labels to numeric
    label_map = {
        'AMF': 0,  # Atypical
        'NMF': 1   # Normal
    }

    images = []
    labels = []

    for _, row in df.iterrows():
        img_name = row['image_id']  # Adjust if your CSV column name differs
        label_str = row[label_col]
        img_path = os.path.join(images_folder, img_name)
        # Skip rows whose image file is not present on disk
        if not os.path.isfile(img_path):
            continue
        
        images.append(img_path)
        labels.append(label_map[label_str])
    
    return images, labels

In [None]:
# Load data 
images, labels = load_data_from_csv(dataset_file, image_dir, label_col='majority')

# Split data into training and test split 
train_images, test_images, train_labels, test_labels = train_test_split(images, labels, test_size=0.2, random_state=42)

len(train_images), len(test_images)
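
Because the classes are imbalanced, a purely random split can leave the training and test sets with different AMF shares. As a possible variation, `train_test_split` accepts a `stratify` argument that keeps the label ratio the same in both splits. The sketch below uses toy data (the file names and the 20/80 ratio are made up for illustration) rather than the real dataset.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy data standing in for `images` and `labels`: 20 AMF (0) and 80 NMF (1).
toy_images = [f'patch_{i}.png' for i in range(100)]
toy_labels = [0] * 20 + [1] * 80

train_x, test_x, train_y, test_y = train_test_split(
    toy_images,
    toy_labels,
    test_size=0.2,
    random_state=42,
    stratify=toy_labels,  # keep the AMF/NMF ratio equal in both splits
)

# Both splits keep the 1:4 ratio of the toy labels.
print(Counter(train_y), Counter(test_y))
```

In this notebook, the same effect would be achieved by adding `stratify=labels` to the call above.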

In [None]:
# Set up training configurations
num_epochs = 10
batch_size = 128
num_folds = 5
lr = 1e-4
model_name = 'efficientnet_v2_s'
weights = 'IMAGENET1K_V1'
experiment_dir = 'classification_results'


# Set up the trainer
trainer = MitosisTrainer(
    model_name=model_name,
    weights=weights,
    num_epochs=num_epochs,
    batch_size=batch_size,
    num_folds=num_folds,
    lr=lr,
    experiment_dir=experiment_dir
)

# Run the k-fold cross validation and evaluate on the test set
val_accuracies, test_accuracies = trainer.train_and_evaluate(
    train_images=train_images,
    train_labels=train_labels, 
    test_images=test_images,
    test_labels=test_labels
)
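
When reading the reported scores, keep in mind that plain accuracy can look good on imbalanced data even if the rare class is never predicted. The sketch below illustrates this with a pure-Python version of balanced accuracy, the mean of the per-class recalls, which is the definition used by `sklearn.metrics.balanced_accuracy_score` imported above. Which metric `MitosisTrainer` reports is defined in `utils/classification_utils.py` and is not shown here, and the toy labels are made up.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls."""
    recalls = []
    for cls in sorted(set(y_true)):
        preds_for_cls = [p for t, p in zip(y_true, y_pred) if t == cls]
        recalls.append(sum(p == cls for p in preds_for_cls) / len(preds_for_cls))
    return sum(recalls) / len(recalls)


# Toy case: 2 AMF (0), 8 NMF (1), and a model that always predicts NMF.
y_true = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
y_pred = [1] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The always-NMF model reaches 80 % accuracy but only 50 % balanced accuracy, which matches random guessing on a two-class problem.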

In [None]:
val_accuracies

In [None]:
test_accuracies
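
To compare experiments, it can be convenient to condense the per-fold scores into a mean and a standard deviation. The sketch below assumes that `val_accuracies` and `test_accuracies` are flat lists with one float per fold. This is an assumption, since the return format of `MitosisTrainer.train_and_evaluate` is defined in `utils/classification_utils.py`. The helper `summarize` and the example values are hypothetical.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return the mean and sample standard deviation of a list of scores."""
    return mean(scores), stdev(scores)


# Hypothetical per-fold values, used only to make the sketch runnable.
example_scores = [0.80, 0.82, 0.78, 0.81, 0.79]
avg, spread = summarize(example_scores)
print(f'{avg:.3f} +/- {spread:.3f}')
```

With real results, you would call `summarize(val_accuracies)` and `summarize(test_accuracies)` instead.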