Merged
1 change: 1 addition & 0 deletions README.md
@@ -68,6 +68,7 @@ You will need to **request access** to the following HuggingFace repositories, w
- [KatherLab/COBRA](https://huggingface.co/KatherLab/COBRA) (for [`cobra_extract_features`](tasks/cobra_extract_features/) and [`cobra_heatmaps`](tasks/cobra_heatmaps))
- [pixas/MedSSS_Policy](https://huggingface.co/pixas/MedSSS_Policy) (for [`medsss_generate`](tasks/medsss_generate/))
- [YukunZhou/RETFound_mae_natureCFP](https://huggingface.co/YukunZhou/RETFound_mae_natureCFP) (for [`retfound_feature_vector`](tasks/retfound_feature_vector/))
- [KatherLab/MoPaDi](https://huggingface.co/KatherLab/MoPaDi) (for [`mopadi_generate_counterfactuals`](tasks/mopadi_generate_counterfactuals/))

An OpenAI API key is required for the [`textgrad_medical_qa_optimize`](tasks/textgrad_medical_qa_optimize) task.

1 change: 1 addition & 0 deletions tasks/mopadi_generate_counterfactuals/data/.gitignore
@@ -0,0 +1 @@
# Add large files here that should not be committed to the repository
1,098 changes: 1,098 additions & 0 deletions tasks/mopadi_generate_counterfactuals/data/TCGA-BRCA-DX_CLINI.csv

Large diffs are not rendered by default.
8 changes: 8 additions & 0 deletions tasks/mopadi_generate_counterfactuals/data/download.sh
@@ -0,0 +1,8 @@
#! /bin/bash

set -e

# Download large files required for the task

# Example:
# wget -O data/large_file_1.zip https://example.com/large_file_1.zip
61 changes: 61 additions & 0 deletions tasks/mopadi_generate_counterfactuals/implementation.py
@@ -0,0 +1,61 @@
def mopadi_generate_counterfactuals(
    images_dir: str = '/mount/input/images/TCGA-CRC',
    feat_path_test: str = '/mount/input/features/TCGA-CRC',
    clini_table: str = '/mount/input/TCGA-CRC-DX_CLINI.xlsx',
    target_label: str = 'isMSIH',
    base_dir: str = '/mount/output/counterfactuals_crc_msi',
    manipulation_levels: list = [0.02, 0.04, 0.06, 0.08],
    pretrained_autoenc_name: str = 'crc_512_model',
    pretrained_clf_name: str = 'msi',
) -> dict:
    """
    Generate counterfactual explanations for the top 3 tiles per patient by manipulating each tile at every
    amplitude in `manipulation_levels` (four amplitudes by default). Use a pretrained diffusion autoencoder
    according to the cancer type, combined with a corresponding trained MIL classifier. You will be provided
    with the path to the folder containing images, the clinical table with each patient’s target label values,
    and the folder containing pre-extracted features.

    Args:
        images_dir: Path to the folder containing patient subfolders with image patches
        feat_path_test: Path to the folder containing extracted features for each patient
        clini_table: Path to the clinical table (XLSX or CSV) containing each patient's target label values
        target_label: Name of the column in the clinical table that contains classification labels
        base_dir: Path to the output directory where the results will be saved
        manipulation_levels: List of manipulation amplitudes to apply to the images
        pretrained_autoenc_name: Name of the pretrained diffusion autoencoder model
        pretrained_clf_name: Name of the pretrained classifier

    Returns:
        dict with the following structure:
        {
          'num_counterfactuals': int  # The number of counterfactual images that were generated
        }
    """
    import os
    import yaml
    from pathlib import Path

    Path(base_dir).mkdir(parents=True, exist_ok=True)
    my_config_path = Path(f"config_{target_label}.yaml").resolve()

    # Start from the default MoPaDi config and override the task-specific fields
    with open("/workspace/mopadi/conf.yaml", "r") as f:
        config = yaml.safe_load(f)
    config["base_dir"] = base_dir
    config["gpus"] = [0]
    config["mil_classifier"]["images_dir"] = images_dir
    config["mil_classifier"]["feat_path_test"] = feat_path_test
    config["mil_classifier"]["clini_table"] = clini_table
    config["mil_classifier"]["target_label"] = target_label
    config["mil_classifier"]["manipulation_levels"] = list(manipulation_levels)
    config["mil_classifier"]["nr_top_tiles"] = 3
    config["mil_classifier"]["use_pretrained"] = True
    config["mil_classifier"]["pretrained_autoenc_name"] = pretrained_autoenc_name
    config["mil_classifier"]["pretrained_clf_name"] = pretrained_clf_name

    # Write the config through a context manager so the file is flushed and closed before MoPaDi reads it
    with my_config_path.open("w") as f:
        yaml.dump(config, f)

    # The paths below are relative, so the working directory must be the parent of the mopadi checkout
    command = f"mopadi/.venv/bin/python mopadi/src/mopadi/run_mopadi.py mil --config {my_config_path} --mode manipulate"
    return_code = os.system(command)

    if return_code != 0:
        raise RuntimeError(f"'{command}' failed with code {return_code}.")

    # Count the number of generated counterfactuals
    num_counterfactuals = 0
    final_output_dir = os.path.join(base_dir, f'mil_classifier_{target_label}', 'counterfactuals')
    for patient in os.listdir(final_output_dir):
        patient_dir = os.path.join(final_output_dir, patient)
        for tile_folder in os.listdir(patient_dir):
            num_counterfactuals += sum('manip_to' in fname for fname in os.listdir(os.path.join(patient_dir, tile_folder)))

    return {"num_counterfactuals": num_counterfactuals}
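The counting step at the end of `implementation.py` can be tried in isolation. The sketch below is not part of the PR: `count_counterfactuals` and `build_demo_tree` are hypothetical helpers, and the file names in the demo tree are made up. The only things taken from the diff are the `<patient>/<tile>/` nesting and the `manip_to` substring. Unlike the original loop, this version skips entries that are not directories, which is an assumption about how stray files should be handled.

```python
import tempfile
from pathlib import Path


def count_counterfactuals(counterfactuals_dir: str) -> int:
    """Count files containing 'manip_to' in <root>/<patient>/<tile>/ folders."""
    root = Path(counterfactuals_dir)
    total = 0
    for patient_dir in root.iterdir():
        if not patient_dir.is_dir():
            continue  # assumption: ignore stray files next to patient folders
        for tile_dir in patient_dir.iterdir():
            if not tile_dir.is_dir():
                continue  # assumption: ignore stray files next to tile folders
            total += sum("manip_to" in f.name for f in tile_dir.iterdir())
    return total


def build_demo_tree(root: Path) -> None:
    """Create a made-up output tree: 2 patients x 3 tiles x 1 manipulated image."""
    for patient in ("patient_A", "patient_B"):
        for tile in ("Tile_0", "Tile_1", "Tile_2"):
            tile_dir = root / patient / tile
            tile_dir.mkdir(parents=True)
            (tile_dir / "original.png").touch()  # hypothetical file name
            (tile_dir / "manip_to_other_amp_0.06.png").touch()  # hypothetical file name
            (tile_dir / "predictions.txt").touch()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_tree(Path(tmp))
        print(count_counterfactuals(tmp))
```

With one manipulation level, the expected count is tiles × levels × patients, which is the same arithmetic the test file uses for its expected values.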
11 changes: 11 additions & 0 deletions tasks/mopadi_generate_counterfactuals/install.sh
@@ -0,0 +1,11 @@
#! /bin/bash
set -e

git clone https://github.com/KatherLab/mopadi /workspace/mopadi
cd /workspace/mopadi && git checkout 4e76820

apt-get update && apt-get install -y libgl1 libglib2.0-0
pip install uv
pip install pyyaml
uv sync
source .venv/bin/activate
141 changes: 141 additions & 0 deletions tasks/mopadi_generate_counterfactuals/task.yaml
@@ -0,0 +1,141 @@
name: mopadi_generate_counterfactuals
repo:
  name: mopadi
  url: "https://github.com/KatherLab/mopadi"
  commit: 4e76820
  env:
    HF_TOKEN: "${env:HF_TOKEN}"
papers: [zigutyte2024mopadi]
category: pathology
requires: cuda
description: Generate counterfactual explanations for the top 3 tiles per patient by manipulating them with a specific amplitude, such that the predicted class of each counterfactual image flips to the opposite class (i.e., the predicted output for the opposite class exceeds 0.9), while avoiding excessive overmanipulation. Use a pretrained diffusion autoencoder according to the cancer type, combined with a corresponding MIL classifier trained to distinguish biologically meaningful histological patterns. You will be provided with the path to the folder containing images, the clinical table with each patient’s target label values, and the folder containing pre-extracted features.
arguments:
  - name: images_dir
    description: Path to the folder containing patient subfolders with image patches
    type: str
  - name: feat_path_test
    description: Path to the folder containing extracted features for each patient
    type: str
  - name: clini_table
    description: Path to the clinical table (XLSX or CSV) containing each patient's target label values
    type: str
  - name: target_label
    description: Name of the column in the clinical table that contains classification labels
    type: str
  - name: base_dir
    description: Path to the output directory where the results will be saved
    type: str
  - name: manipulation_levels
    description: List of manipulation amplitudes to apply to the images
    type: list
  - name: pretrained_autoenc_name
    description: Name of the pretrained diffusion autoencoder model
    type: str
  - name: pretrained_clf_name
    description: Name of the pretrained classifier
    type: str
returns:
  - name: num_counterfactuals
    description: The number of counterfactual images that were generated
    type: int
example:
  arguments:
    - name: images_dir
      value: /mount/input/images/TCGA-CRC
    - name: feat_path_test
      value: /mount/input/features/TCGA-CRC
    - name: clini_table
      value: /mount/input/TCGA-CRC-DX_CLINI.xlsx
    - name: target_label
      value: isMSIH
    - name: base_dir
      value: /mount/output/counterfactuals_crc_msi
    - name: manipulation_levels
      value: [0.06]
    - name: pretrained_autoenc_name
      value: crc_512_model
    - name: pretrained_clf_name
      value: msi
  mount:
    - source: images/TCGA-CRC
      target: images/TCGA-CRC
    - source: features/TCGA-CRC
      target: features/TCGA-CRC
    - source: TCGA-CRC-DX_CLINI.xlsx
      target: TCGA-CRC-DX_CLINI.xlsx
test_invocations:
  - name: brca_types
    arguments:
      - name: images_dir
        value: /mount/input/images/TCGA-BRCA
      - name: feat_path_test
        value: /mount/input/features/TCGA-BRCA
      - name: clini_table
        value: /mount/input/TCGA-BRCA-DX_CLINI.csv
      - name: target_label
        value: BRCA_Pathology
      - name: base_dir
        value: /mount/output/counterfactuals_brca_types
      - name: manipulation_levels
        value: [0.06]
      - name: pretrained_autoenc_name
        value: brca_512_model
      - name: pretrained_clf_name
        value: type
    mount:
      - source: images/TCGA-BRCA
        target: images/TCGA-BRCA
      - source: features/TCGA-BRCA
        target: features/TCGA-BRCA
      - source: TCGA-BRCA-DX_CLINI.csv
        target: TCGA-BRCA-DX_CLINI.csv
  - name: liver
    arguments:
      - name: images_dir
        value: /mount/input/images/Pancancer-liver
      - name: feat_path_test
        value: /mount/input/features/Pancancer
      - name: clini_table
        value: /mount/input/Pancancer_clini.xlsx
      - name: target_label
        value: Type
      - name: base_dir
        value: /mount/output/counterfactuals_liver_types
      - name: manipulation_levels
        value: [0.04]
      - name: pretrained_autoenc_name
        value: pancancer_model
      - name: pretrained_clf_name
        value: liver
    mount:
      - source: images/Pancancer-liver
        target: images/Pancancer-liver
      - source: features/Pancancer
        target: features/Pancancer
      - source: Pancancer_clini.xlsx
        target: Pancancer_clini.xlsx
  - name: lung
    arguments:
      - name: images_dir
        value: /mount/input/images/Pancancer-lung
      - name: feat_path_test
        value: /mount/input/features/Pancancer
      - name: clini_table
        value: /mount/input/Pancancer_clini.xlsx
      - name: target_label
        value: Type
      - name: base_dir
        value: /mount/output/counterfactuals_lung_types
      - name: manipulation_levels
        value: [0.06]
      - name: pretrained_autoenc_name
        value: pancancer_model
      - name: pretrained_clf_name
        value: lung
    mount:
      - source: images/Pancancer-lung
        target: images/Pancancer-lung
      - source: features/Pancancer
        target: features/Pancancer
      - source: Pancancer_clini.xlsx
        target: Pancancer_clini.xlsx
84 changes: 84 additions & 0 deletions tasks/mopadi_generate_counterfactuals/tests.py
@@ -0,0 +1,84 @@
import ast

import pytest
from pytest_lazy_fixtures import lf
from tasks.utils import initialize, parametrize_invocation
from toolarena.run import ToolRunResult

initialize()


@parametrize_invocation("brca_types", "liver", "lung")
def test_status(invocation: ToolRunResult):
    assert invocation.status == "success"


@pytest.mark.parametrize(
    "invocation,expected_num_counterfactuals",
    [
        # expected count = top tiles × manipulation levels × patients
        (lf("brca_types"), 3 * 1 * 1),  # 3 top tiles × 1 manipulation level × 1 patient
        (lf("liver"), 3 * 1 * 1),  # 3 top tiles × 1 manipulation level × 1 patient
        (lf("lung"), 3 * 1 * 2),  # 3 top tiles × 1 manipulation level × 2 patients
    ],
)
def test_num_counterfactuals(
    invocation: ToolRunResult, expected_num_counterfactuals: int
):
    assert invocation.result["num_counterfactuals"] == expected_num_counterfactuals


@parametrize_invocation("brca_types", "liver", "lung")
def test_if_predictions_flipped(invocation: ToolRunResult):
    results_root_dir = invocation.output_dir.rglob("mil_classifier_*/counterfactuals/*")
    patients_dirs = list(results_root_dir)
    assert patients_dirs

    for patient_dir in patients_dirs:
        for tile_dir in patient_dir.iterdir():
            if tile_dir.is_dir() and tile_dir.name.startswith("Tile_"):
                pred_file = tile_dir / "predictions.txt"
                assert pred_file.exists(), f"Missing predictions.txt in {tile_dir}"
                assert pred_file.stat().st_size > 0, f"Empty predictions.txt in {tile_dir}"

                with open(pred_file, "r") as f:
                    lines = f.read().splitlines()

                original = None
                flipped_detected = False
                for line in lines:
                    if line.startswith("Original image prediction"):
                        # literal_eval parses the list literal without executing arbitrary code
                        original = ast.literal_eval(line.split(": ")[1])
                        original_idx = int(float(original[0]) < float(original[1]))

                    elif line.startswith("Pred rendered img"):
                        pred = ast.literal_eval(line.split(": ")[1])
                        pred_idx = int(float(pred[0]) < float(pred[1]))
                        pred_conf = float(pred[pred_idx])

                        # A flip counts only if the opposite class is predicted with confidence above 0.90
                        if pred_idx != original_idx and pred_conf > 0.90:
                            flipped_detected = True
                            break

                assert flipped_detected, f"No confident class flip detected in {pred_file}"


@parametrize_invocation("brca_types", "liver", "lung")
def test_manipulation_amplitude(invocation: ToolRunResult):
    results_root_dir = invocation.output_dir.rglob("mil_classifier_*/counterfactuals/*")
    patients_dirs = list(results_root_dir)

    for patient_dir in patients_dirs:
        for tile_dir in patient_dir.iterdir():
            if tile_dir.is_dir():
                pred_file = tile_dir / "predictions.txt"
                with open(pred_file, "r") as f:
                    lines = f.read().splitlines()

                current_amplitude = None
                for line in lines:
                    if line.startswith("Manipulation amplitude:"):
                        try:
                            current_amplitude = float(line.split(":")[1].strip())
                        except ValueError:
                            raise AssertionError(f"Invalid amplitude value in line: {line} in {pred_file}")
                    elif line.startswith("Pred rendered img"):
                        assert current_amplitude is not None, f"Missing amplitude before prediction in {pred_file}"
                        assert current_amplitude <= 0.1, f"Manipulation amplitude too high: {current_amplitude} in {pred_file}"
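The flip check in `tests.py` can be exercised without running MoPaDi. The following is a minimal sketch, not part of the PR. `class_index` and `has_confident_flip` are hypothetical helpers, and the sample lines are invented: the real layout of `predictions.txt` is only inferred from the line prefixes the tests look for, so treat the format as an assumption.

```python
import ast


def class_index(probs) -> int:
    """Return 1 if the second class has the higher score, else 0."""
    return int(float(probs[0]) < float(probs[1]))


def has_confident_flip(lines, threshold: float = 0.90) -> bool:
    """Return True if any rendered prediction flips the class above the threshold."""
    original_idx = None
    for line in lines:
        if line.startswith("Original image prediction"):
            # Assumed format: '<prefix>: [p0, p1]'
            original_idx = class_index(ast.literal_eval(line.split(": ", 1)[1]))
        elif line.startswith("Pred rendered img") and original_idx is not None:
            probs = ast.literal_eval(line.split(": ", 1)[1])
            idx = class_index(probs)
            if idx != original_idx and float(probs[idx]) > threshold:
                return True
    return False


# Invented sample content, for illustration only
flipped = [
    "Original image prediction: [0.95, 0.05]",
    "Manipulation amplitude: 0.06",
    "Pred rendered img: [0.03, 0.97]",
]
weak_flip = [
    "Original image prediction: [0.95, 0.05]",
    "Manipulation amplitude: 0.06",
    "Pred rendered img: [0.40, 0.60]",
]

if __name__ == "__main__":
    print(has_confident_flip(flipped))
    print(has_confident_flip(weak_flip))
```

The second sample changes the predicted class but stays below the 0.90 confidence threshold, so it does not count as a flip.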
30 changes: 28 additions & 2 deletions tasks/papers.bib
@@ -212,5 +212,31 @@ @article{xiang2025musk
title = {A vision-language foundation model for precision oncology},
year = {2025},
journal = {Nature},
publisher = {Springer Science and Business Media LLC}
}

@article{pocock2022tiatoolbox,
title={TIAToolbox as an end-to-end library for advanced tissue image analytics},
author={Pocock, Johnathan and Graham, Simon and Vu, Quoc Dang and Jahanifar, Mostafa and Deshpande, Srijay and Hadjigeorghiou, Giorgos and Shephard, Adam and Bashir, Raja Muhammad Saad and Bilal, Mohsin and Lu, Wenqi and others},
journal={Communications medicine},
volume={2},
number={1},
pages={120},
year={2022},
publisher={Nature Publishing Group UK London}
}
@article{wiest2024llm,
title={LLM-AIx: An open source pipeline for Information Extraction from unstructured medical text based on privacy preserving Large Language Models},
author={Wiest, Isabella Catharina and Wolf, Fabian and Le{\ss}mann, Marie-Elisabeth and van Treeck, Marko and Ferber, Dyke and Zhu, Jiefu and Boehme, Heiko and Bressem, Keno K and Ulrich, Hannes and Ebert, Matthias P and others},
journal={medRxiv},
year={2024}
}

@misc{zigutyte2024mopadi,
title={Counterfactual Diffusion Models for Mechanistic Explainability of Artificial Intelligence Models in Pathology},
author={Laura Žigutytė and Tim Lenz and Tianyu Han and Katherine Jane Hewitt and Nic Gabriel Reitsam and Sebastian Foersch and Zunamys I Carrero and Michaela Unger and Alexander T Pearson and Daniel Truhn and Jakob Nikolas Kather},
year={2024},
eprint={2024.10.29.620913},
archivePrefix={bioRxiv},
url={https://www.biorxiv.org/content/10.1101/2024.10.29.620913v1},
}