
Merge pull request #171 from mzenk/task2-code
Add MLCubes for evaluation
ujjwalbaid0408 committed Jun 17, 2022
2 parents de579b0 + f828987 commit 0e1fef8
Showing 30 changed files with 1,782 additions and 52 deletions.
57 changes: 52 additions & 5 deletions Task_2/README.md
@@ -7,12 +7,59 @@ _Copyright © German Cancer Research Center (DKFZ), Division of Medical Image Co

This task focuses on segmentation methods that can learn from multi-institutional datasets to be robust to cross-institution distribution shifts at test time, effectively solving a domain generalization problem. In this repository, you can find information on the container submission and ranking for task 2 of the FeTS challenge 2022. We provide:

- [MLCube (docker) template](https://github.com/mlcommons/mlcube_examples/tree/master/fets/model) (coming soon): This is a guide on how to build a container submission. For more details on how to submit to task 2 of the FeTS challenge 2022, see the [challenge website](https://www.synapse.org/#!Synapse:syn28546456/wiki/617255).
- A [script](scripts/generate_toy_test_cases.py) to extract "toy test cases" from the official training data. These can be used for testing the reproducibility of your segmentation performance in functionality tests prior to the final submission. More details on the [challenge website](https://www.synapse.org/#!Synapse:syn28546456/wiki/617255).
- [MLCube (docker) template](mlcubes/model): This is a guide on how to build a container submission. For more details on how to submit to task 2 of the FeTS challenge 2022, see the [challenge website](https://www.synapse.org/#!Synapse:syn28546456/wiki/617255).
- [MLCubes for the evaluation pipeline](mlcubes): These are used for running the evaluation pipeline. Participants should not modify them; they are provided only for transparency of the official evaluation.
- Code that is used to compute the final [ranking](ranking)

## Requirements
The requirements of these components are described in the readme files of the respective folders. Below you will find information on how to prepare a submission and how to run our sanity check on it. Please also note the [hardware constraints](#hardware-constraints-for-submissions) that submissions have to obey.

In order to run the `generate_toy_test_cases.py` script, you need the official [challenge training data](https://www.synapse.org/#!Synapse:syn28546456/wiki/617246). Also, Python 3.6 or higher is required.
## How to prepare your submission container

The ranking code requirements are described [here](ranking).
You need to modify the MLCube template we provide. Details are described [here](mlcubes/model).

## How to run the evaluation pipeline locally

Once you have prepared your submission and pushed it to [synapse](https://www.synapse.org/#!Synapse:syn28546456/wiki/617255), it's possible to run the official evaluation pipeline on toy test cases to sanity-check it. To do so, please follow these steps:

1. [Download](https://hub.dkfz.de/s/Ctb6bQ7mbiwM6Af) the medperf environment folder and unpack it:
```bash
cd ~
mkdir .medperf
cd .medperf
tar -xzvf ~/Downloads/medperf_env.tar.gz
```
2. Set up the Python environment (install MedPerf):
```bash
# Optional but recommended: use conda or virtualenv
conda create -n fets_medperf pip
conda activate fets_medperf
# Actual installation. Important: Please use the branch below
cd ~
git clone https://github.com/mlcommons/medperf.git && \
cd medperf/cli && \
git checkout cli-assoc-comp-test && \
pip install -e .
```
3. Run the sanity check with Docker:
```bash
medperf --log=debug --no-cleanup test -b 1
```
The above runs the default model defined in this [folder](mlcubes/model/mlcube/). To use your local model, specify its path with `-m`:
```bash
MODEL_PATH=/path/to/local/mlcube/folder
medperf --log=debug --no-cleanup test -b 1 -m $MODEL_PATH
```
Note that the folder passed with `-m` needs to contain an `mlcube.yaml`, which is used to pull the docker image and set runtime arguments; a minimal sketch of such a file follows below.
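
For orientation, such a folder could contain an `mlcube.yaml` along the lines of the sketch below. This is only an illustration under assumptions: the image name is a placeholder, and the exact task and parameter names must follow the official template in [mlcubes/model](mlcubes/model).

```yaml
# Hypothetical mlcube.yaml for a model submission -- names and keys are placeholders;
# the template in mlcubes/model is authoritative.
name: my_fets22_model
description: Example segmentation model MLCube
authors:
  - {name: "Your Name"}

platform:
  accelerator_count: 1

docker:
  # Image that is pulled (or built) for the test run
  image: docker.synapse.org/synXXXXXXXX/my_fets22_model

tasks:
  infer:
    parameters:
      inputs: {data_path: data/, parameters_file: parameters.yaml}
      outputs: {output_path: predictions/}
```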

The results and logs from your local test run are located in the `~/.medperf/results` and `~/.medperf/logs` folders, respectively. They can be compared to the test run executed on the organizers' infrastructure to guarantee reproducibility. Making a submission on [synapse](https://www.synapse.org/#!Synapse:syn28546456/wiki/617255) will trigger a test run by the organizers. Note that we will convert the docker images to singularity on our end. If you would like to run with singularity as well, please ask a question in the [forum](https://www.synapse.org/#!Synapse:syn28546456/discussion/default).
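For instance, to inspect these artifacts after the local sanity check (default MedPerf locations, as set up in step 1):

```bash
# List the results and logs produced by the local test run
ls ~/.medperf/results
ls ~/.medperf/logs
```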

Note that the toy test cases are part of the FeTS 2022 training data and the same [data usage agreements](https://www.synapse.org/#!Synapse:syn28546456/wiki/617246) apply.

## Hardware Constraints for Submissions

In the testing phase of Task 2, we are going to perform a federated evaluation on multiple remote institutions with limited computation capabilities. To finish the evaluation before the MICCAI conference, we have to restrict the inference time of the submitted algorithms. As the number of participants is not known in advance, we have decided on the following rules:

- We will perform a test run of the submission on three toy test cases (shipped with the MedPerf environment) on a system with one GPU (11GB) and 40 GB RAM.
- For each submission, we are going to check whether the algorithm produces valid outputs on the toy test cases. Submissions that exit with an error are invalid.
- Participants are allowed to do their own memory management to fit a larger algorithm, but there will be a timeout of `num_cases * 180` seconds on the inference time. For the three toy test cases, for example, inference has to finish within 3 * 180 = 540 seconds.
<!-- - After conversion to a singularity image file, each submission has to be smaller than 12GB. Participants will be notified if this limit is exceeded during the test run. -->
47 changes: 0 additions & 47 deletions Task_2/generate_toy_test_cases.py

This file was deleted.

54 changes: 54 additions & 0 deletions Task_2/mlcubes/data_prep/mlcube/mlcube.yaml
@@ -0,0 +1,54 @@
name: FeTS challenge 2022 (task 2) Medperf Data Preparator Cube
description: MLCube for building data preparators for MedPerf
authors:
- {name: "MLCommons Medical Working Group"}
- {name: "Maximilian Zenk (DKFZ)"}

platform:
accelerator_count: 0

docker:
# Image name.
image: docker.synapse.org/syn31437293/fets22_data-prep
# Docker build context relative to $MLCUBE_ROOT. Default is `build`.
build_context: "../project"
# Docker file name within docker build context, default is `Dockerfile`.
build_file: "Dockerfile"

tasks:
prepare:
# This task is in charge of transforming the input data into the format expected by the model cubes.
parameters:
inputs: {
data_path: {type: directory, default: data}, # Value must point to a directory containing the raw data inside workspace
labels_path: {type: directory, default: data}, # Not used in this example
parameters_file: parameters.yaml # Not used in this example
}
outputs: {
output_path: prepped_data/, # Indicates where to store the transformed data. Must contain prepared data
        output_labels_path: labels/ # Indicates where to store the transformed labels. Must contain labels
}
sanity_check:
# This task ensures that the previously transformed data was transformed correctly.
    # It runs a set of tests that check the quality of the data. The rigor of these
    # tests is determined by the cube author.
parameters:
inputs: {
data_path: {type: directory, default: prepped_data}, # Value should be the first output of the prepare task
labels_path: labels/, # Value should be the second output of the prepare task
parameters_file: parameters.yaml # Not used in this example
}
statistics:
# This task computes statistics on the prepared dataset. Its purpose is to get a high-level
# idea of what is contained inside the data, without providing any specifics of any single entry
parameters:
inputs: {
data_path: {type: directory, default: prepped_data}, # Value should be the first output of the prepare task
labels_path: labels/, # Value should be the second output of the prepare task
parameters_file: parameters.yaml # Not used in this example
}
outputs: {
output_path: {
type: file, default: statistics.yaml
}
}
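
For reference, the tasks above could be executed locally with the MLCube runner roughly as follows. This is a sketch under the assumption that the `mlcube` and `mlcube-docker` packages are installed and that the commands are run from the directory containing this `mlcube.yaml`:

```bash
# Build/pull the docker image declared in mlcube.yaml
mlcube configure --mlcube=mlcube.yaml --platform=docker

# Run the individual tasks; parameters default to the workspace paths defined above
mlcube run --mlcube=mlcube.yaml --platform=docker --task=prepare
mlcube run --mlcube=mlcube.yaml --platform=docker --task=sanity_check
mlcube run --mlcube=mlcube.yaml --platform=docker --task=statistics
```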
Empty file.
30 changes: 30 additions & 0 deletions Task_2/mlcubes/data_prep/project/Dockerfile
@@ -0,0 +1,30 @@
FROM ubuntu:18.04

RUN apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common \
python3-dev \
curl && \
rm -rf /var/lib/apt/lists/*

RUN add-apt-repository ppa:deadsnakes/ppa -y && apt-get update

RUN apt-get install python3 -y

RUN apt-get install python3-pip -y

COPY ./requirements.txt project/requirements.txt

RUN pip3 install --upgrade pip

RUN pip3 install --no-cache-dir -r project/requirements.txt

# Set the locale
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8

COPY . /project

WORKDIR /project

ENTRYPOINT ["python3", "/project/mlcube.py"]
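
The MLCube runner normally builds this image from the `build_context` and `build_file` declared in `mlcube.yaml`, but an equivalent manual build would look roughly like this (a sketch, run from the `project/` directory; the tag is taken from `mlcube.yaml`):

```bash
docker build -t docker.synapse.org/syn31437293/fets22_data-prep -f Dockerfile .
```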
81 changes: 81 additions & 0 deletions Task_2/mlcubes/data_prep/project/mlcube.py
@@ -0,0 +1,81 @@
# MLCube Entrypoint
#
# This script shows how you can bridge your app with an MLCube interface.
# MLCubes expect the entrypoint to behave like a CLI, where tasks are
# commands and input/output parameters are command-line arguments.
# You can provide that interface to MLCube in any way you prefer.
# Here, we show a way that requires minimal intrusion to the original code,
# by importing the application's functions and calling them directly.

import typer
from prepare import run_preparation
from sanity_check import run_sanity_check
from statistics import run_statistics


app = typer.Typer()


@app.command("prepare")
def prepare(
data_path: str = typer.Option(..., "--data_path"),
labels_path: str = typer.Option(..., "--labels_path"),
params_file: str = typer.Option(..., "--parameters_file"),
out_path: str = typer.Option(..., "--output_path"),
out_labels_path: str = typer.Option(..., "--output_labels_path"),
):
"""Prepare task command. This is what gets executed when we run:
`mlcube run --task=prepare`
Args:
data_path (str): Location of the data to transform. Required for Medperf Data Preparation MLCubes.
labels_path (str): Location of the labels. Required for Medperf Data Preparation MLCubes
params_file (str): Location of the parameters.yaml file. Required for Medperf Data Preparation MLCubes.
        out_path (str): Location to store transformed data. Required for Medperf Data Preparation MLCubes.
        out_labels_path (str): Location to store transformed labels. Required for Medperf Data Preparation MLCubes.
"""
run_preparation(
input_dir=data_path,
output_data_dir=out_path,
output_label_dir=out_labels_path
)


@app.command("sanity_check")
def sanity_check(
data_path: str = typer.Option(..., "--data_path"),
labels_path: str = typer.Option(..., "--labels_path"),
params_file: str = typer.Option(..., "--parameters_file"),
):
"""Sanity check task command. This is what gets executed when we run:
`mlcube run --task=sanity_check`
Args:
        data_path (str): Location of the prepared data. Required for Medperf Data Preparation MLCubes.
        labels_path (str): Location of the prepared labels. Required for Medperf Data Preparation MLCubes.
params_file (str): Location of the parameters.yaml file. Required for Medperf Data Preparation MLCubes.
"""
run_sanity_check(data_path=data_path, labels_path=labels_path)


@app.command("statistics")
def statistics(
data_path: str = typer.Option(..., "--data_path"),
labels_path: str = typer.Option(..., "--labels_path"),
params_file: str = typer.Option(..., "--parameters_file"),
output_path: str = typer.Option(..., "--output_path"),
):
"""Computes statistics about the data. This statistics are uploaded
to the Medperf platform under the data owner's approval. Include
every statistic you consider useful for determining the nature of the
data, but keep in mind that we want to keep the data as private as
possible.
Args:
        data_path (str): Location of the prepared data. Required for Medperf Data Preparation MLCubes.
        labels_path (str): Location of the prepared labels. Required for Medperf Data Preparation MLCubes.
params_file (str): Location of the parameters.yaml file. Required for Medperf Data Preparation MLCubes.
output_path (str): File to store the statistics. Must be statistics.yaml. Required for Medperf Data Preparation MLCubes.
"""
run_statistics(data_path=data_path, labels_path=labels_path, out_file=output_path)


if __name__ == "__main__":
app()
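
Inside the container, this script is the entrypoint, so the `prepare` task, for example, reduces to an invocation along these lines. The concrete paths are placeholders; MLCube mounts the actual workspace directories at runtime:

```bash
# Hypothetical direct invocation of the prepare task (paths are placeholders)
python3 /project/mlcube.py prepare \
  --data_path /data \
  --labels_path /data \
  --parameters_file /parameters.yaml \
  --output_path /prepped_data \
  --output_labels_path /labels
```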
50 changes: 50 additions & 0 deletions Task_2/mlcubes/data_prep/project/prepare.py
@@ -0,0 +1,50 @@
from pathlib import Path
import shutil
from tqdm import tqdm


def copy_subject(subject_dir: Path, output_dir_data: Path, output_dir_labels: Path):
subj_id = subject_dir.name
# it's possible that minor naming differences are present. Accepted options for each modality are below.
# input format:
# <subject_id>[_brain]_t1.nii.gz etc
# <subject_id>[_brain]_final_seg.nii.gz
# output format:
# <subject_id>_brain_t1.nii.gz etc
# <subject_id>_final_seg.nii.gz
files_to_copy = {
"t1": [f"{subj_id}_brain_t1.nii.gz", f"{subj_id}_t1.nii.gz"],
"t1ce": [f"{subj_id}_brain_t1ce.nii.gz", f"{subj_id}_t1ce.nii.gz"],
"t2": [f"{subj_id}_brain_t2.nii.gz", f"{subj_id}_t2.nii.gz"],
"flair": [f"{subj_id}_brain_flair.nii.gz", f"{subj_id}_flair.nii.gz"],
"seg": [
f"{subj_id}_final_seg.nii.gz",
f"{subj_id}_brain_final_seg.nii.gz",
f"{subj_id}_seg.nii.gz",
f"{subj_id}_brain_seg.nii.gz",
],
}
for k, fname_options in files_to_copy.items():
for filename in fname_options:
file_path = subject_dir / filename
output_dir = output_dir_data / subj_id
if k == "seg":
output_dir = output_dir_labels
output_dir.mkdir(exist_ok=True)
if file_path.exists():
shutil.copy2(file_path, output_dir / files_to_copy[k][0])
break


def run_preparation(
input_dir: str, output_data_dir: str, output_label_dir: str
) -> None:
output_data_path = Path(output_data_dir)
output_labels_path = Path(output_label_dir)
output_data_path.mkdir(parents=True, exist_ok=True)
output_labels_path.mkdir(parents=True, exist_ok=True)

subject_list = [x for x in Path(input_dir).iterdir() if x.is_dir()]
print(f"Preparing {len(subject_list)} subjects...")
for subject_dir in tqdm(subject_list):
copy_subject(subject_dir, output_data_path, output_labels_path)
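
As a quick standalone check outside the container, the preparation step can also be called directly from Python; the directory names below are placeholders:

```python
# Minimal standalone usage of the preparation step (paths are placeholders)
from prepare import run_preparation

run_preparation(
    input_dir="raw_data",          # one subfolder per subject with *_t1/t1ce/t2/flair/seg volumes
    output_data_dir="prepped_data",
    output_label_dir="labels",
)
```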
6 changes: 6 additions & 0 deletions Task_2/mlcubes/data_prep/project/requirements.txt
@@ -0,0 +1,6 @@
pyYAML
typer
pandas
SimpleITK>=2.1.0
numpy
tqdm