
Merge pull request #171 from mzenk/task2-code
Add MLCubes for evaluation
ujjwalbaid0408 committed Jun 17, 2022
2 parents de579b0 + f828987 commit 0e1fef8
Showing 30 changed files with 1,782 additions and 52 deletions.
57 changes: 52 additions & 5 deletions Task_2/README.md
@@ -7,12 +7,59 @@ _Copyright © German Cancer Research Center (DKFZ), Division of Medical Image Co

This task focuses on segmentation methods that can learn from multi-institutional datasets to be robust to cross-institution distribution shifts at test time, effectively solving a domain generalization problem. In this repository, you can find information on the container submission and ranking for task 2 of the FeTS challenge 2022. We provide:

- [MLCube (docker) template](https://github.com/mlcommons/mlcube_examples/tree/master/fets/model) (coming soon): This is a guide on how to build a container submission. For more details on how to submit to task 2 of the FeTS challenge 2022, see the [challenge website](https://www.synapse.org/#!Synapse:syn28546456/wiki/617255).
- A [script](scripts/generate_toy_test_cases.py) to extract "toy test cases" from the official training data. These can be used for testing the reproducibility of your segmentation performance in functionality tests prior to the final submission. More details on the [challenge website](https://www.synapse.org/#!Synapse:syn28546456/wiki/617255).
- [MLCube (docker) template](mlcubes/model): This is a guide on how to build a container submission. For more details on how to submit to task 2 of the FeTS challenge 2022, see the [challenge website](https://www.synapse.org/#!Synapse:syn28546456/wiki/617255).
- [MLCubes for the evaluation pipeline](mlcubes): These are used for running the evaluation pipeline. Participants should not modify them; they are provided only for transparency of the official evaluation.
- Code that is used to compute the final [ranking](ranking)

## Requirements
The requirements of these components are described in the readme files of the respective folders. Below you will find information on how to prepare a submission and how to run our sanity check on it. Please also note the [hardware constraints](#hardware-constraints-for-submissions) that submissions have to obey.

In order to run the `generate_toy_test_cases.py` script, you need the official [challenge training data](https://www.synapse.org/#!Synapse:syn28546456/wiki/617246). Also, Python 3.6 or higher is required.
## How to prepare your submission container

The ranking code requirements are described [here](ranking).
You need to modify the MLCube template we provide. Details are described [here](mlcubes/model).

## How to run the evaluation pipeline locally

Once you have prepared your submission and pushed it to [synapse](https://www.synapse.org/#!Synapse:syn28546456/wiki/617255), it's possible to run the official evaluation pipeline on toy test cases to sanity-check it. To do so, please follow these steps:

1. [Download](https://hub.dkfz.de/s/Ctb6bQ7mbiwM6Af) the medperf environment folder and unpack it:
```bash
cd ~
mkdir .medperf
cd .medperf
tar -xzvf ~/Downloads/medperf_env.tar.gz
```
2. Set up the Python environment (install MedPerf):
```bash
# Optional but recommended: use conda or virtualenv
conda create -n fets_medperf pip
conda activate fets_medperf
# Actual installation. Important: Please use the branch below
cd ~
git clone https://github.com/mlcommons/medperf.git && \
cd medperf/cli && \
git checkout cli-assoc-comp-test && \
pip install -e .
```
3. Run the sanity check with Docker:
```bash
medperf --log=debug --no-cleanup test -b 1
```
The above runs the default model defined in this [folder](mlcubes/model/mlcube/). To use your local model, specify its path with `-m`:
```bash
MODEL_PATH=/path/to/local/mlcube/folder
medperf --log=debug --no-cleanup test -b 1 -m $MODEL_PATH
```
Note that the folder passed with `-m` needs to contain an `mlcube.yaml`, which is used to pull the docker image and set runtime arguments; a minimal sketch of such a file follows below.
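
For orientation, such a folder could contain an `mlcube.yaml` along the lines of the sketch below. This is only an illustration under assumptions: the image name is a placeholder, and the exact task and parameter names must follow the official template in [mlcubes/model](mlcubes/model).

```yaml
# Hypothetical mlcube.yaml for a model submission -- names and keys are placeholders;
# the template in mlcubes/model is authoritative.
name: my_fets22_model
description: Example segmentation model MLCube
authors:
  - {name: "Your Name"}

platform:
  accelerator_count: 1

docker:
  # Image that is pulled (or built) for the test run
  image: docker.synapse.org/synXXXXXXXX/my_fets22_model

tasks:
  infer:
    parameters:
      inputs: {data_path: data/, parameters_file: parameters.yaml}
      outputs: {output_path: predictions/}
```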

The results and logs from your local test run are located in the `~/.medperf/results` and `~/.medperf/logs` folders, respectively. They can be compared to the test run executed on the organizers' infrastructure to guarantee reproducibility. Making a submission on [synapse](https://www.synapse.org/#!Synapse:syn28546456/wiki/617255) will trigger a test run by the organizers. Note that we will convert the docker images to singularity on our end. If you would like to run with singularity as well, please ask a question in the [forum](https://www.synapse.org/#!Synapse:syn28546456/discussion/default).
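For instance, to inspect these artifacts after the local sanity check (default MedPerf locations, as set up in step 1):

```bash
# List the results and logs produced by the local test run
ls ~/.medperf/results
ls ~/.medperf/logs
```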

Note that the toy test cases are part of the FeTS 2022 training data and the same [data usage agreements](https://www.synapse.org/#!Synapse:syn28546456/wiki/617246) apply.

## Hardware Constraints for Submissions

In the testing phase of Task 2, we are going to perform a federated evaluation on multiple remote institutions with limited computation capabilities. To finish the evaluation before the MICCAI conference, we have to restrict the inference time of the submitted algorithms. As the number of participants is not known in advance, we have decided on the following rules:

- We will perform a test run of the submission on three toy test cases (shipped with the MedPerf environment) on a system with one GPU (11GB) and 40 GB RAM.
- For each submission, we are going to check whether the algorithm produces valid outputs on the toy test cases. Submissions that exit with an error are invalid.
- Participants are allowed to do their own memory management to fit a larger algorithm, but there will be a timeout of `num_cases * 180` seconds on the inference time. For the three toy test cases, for example, inference has to finish within 3 * 180 = 540 seconds.
<!-- - After conversion to a singularity image file, each submission has to be smaller than 12GB. Participants will be notified if this limit is exceeded during the test run. -->
47 changes: 0 additions & 47 deletions Task_2/generate_toy_test_cases.py

This file was deleted.

54 changes: 54 additions & 0 deletions Task_2/mlcubes/data_prep/mlcube/mlcube.yaml
@@ -0,0 +1,54 @@
name: FeTS challenge 2022 (task 2) Medperf Data Preparator Cube
description: MLCube for building data preparators for MedPerf
authors:
- {name: "MLCommons Medical Working Group"}
- {name: "Maximilian Zenk (DKFZ)"}

platform:
accelerator_count: 0

docker:
# Image name.
image: docker.synapse.org/syn31437293/fets22_data-prep
# Docker build context relative to $MLCUBE_ROOT. Default is `build`.
build_context: "../project"
# Docker file name within docker build context, default is `Dockerfile`.
build_file: "Dockerfile"

tasks:
prepare:
# This task is in charge of transforming the input data into the format expected by the model cubes.
parameters:
inputs: {
data_path: {type: directory, default: data}, # Value must point to a directory containing the raw data inside workspace
labels_path: {type: directory, default: data}, # Not used in this example
parameters_file: parameters.yaml # Not used in this example
}
outputs: {
output_path: prepped_data/, # Indicates where to store the transformed data. Must contain prepared data
        output_labels_path: labels/ # Indicates where to store the transformed labels. Must contain labels
}
sanity_check:
# This task ensures that the previously transformed data was transformed correctly.
    # It runs a set of tests that check the quality of the data. The rigor of these
    # tests is determined by the cube author.
parameters:
inputs: {
data_path: {type: directory, default: prepped_data}, # Value should be the first output of the prepare task
labels_path: labels/, # Value should be the second output of the prepare task
parameters_file: parameters.yaml # Not used in this example
}
statistics:
# This task computes statistics on the prepared dataset. Its purpose is to get a high-level
# idea of what is contained inside the data, without providing any specifics of any single entry
parameters:
inputs: {
data_path: {type: directory, default: prepped_data}, # Value should be the first output of the prepare task
labels_path: labels/, # Value should be the second output of the prepare task
parameters_file: parameters.yaml # Not used in this example
}
outputs: {
output_path: {
type: file, default: statistics.yaml
}
}
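
For reference, the tasks above could be executed locally with the MLCube runner roughly as follows. This is a sketch under the assumption that the `mlcube` and `mlcube-docker` packages are installed and that the commands are run from the directory containing this `mlcube.yaml`:

```bash
# Build/pull the docker image declared in mlcube.yaml
mlcube configure --mlcube=mlcube.yaml --platform=docker

# Run the individual tasks; parameters default to the workspace paths defined above
mlcube run --mlcube=mlcube.yaml --platform=docker --task=prepare
mlcube run --mlcube=mlcube.yaml --platform=docker --task=sanity_check
mlcube run --mlcube=mlcube.yaml --platform=docker --task=statistics
```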
Empty file.
30 changes: 30 additions & 0 deletions Task_2/mlcubes/data_prep/project/Dockerfile
@@ -0,0 +1,30 @@
FROM ubuntu:18.04

RUN apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common \
python3-dev \
curl && \
rm -rf /var/lib/apt/lists/*

RUN add-apt-repository ppa:deadsnakes/ppa -y && apt-get update

RUN apt-get install python3 -y

RUN apt-get install python3-pip -y

COPY ./requirements.txt project/requirements.txt

RUN pip3 install --upgrade pip

RUN pip3 install --no-cache-dir -r project/requirements.txt

# Set the locale
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8

COPY . /project

WORKDIR /project

ENTRYPOINT ["python3", "/project/mlcube.py"]
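
The MLCube runner normally builds this image from the `build_context` and `build_file` declared in `mlcube.yaml`, but an equivalent manual build would look roughly like this (a sketch, run from the `project/` directory; the tag is taken from `mlcube.yaml`):

```bash
docker build -t docker.synapse.org/syn31437293/fets22_data-prep -f Dockerfile .
```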
81 changes: 81 additions & 0 deletions Task_2/mlcubes/data_prep/project/mlcube.py
@@ -0,0 +1,81 @@
# MLCube Entrypoint
#
# This script shows how you can bridge your app with an MLCube interface.
# MLCubes expect the entrypoint to behave like a CLI, where tasks are
# commands and input/output parameters are command-line arguments.
# You can provide that interface to MLCube in any way you prefer.
# Here, we show a way that requires minimal intrusion to the original code,
# by importing the application's functions and calling them directly.

import typer
from prepare import run_preparation
from sanity_check import run_sanity_check
from statistics import run_statistics


app = typer.Typer()


@app.command("prepare")
def prepare(
data_path: str = typer.Option(..., "--data_path"),
labels_path: str = typer.Option(..., "--labels_path"),
params_file: str = typer.Option(..., "--parameters_file"),
out_path: str = typer.Option(..., "--output_path"),
out_labels_path: str = typer.Option(..., "--output_labels_path"),
):
"""Prepare task command. This is what gets executed when we run:
`mlcube run --task=prepare`
Args:
data_path (str): Location of the data to transform. Required for Medperf Data Preparation MLCubes.
labels_path (str): Location of the labels. Required for Medperf Data Preparation MLCubes
params_file (str): Location of the parameters.yaml file. Required for Medperf Data Preparation MLCubes.
        out_path (str): Location to store transformed data. Required for Medperf Data Preparation MLCubes.
        out_labels_path (str): Location to store transformed labels. Required for Medperf Data Preparation MLCubes.
"""
run_preparation(
input_dir=data_path,
output_data_dir=out_path,
output_label_dir=out_labels_path
)


@app.command("sanity_check")
def sanity_check(
data_path: str = typer.Option(..., "--data_path"),
labels_path: str = typer.Option(..., "--labels_path"),
params_file: str = typer.Option(..., "--parameters_file"),
):
"""Sanity check task command. This is what gets executed when we run:
`mlcube run --task=sanity_check`
Args:
        data_path (str): Location of the prepared data. Required for Medperf Data Preparation MLCubes.
        labels_path (str): Location of the prepared labels. Required for Medperf Data Preparation MLCubes.
params_file (str): Location of the parameters.yaml file. Required for Medperf Data Preparation MLCubes.
"""
run_sanity_check(data_path=data_path, labels_path=labels_path)


@app.command("statistics")
def statistics(
data_path: str = typer.Option(..., "--data_path"),
labels_path: str = typer.Option(..., "--labels_path"),
params_file: str = typer.Option(..., "--parameters_file"),
output_path: str = typer.Option(..., "--output_path"),
):
"""Computes statistics about the data. This statistics are uploaded
to the Medperf platform under the data owner's approval. Include
every statistic you consider useful for determining the nature of the
data, but keep in mind that we want to keep the data as private as
possible.
Args:
        data_path (str): Location of the prepared data. Required for Medperf Data Preparation MLCubes.
        labels_path (str): Location of the prepared labels. Required for Medperf Data Preparation MLCubes.
params_file (str): Location of the parameters.yaml file. Required for Medperf Data Preparation MLCubes.
output_path (str): File to store the statistics. Must be statistics.yaml. Required for Medperf Data Preparation MLCubes.
"""
run_statistics(data_path=data_path, labels_path=labels_path, out_file=output_path)


if __name__ == "__main__":
app()
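
Inside the container, this script is the entrypoint, so the `prepare` task, for example, reduces to an invocation along these lines. The concrete paths are placeholders; MLCube mounts the actual workspace directories at runtime:

```bash
# Hypothetical direct invocation of the prepare task (paths are placeholders)
python3 /project/mlcube.py prepare \
  --data_path /data \
  --labels_path /data \
  --parameters_file /parameters.yaml \
  --output_path /prepped_data \
  --output_labels_path /labels
```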
50 changes: 50 additions & 0 deletions Task_2/mlcubes/data_prep/project/prepare.py
@@ -0,0 +1,50 @@
from pathlib import Path
import shutil
from tqdm import tqdm


def copy_subject(subject_dir: Path, output_dir_data: Path, output_dir_labels: Path):
subj_id = subject_dir.name
# it's possible that minor naming differences are present. Accepted options for each modality are below.
# input format:
# <subject_id>[_brain]_t1.nii.gz etc
# <subject_id>[_brain]_final_seg.nii.gz
# output format:
# <subject_id>_brain_t1.nii.gz etc
# <subject_id>_final_seg.nii.gz
files_to_copy = {
"t1": [f"{subj_id}_brain_t1.nii.gz", f"{subj_id}_t1.nii.gz"],
"t1ce": [f"{subj_id}_brain_t1ce.nii.gz", f"{subj_id}_t1ce.nii.gz"],
"t2": [f"{subj_id}_brain_t2.nii.gz", f"{subj_id}_t2.nii.gz"],
"flair": [f"{subj_id}_brain_flair.nii.gz", f"{subj_id}_flair.nii.gz"],
"seg": [
f"{subj_id}_final_seg.nii.gz",
f"{subj_id}_brain_final_seg.nii.gz",
f"{subj_id}_seg.nii.gz",
f"{subj_id}_brain_seg.nii.gz",
],
}
for k, fname_options in files_to_copy.items():
for filename in fname_options:
file_path = subject_dir / filename
output_dir = output_dir_data / subj_id
if k == "seg":
output_dir = output_dir_labels
output_dir.mkdir(exist_ok=True)
if file_path.exists():
shutil.copy2(file_path, output_dir / files_to_copy[k][0])
break


def run_preparation(
input_dir: str, output_data_dir: str, output_label_dir: str
) -> None:
output_data_path = Path(output_data_dir)
output_labels_path = Path(output_label_dir)
output_data_path.mkdir(parents=True, exist_ok=True)
output_labels_path.mkdir(parents=True, exist_ok=True)

subject_list = [x for x in Path(input_dir).iterdir() if x.is_dir()]
print(f"Preparing {len(subject_list)} subjects...")
for subject_dir in tqdm(subject_list):
copy_subject(subject_dir, output_data_path, output_labels_path)
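
As a quick standalone check outside the container, the preparation step can also be called directly from Python; the directory names below are placeholders:

```python
# Minimal standalone usage of the preparation step (paths are placeholders)
from prepare import run_preparation

run_preparation(
    input_dir="raw_data",          # one subfolder per subject with *_t1/t1ce/t2/flair/seg volumes
    output_data_dir="prepped_data",
    output_label_dir="labels",
)
```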
6 changes: 6 additions & 0 deletions Task_2/mlcubes/data_prep/project/requirements.txt
@@ -0,0 +1,6 @@
pyYAML
typer
pandas
SimpleITK>=2.1.0
numpy
tqdm