Task 2 code update for 2022 #159

Merged · 6 commits · May 9, 2022

59 changes: 6 additions & 53 deletions Task_2/README.md
@@ -5,61 +5,14 @@ _Copyright © German Cancer Research Center (DKFZ), Division of Medical Image Co

# Task 2: Generalization "in the wild"

This task focuses on how segmentation methods can learn from multi-institutional datasets to be robust to distribution shifts at test time, effectively solving a domain generalization problem. In this repository, you can find information on the container submission and ranking for task 2 of the FeTS challenge 2021. It is structured as follows:
This task focuses on segmentation methods that can learn from multi-institutional datasets to be robust to cross-institution distribution shifts at test time, effectively solving a domain generalization problem. In this repository, you can find information on the container submission and ranking for task 2 of the FeTS challenge 2021. We provide:

- [`singularity_example`](singularity_example): Guide on how to build the container submission, with examples
- [`scripts`](scripts): Scripts for running containers, both in the participant's environment and in the federated testing environment
- [`ranking`](ranking): Code for performing the final ranking

In the FeTS challenge task 2, participants can submit their solution in the form of a [singularity container](https://sylabs.io/guides/3.7/user-guide/index.html). Note that we do not impose restrictions on how participants train their model or how they perform inference, as long as the resulting algorithm can be built into a singularity container with the simple interface described in `singularity_example`. Hence, after training a model, the following steps are required to submit it:

1. Write a container definition file and an inference script.
2. Build a singularity container for inference using the above files and the final model weights.
3. Upload the container to the submission platform.

Details for steps 1 and 2 are given in the guide in the [singularity_example](singularity_example). Regarding step 3, each participating team will be provided a gitlab project where they can upload their submission. A few simple steps are necessary for that:

1. Register for the challenge as described on the [challenge website](https://fets-ai.github.io/Challenge/) (if not already done).
2. Sign up at [https://gitlab.hzdr.de/](https://gitlab.hzdr.de/) **using the same email address as in step 1** by either clicking *Helmholtz AAI* (login via your institutional email) or via your github login. Both buttons are in the lower box on the right.
3. Send an email to [challenge@fets.ai](mailto:challenge@fets.ai), asking for a Task 2-gitlab project and stating your gitlab handle (@your-handle) and team name. We will create a project for you and invite you to it within a day.
4. Follow the instructions in the newly created project to make a submission.

To make sure that the containers submitted by the participants also run successfully on the remote institutions in the FeTS federation, we offer functionality tests on a few toy cases. Details are provided in the gitlab project.
- [MLCube (docker) template](https://github.com/mlcommons/mlcube_examples/tree/master/fets/model) (coming soon): This is a guide on how to build a container submission. For more details on how to submit to task 2 of the FeTS challenge 2022, see the [challenge website](https://www.synapse.org/#!Synapse:syn28546456/wiki/617255).
- A [script](scripts/generate_toy_test_cases.py) to extract "toy test cases" from the official training data. These can be used for testing the reproducibility of your segmentation performance in functionality tests prior to the final submission. More details on the [challenge website](https://www.synapse.org/#!Synapse:syn28546456/wiki/617255).
- Code that is used to compute the final [ranking](ranking)

## Requirements
Singularity has to be installed to create a container submission [(instructions)](https://sylabs.io/guides/3.7/user-guide/quick_start.html#quick-installation-steps).

Python 3.6 or higher is required to run the scripts in `scripts`. Make sure to install the requirements (e.g. `pip install -r requirements.txt`), preferably in a virtual/conda environment.

The examples in this repo assume the following data folder structure, which will also be present at test-time:
```
data/ # this should be passed for inference
└───Patient_001 # case identifier
│ │ Patient_001_brain_t1.nii.gz
│ │ Patient_001_brain_t1ce.nii.gz
│ │ Patient_001_brain_t2.nii.gz
│ │ Patient_001_brain_flair.nii.gz
└───Pat_JohnDoe # other case identifier
│ │ ...
```
Furthermore, predictions for test cases should be placed in an output directory and named like this: `<case-identifier>_seg.nii.gz`
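For orientation, an inference script that walks this layout and writes predictions with the expected names could look roughly like the sketch below. This is only an illustration, not the required interface: `predict_segmentation` is a hypothetical placeholder for your own model, and nibabel is assumed to be installed.

```python
# Minimal sketch of an inference loop for the folder layout above.
# Assumptions: nibabel is installed; predict_segmentation is a hypothetical
# placeholder for your own model's inference call.
from pathlib import Path

import nibabel as nib
import numpy as np


def predict_segmentation(arrays):
    # placeholder: return an all-background segmentation with the T1 shape
    return np.zeros(arrays["t1"].shape, dtype=np.uint8)


def run_inference(data_dir, output_dir):
    data_dir, output_dir = Path(data_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for case_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        case_id = case_dir.name
        # load the four modalities, e.g. Patient_001_brain_t1.nii.gz
        images = {
            mod: nib.load(case_dir / f"{case_id}_brain_{mod}.nii.gz")
            for mod in ("t1", "t1ce", "t2", "flair")
        }
        arrays = {mod: img.get_fdata() for mod, img in images.items()}
        seg = predict_segmentation(arrays)
        # save the prediction with the expected name: <case-identifier>_seg.nii.gz
        nib.save(
            nib.Nifti1Image(seg, images["t1"].affine),
            output_dir / f"{case_id}_seg.nii.gz",
        )


if __name__ == "__main__":
    run_inference("data", "output")
```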

## Test your container

Once you have built your container, you can run the testing script as follows:

```bash
python scripts/test_container.py container.sif -i /path/to/data [-o /path/to/output_dir -l /path/to/label_dir]
```

This will run the container on the data in the input folder (`-i`), which should be formatted as described in the [requirements](#requirements), and save the outputs in the output folder (`-o`); without the latter option, outputs will be deleted at the end. This script will also report the execution time and do a sanity check on your outputs, so that you are warned if something is not as it should be. To test the functionality and runtime of your container on a standardized setup, please make a submission via gitlab, as described in the first section. From gitlab you can also get a small reference dataset with the correct folder structure and naming.
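The kind of output check performed could look roughly like the following sketch (an illustration only, not the actual `test_container.py` logic):

```python
# Rough sketch of an output sanity check (not the actual test_container.py logic):
# verify that every input case has a prediction named <case-identifier>_seg.nii.gz.
from pathlib import Path


def check_outputs(data_dir, output_dir):
    data_dir, output_dir = Path(data_dir), Path(output_dir)
    missing = []
    for case_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        pred = output_dir / f"{case_dir.name}_seg.nii.gz"
        if not pred.exists():
            missing.append(pred.name)
    if missing:
        print("Missing predictions:", ", ".join(missing))
    return not missing
```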

If labels are provided, this script also computes metrics for each test case and saves them in the output folder. **Note**, however, that these metrics are just for sanity checks and will be computed differently during the testing phase. Specifically, the [CaPTk library](https://cbica.github.io/CaPTk/BraTS_Metrics.html) will be used in the test phase evaluation. If you would like to try it on your predictions, please refer to their website for installation and usage instructions.
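For a rough idea of the metric, a plain per-label Dice score can be computed as in the sketch below. This is only a simplified illustration and does not reproduce the CaPTk/BraTS evaluation used in the test phase, which scores tumor regions and handles edge cases differently.

```python
# Simple (binary) Dice score for a quick per-label sanity check.
import numpy as np


def dice_score(pred: np.ndarray, ref: np.ndarray, label: int) -> float:
    pred_mask = pred == label
    ref_mask = ref == label
    denom = pred_mask.sum() + ref_mask.sum()
    if denom == 0:
        return 1.0  # convention: both empty -> perfect agreement
    return 2.0 * np.logical_and(pred_mask, ref_mask).sum() / denom
```
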
In order to run the `generate_toy_test_cases.py` script, you need the official [challenge training data](https://www.synapse.org/#!Synapse:syn28546456/wiki/617246). Also, Python 3.6 or higher is required.

## Submission Rules
In the testing phase of Task 2, we are going to perform a federated evaluation on multiple remote institutions with limited computation capabilities. To finish the evaluation before the MICCAI conference, we have to restrict the inference time of the submitted algorithms. As the number of participants is not known in advance, we decided on the following rules in that regard:
- For each final submission, we are going to check the validity of the algorithm output and measure the execution time of the container on a small dataset using a pre-defined hardware setup (CPU: E5-2620 v4, GPU: RTX 2080 Ti 10.7GB, RAM: 40GB).
- Each submission is given **180 seconds per case** to produce a prediction (we will check only the total runtime for all cases, though). Submissions that fail to predict all cases within this time budget will not be included in the federated evaluation.
- If the number of participants is extremely high, we reserve the right to limit the number of participants in the final MICCAI ranking in the following way: Algorithms will be evaluated on the federated test set in the chronological order in which they were submitted. This means the later an algorithm is submitted, the higher the risk that it cannot be evaluated on all federated test sets before the end of the testing phase. Note that this is a worst-case rule and we will work hard to include every single valid submission in the ranking.
The ranking code requirements are described [here](ranking).
47 changes: 47 additions & 0 deletions Task_2/generate_toy_test_cases.py
@@ -0,0 +1,47 @@
from argparse import ArgumentParser
from pathlib import Path
import shutil


TOY_CASE_IDS = ["FeTS2022_01151", "FeTS2022_00805", "FeTS2022_00311"]


def main():
    parser = ArgumentParser(
        usage="This script helps you extract the toy test cases used for sanity checks in the FeTS challenge. "
        "It assumes that you have downloaded the training data. "
        "Running it should leave you with a folder containing the test cases in the expected format."
    )
    parser.add_argument(
        "train_data_path",
        type=str,
        help="Path of the directory that contains your locally stored FeTS 2022 training data.",
    )
    parser.add_argument(
        "output_path",
        type=str,
        help="Path of the directory where the extracted toy test cases should be stored.",
    )
    args = parser.parse_args()
    train_data_path = Path(args.train_data_path)
    output_path = Path(args.output_path)
    output_path.mkdir(exist_ok=True)
    if train_data_path == output_path:
        raise ValueError(
            "Please specify a different folder for output to avoid overwriting training data."
        )

    print(f"Copying {TOY_CASE_IDS} from {train_data_path} to {output_path}...")
    for case_id in TOY_CASE_IDS:
        # copy files to output dir with different name (_brain_)
        output_case_dir = output_path / case_id
        output_case_dir.mkdir(exist_ok=True)
        for nifti in (train_data_path / case_id).iterdir():
            if nifti.name.endswith(".nii.gz"):
                suffix = nifti.name.split("_")[-1]
                shutil.copy2(nifti, output_case_dir / f"{case_id}_brain_{suffix}")
    print("Done.")


if __name__ == "__main__":
    main()
17 changes: 12 additions & 5 deletions Task_2/ranking/compute_ranking.R
@@ -322,6 +322,7 @@ if (length(args) == 0) {
}
data_path <- args[1]
output_dir <- "ranking_output"
# output_dir <- "ranking_output_C22_excluded"
if (! dir.exists(output_dir)) {
dir.create(output_dir)
}
@@ -348,7 +349,13 @@ for (path in data_files) {
# Institution i ----------------------------------------------------------
print(path)
institution_name <- unlist(strsplit(tail(unlist(strsplit(path, "/")), 1), "[.]"))[1]
# print(institution_name)
# if (institution_name == "C22_validation") {
# next
# print("skipping")
# }
data_fets_inst <- load_data(path)
data_fets_inst <- subset(data_fets_inst, algorithm != "baseline_nnunet2020") # not ranked

# plot dots- and boxplots
p_dice <- generate_dot_boxplots_per_institute(subset(data_fets_inst, metric=="Dice"), "Dice",
@@ -366,7 +373,7 @@ for (path in data_files) {
# For each region, the ranking is computed for the Dice and Hausdorff95 metrics
# Resulting in 6 rankings
print("... calculate rankings ... ...")
rankings <- calculate_all_rankings_per_institute(data_fets_inst, institution_name, ranking_method)
rankings <- calculate_all_rankings_per_institute(data_fets_inst, institution_name, ranking_method, report_dir=report_dir)

# Compute mean rank per algorithm for each institution --------------------
mean_rank_df <- calculate_mean_ranks_one_institute(rankings, data_fets_inst, institution_name)
@@ -421,12 +428,12 @@ write.csv(countSign, file = paste(output_dir, paste(file_name_significant_counts
# also sum up significance matrices
total_sign_matrix <- NULL
for (s in dataSignMatrices) {
ordered_s <- s$dummyTask[order(rownames(s$dummyTask)), order(colnames(s$dummyTask))]
if (is_null(total_sign_matrix)){
total_sign_matrix <- s
total_sign_matrix <- ordered_s
} else {
assertthat::are_equal(rownames(total_sign_matrix$dummyTask), rownames(s$dummyTask))
total_sign_matrix$dummyTask <- total_sign_matrix$dummyTask + s$dummyTask
total_sign_matrix <- total_sign_matrix + ordered_s
}
}
file_name <- paste("significant_matrix", ranking_method, sep="_")
write.csv(total_sign_matrix$dummyTask, file = paste(output_dir, paste(file_name, ".csv",sep=""), sep="/"))
write.csv(total_sign_matrix, file = paste(output_dir, paste(file_name, ".csv",sep=""), sep="/"))
2 changes: 1 addition & 1 deletion Task_2/ranking/readme.md
@@ -1,6 +1,6 @@
# Task 2 Ranking

This is an implementation of the ranking method described on the [challenge website](https://fets-ai.github.io/Challenge/participate/#task-2-evaluation-details). To run this on your computer, you need to install R and the challengeR toolkit, as described in their [repository](https://github.com/wiesenfa/challengeR/#installation). The script `compute_ranking.R` should be invoked by
This is an implementation of the ranking method described on the [challenge website](https://www.synapse.org/#!Synapse:syn28546456/wiki/617245). To run this on your computer, you need to install R and the challengeR toolkit, as described in their [repository](https://github.com/wiesenfa/challengeR/#installation). The script `compute_ranking.R` should be invoked by
```
Rscript compute_ranking.R data_path [report_save_dir]
```
15 changes: 0 additions & 15 deletions Task_2/requirements.txt

This file was deleted.

95 changes: 0 additions & 95 deletions Task_2/scripts/metric_evaluation.py

This file was deleted.

6 changes: 0 additions & 6 deletions Task_2/scripts/readme.md

This file was deleted.
