Copyright German Cancer Research Center (DKFZ) and contributors. Please make sure that your usage of this code is in compliance with its license.
The Radiological Interactive Benchmark (RadioActive) allows open-set interactive 2D or 3D segmentation methods to be evaluated fairly against other methods on radiological images. RadioActive currently includes six interactive segmentation methods and spans ten datasets (including CT and MRI) with various anatomical and pathological targets.
Through this benchmark, we provide users with transparent results on which existing methods perform best, and provide developers with an extendable framework that allows them to easily compare newly developed models or prompting schemes against the currently available methods.
- Activate a virtualenv of your choice (e.g. with Python 3.12)
- Download RadioActive repository (clone or download and extract manually)
- cd radioactive && pip install -e . (This will take a while to resolve dependencies; a more constrained requirements file will be provided in the future.)
- Done.
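For reference, the full setup might look like the sketch below. The virtualenv name and the repository URL are placeholders, and using the built-in venv module is only one option for the environment:

python -m venv radioa-env && source radioa-env/bin/activate  # placeholder environment name; any Python 3.12 virtualenv works
git clone <radioactive-repository-url>  # placeholder URL; alternatively download and extract the repository manually
cd radioactive && pip install -e .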
To use the benchmark, three environment variables need to be set (plus one optional one):
- RADIOA_DATA_PATH - Datasets will be downloaded into this directory and preprocessed in it.
- RADIOA_MODEL_PATH - Model checkpoints will be stored here.
- RADIOA_RESULTS_PATH - Predictions and evaluation results will be located here.
- RADIOA_MITK_PATH - (Optional) The path to the MITK executable; if not set, the benchmark will auto-download MITK and use the downloaded binaries instead.
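A minimal sketch of setting these variables in a shell session; the paths below are only example locations:

export RADIOA_DATA_PATH=/data/radioa/datasets      # example path
export RADIOA_MODEL_PATH=/data/radioa/checkpoints  # example path
export RADIOA_RESULTS_PATH=/data/radioa/results    # example path
# export RADIOA_MITK_PATH=/opt/mitk/MitkWorkbench  # optional, example path; omit to let the benchmark download MITK itself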
To use the benchmark, three steps need to be conducted:
The datasets used in the benchmark can be downloaded using the following command:
python ./src/radioa/datasets_preprocessing/download_all_datasets.py
# or only download a subset of datasets
python ./src/radioa/datasets_preprocessing/download_all_datasets.py --datasets ms_flair hanseg # can be multiple

For selective downloads, one can choose from:
["segrap", "hanseg", "ms_flair", "hntsmrg", "hcc_tace", "adrenal_acc", "rider_lung", "colorectal", "lnq", "pengwin"]
The datasets are often provided in a raw format, e.g. DICOM, which is not directly usable and can be cumbersome to work with. To simplify things, we provide preprocessing schemes that convert these into more easily usable formats. The preprocessing can be done using the following commands:
python ./src/radioa/datasets_preprocessing/preprocess_datasets.py --datasets ms_flair hanseg # can be multiple

or again with any choice of datasets from the list below:
ms_flair, hanseg, hntsmrg, pengwin, segrap, lnq, colorectal, adrenal_acc, hcc_tace, rider_lung
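For example, to prepare a single dataset end to end, the download and preprocessing commands can be chained (ms_flair serves only as an example here):

python ./src/radioa/datasets_preprocessing/download_all_datasets.py --datasets ms_flair
python ./src/radioa/datasets_preprocessing/preprocess_datasets.py --datasets ms_flair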
Currently, the majority of models require manual downloading of checkpoints. Auto-downloading of checkpoints is a planned feature that will be included before the final release; for now it only works for SAM2. The required checkpoints are:
- medsam_vit_b.pth - MedSAM
- sam_med3d_turbo.pth - SAM-Med3D-Turbo
- sam_med3d.pth - SAM-Med3D
- sam_vit_h_4b8939.pth - SAM
- sam-med2d_b.pth - SAM-Med2D
- SegVol_v1.pth - SegVol
Only the checkpoints for the models that are going to be used need to be downloaded.
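As a sketch, a manually downloaded checkpoint would be moved into the directory pointed to by RADIOA_MODEL_PATH; whether the models expect the files directly in this folder or in subfolders is an assumption here, so check the respective model documentation:

mv ~/Downloads/sam_vit_h_4b8939.pth "$RADIOA_MODEL_PATH"/  # example for the SAM checkpoint; the source path is a placeholder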
The benchmark for the ms_flair dataset and the SAM model can be run using the following command.
python ./src/radioa/experiments_runner.py --config ./configs/static_prompt_SAMNORM_D1.yaml

Other configs can also be selected, but this can serve as an exemplary command to understand the benchmarking process.
Predictions and results are dependent on each Dataset, Model, and Prompter combination.
All predictions are stored for each of these combinations and for each case. These predictions are best inspected through MITK, since predicted instances may overlap.
Additionally, the evaluation results are provided for each instance or semantic class in respective .csv or .json files in these prediction directories.
Automatic evaluation can be disabled in the config files if one wants to conduct only inspection or calculate other metrics.
