Segmentation is the most popular tasks in MICCAI 2020 challenges, including 15 out of 24 challenges. In this tutorial, we focus on the segmentation task in thyroid nodule segmentation and classification challenge (TN-SCUI 2020). In particular, we show how to use U-Net with 3 steps to achieve IoU 0.8093 on testing set, which is very close to the top 1 score (0.8254, 550+ participants) on the leaderboard.
The target is to segment thyroid nodules from ultrasound (US) images.
- The training set consists of 3644 images with
png
format (1641 benign cases and 2003 malignant cases). The annotations are binary images with value{0,255}
. - The testing set consists of 910 images where 400 images are randomly selected as validation set, and 510 images are used for final ranking.
Let's show two examples in the training set.
Our solution is based on U-Net with its great extension nnU-Net. Thanks to the out-of-the-box and flexible nature of nnU-Net, we can easily adapt the training set to the required dataset format of nnU-Net.
-
Ubuntu 16.04 or 18.04
-
Install PyTorch (1.3+)
-
Install Nvidia Apex.
git clone https://github.com/NVIDIA/apex cd apex pip install -v --no-cache-dir ./
-
Install nnU-Net
git clone https://github.com/MIC-DKFZ/nnUNet.git cd nnUNet pip install -e .
-
Set path in
nnUNet/nnunet/paths.py
# line 29: base = 'your path to store training data, e.g., ./MICCAI2020/nnUNetData' # line 30 preprocessing_output_dir = 'your path to store preprocessing data, e.g., ./MICCAI2020/nnUNetData/pre_data' # SSD is highly recommanded # line 31 network_training_output_dir_base = 'your path to save trained models, e.g., ./MICCAI2020/Models'
Create following folders
MICCAI2020/
├── nnUNetData
│ └── nnUNet_raw_data
│ └── Task600_Thyroid2D
│ └── imagesTr
│ └── imagesTs
│ └── labelsTr
├── Models
├── OriData
│ └── TNSCUI2020
│ └── TNSCUI2020_train # directly decompressing TNSCUI2020_train.rar
│ └── tnscui2020_testset # directly decompressing tnscui2020_testset.rar
nnU-Net is designed for 3D images with nifti
format, while the data format in thyroid nodule task is the 2D image with png
format. We can expand all the 2D images with an additional dimension and convert them to nifti
format with this code. Now, the files in MICCAI2020/nnUNetData/nnUNet_raw_data/Task600_Thyroid2D
are
MICCAI2020/
├── nnUNetData
│ └── nnUNet_raw_data
│ └── Task600_Thyroid2D
│ └── imagesTr
│ ├── 2_0000.nii.gz
│ ├── 4_0000.nii.gz
│ ├── ..._0000.nii.gz
│ └── imagesTs
│ ├── test_1_0000.nii.gz
│ ├── test_2_0000.nii.gz
│ ├── test_..._0000.nii.gz
│ └── labelsTr
│ ├── 2.nii.gz
│ ├── 4.nii.gz
│ ├── ....nii.gz
│ └── dataset.json # download from https://github.com/JunMa11/TNSCUI2020/blob/master/Task600_Thyroid2D/dataset.json
Open terminal and run
nnUNet_plan_and_preprocess -t 600 --verify_dataset_integrity
Data ready!
Next, we can train 2D U-Net models.
- Network hyperparameters
- Train five models for cross validation. Open terminal and run
nnUNet_train 2d nnUNetTrainerV2 Task600_Thyroid2D 0
nnUNet_train 2d nnUNetTrainerV2 Task600_Thyroid2D 1
nnUNet_train 2d nnUNetTrainerV2 Task600_Thyroid2D 2
nnUNet_train 2d nnUNetTrainerV2 Task600_Thyroid2D 3
nnUNet_train 2d nnUNetTrainerV2 Task600_Thyroid2D 4
The five trained models will be automatically saved in MICCAI2020/Models/nnUNet/2d/Task600_Thyroid2D/nnUNetTrainerV2__nnUNetPlansv2.1/fold_0,1,2,3,4
MICCAI2020/
├── Models
│ └── nnUNet
│ └── 2d
│ └── Task600_Thyroid2D
│ └── nnUNetTrainerV2__nnUNetPlansv2.1
│ └── fold_0
│ ├── model_final_checkpoint.model
│ ├── model_final_checkpoint.model.pkl
│ └── fold_1
│ ├── model_final_checkpoint.model
│ ├── model_final_checkpoint.model.pkl
│ └── fold_2
│ ├── model_final_checkpoint.model
│ ├── model_final_checkpoint.model.pkl
│ └── fold_3
│ ├── model_final_checkpoint.model
│ ├── model_final_checkpoint.model.pkl
│ └── fold_4
│ ├── model_final_checkpoint.model
│ ├── model_final_checkpoint.model.pkl
│ └── plans.pkl #
Inferring testing set by five-model ensemble
nnUNet_predict -i path to MICCAI2020/nnUNetData/nnUNet_raw_data/Task600_Thyroid2D/imagesTs -o path to MICCAI2020/nnUNetData/nnUNet_raw_data/Task600_Thyroid2D/UNet_Submission_NII/ -t Task600_Thyroid2D -m 2d
The inference results are in
MICCAI2020/
├── nnUNetData
│ └── nnUNet_raw_data
│ └── Task600_Thyroid2D
│ └── UNet_Submission_NII
│ ├── test_1.nii.gz
│ ├── test_2.nii.gz
│ ├── test_....nii.gz
Then, we convert the nifti
files to PNG
format.
import nibabel as nib
from skimage import io
import os
join = os.path.join
seg_path = 'path to MICCAI2020/nnUNetData/nnUNet_raw_data/Task600_Thyroid2D/TestSet_NII_Results_NaiveUNet_Task600/'
save_path = 'path to MICCAI2020/OriData/TNSCUI2020/TestSet_PNG_Results_NaiveUNet_Task600/'
for i in range(1, 911):
seg = nib.load(join(seg_path, 'test_'+str(i)+'.nii')).get_fdata()
seg_2d = seg_data[:,:,1]
io.imsave(join(save_path, 'test_'+str(i)+'.PNG'), seg_2d)
The most exciting moment comes!
Zip the folder TestSet_PNG_Results_NaiveUNet_Task600
and submit it to the official portal.
The results obtain IoU 0.8093, which is very close to the Top 1 IoU 0.8254 on the leaderboard.
Two U-Nets are employed.
- Step 1. Train U-Net to segment thyroid nodules from original ultrasound images.
- Step 2. Crop the nodule region of interest (ROI).
- Step 3. Train the new U-Net based on ROI images.
This strategy obtains remarkable improvements during five-fold cross validation. However, the performance gains do not generalize to testing set.
- Step 1. Generate pseudo test set labels: using the baseline U-Net to segment the testing set.
- Step 2. Train a new U-Net based on test set images and the pseudo labels.
- Step 3. Finetune the trained U-Net on training set.
Fold | Naive U-Net | Cascaded U-Net | Pseudo label |
---|---|---|---|
0 | 0.8747 | 0.9132 | 0.8770 |
1 | 0.8873 | 0.9147 | 0.8882 |
2 | 0.8808 | 0.9123 | 0.8799 |
3 | 0.8701 | 0.9147 | 0.8771 |
4 | 0.8719 | 0.9105 | 0.8735 |
Trained models are publicly available at here.
When beginning with a medical image segmentation challenge, many participants would take U-Net as their first try. Currently, such trained baseline models are not widely shared between participants due to privacy considerations. This might be a great waste of time and energy.
In the future segmentation challenges, we hope the trained baseline models can be freely shared by the organizers and participants at the beginning of the challenge, which could reduce repeatedly training the same models.
We highly appreciate TN-SCUI 2020 organizers for the great challenge, and all authors of nnU-Net for the out-of-the-box method.