This is the codebase for preprocessing data and for training unitask and multitask models. The data is hosted on figshare at the following link:

  • The Process_data folder contains the code for reading the raw input images and linking them with the three metadata files to create the labels for anatomical landmarks and lesions.
  • The Unet2+_base data folder contains the code for training unitask and multitask models with the Unet2+ architecture.
  • The ESFPNet_base data folder contains the code for training unitask and multitask models with the ESFPNet architecture.

Process data

Step 1: Use the script to_json_label_{Anatomical landmark, Lesions}.py to generate labels from the datasets for both cancer and non-cancer types.

The input for the script includes the following files:

  • annotation.json
  • labels.json
  • objects.json

The code generates a separate set of labels for each of the two tasks, categorizes the labels by cancer and non-cancer type, and saves the outputs in the following JSON files:

  • labels_Lung_lesions.json
  • labels_Anatomical_landmarks.json

Example scripts:

python  --data_annots ./data/Lung_cancer/annotation.json --data_objects ./data/Lung_cancer/objects.json --data_labels ./data/Lung_cancer/labels.json  --path_save ./data/Lung_cancer/labels_Anatomical_landmarks.json

Step 2: Execute the script to combine labels from both cancer and non-cancer cases for Lesions and Anatomical Landmarks tasks.

The script requires the following input JSON files:

  • labels_Lung_lesions.json (cancer)
  • labels_Lung_lesions.json (non-cancer)
  • labels_Anatomical_landmarks.json (cancer)
  • labels_Anatomical_landmarks.json (non-cancer)

The code merges the labels from cancer and non-cancer cases for each task and saves the combined outputs in the following JSON files:

  • labels_Lung_lesions_final.json
  • labels_Anatomical_landmarks_final.json

Example scripts:

python --data_json_labels_cancer ./data/Lung_cancer/labels_Lung_lesions.json --data_json_labels_non_cancer ./data/Non_lung_cancer/labels_Lung_lesions.json --path_save ./data/labels_Lung_lesions_final.json
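The merge in Step 2 can be pictured with a minimal sketch. It assumes that each label file is a JSON object keyed by image name, which the repository may or may not follow. The function name `merge_label_files` and the `source` field are hypothetical and used only for illustration.

```python
import json
from pathlib import Path


def merge_label_files(cancer_path, non_cancer_path, save_path):
    """Merge two label files (assumed schema: {image_name: label_info})."""
    merged = {}
    for source, path in (("cancer", cancer_path), ("non_cancer", non_cancer_path)):
        with open(path, encoding="utf-8") as f:
            labels = json.load(f)
        for image_name, info in labels.items():
            if image_name in merged:
                # Fail loudly instead of silently overwriting a duplicate key.
                raise ValueError(f"duplicate image name: {image_name}")
            # Record which cohort the entry came from (hypothetical field).
            merged[image_name] = {"source": source, "labels": info}
    # Create the output folder if needed, then write the combined file.
    Path(save_path).parent.mkdir(parents=True, exist_ok=True)
    with open(save_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, indent=2)
    return merged
```

Raising on duplicate image names is a design choice of this sketch: it surfaces file-name collisions between the two cohorts before they reach the dataset split.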

Step 3: Run the script to convert annotations into ground truth images for both Lesions and Anatomical Landmarks tasks, considering cancer and non-cancer types.

The script requires the following inputs:

  • annotation.json
  • labels.json
  • objects.json
  • Type of tasks (specify either "lesions" or "anatomical landmarks")

Based on the specified task type, the script generates masks (ground truth) for image segmentation in both cancer and non-cancer cases. It saves the resulting masks in the following structure:

|-- Lung_cancer
|   |-- imgs
|   |   |-- images
|   |-- masks_Lung_lesions                  
|   |   |-- masks
|   |-- masks_Anatomical_landmarks                 
|   |   |-- masks
|-- Non_lung_cancer
|   |-- imgs
|   |   |-- images
|   |-- masks_Lung_lesions                   
|   |   |-- masks
|   |-- masks_Anatomical_landmarks                  
|   |   |-- masks

Example scripts:

python --data_annots ./data/Lung_cancer/annotation.json --data_objects ./data/Lung_cancer/objects.json --data_labels ./data/Lung_cancer/labels.json --path_save ./data/Lung_cancer/masks_Lung_lesions --type label_Lesions

After the first three steps of the data-processing phase, your data structure looks like this:

|-- Lung_cancer
|   |-- imgs
|   |   |-- images
|   |-- masks_Lung_lesions                          <-- After 3rd step
|   |   |-- masks
|   |-- masks_Anatomical_landmarks                  <-- After 3rd step
|   |   |-- masks
|   |-- labels_Lung_lesions.json                    <-- After 1st step
|   |-- labels_Anatomical_landmarks.json            <-- After 1st step
|   |-- annotation.json
|   |-- objects.json
|   |-- labels.json
|-- Non_lung_cancer
|   |-- imgs
|   |   |-- images
|   |-- masks_Lung_lesions                          <-- After 3rd step
|   |   |-- masks
|   |-- masks_Anatomical_landmarks                  <-- After 3rd step
|   |   |-- masks
|   |-- labels_Lung_lesions.json                    <-- After 1st step
|   |-- labels_Anatomical_landmarks.json            <-- After 1st step
|   |-- annotation.json
|   |-- objects.json
|   |-- labels.json
|-- labels_Lung_lesions_final.json                  <-- After 2nd step
|-- labels_Anatomical_landmarks_final.json          <-- After 2nd step
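
Before moving on to Step 4, it can help to confirm that the outputs of Steps 1 to 3 are in place. The following is a minimal sketch, not part of the repository. It assumes the data root is ./data, as in the example commands, and `missing_entries` is a hypothetical helper name.

```python
from pathlib import Path

# Outputs of Steps 1 and 3, expected inside each cohort folder.
EXPECTED_PER_COHORT = [
    "imgs",
    "masks_Lung_lesions",
    "masks_Anatomical_landmarks",
    "labels_Lung_lesions.json",
    "labels_Anatomical_landmarks.json",
]
# Outputs of Step 2, expected directly under the data root.
EXPECTED_AT_ROOT = [
    "labels_Lung_lesions_final.json",
    "labels_Anatomical_landmarks_final.json",
]


def missing_entries(data_root):
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    expected = [root / name for name in EXPECTED_AT_ROOT]
    for cohort in ("Lung_cancer", "Non_lung_cancer"):
        expected += [root / cohort / name for name in EXPECTED_PER_COHORT]
    return [str(p) for p in expected if not p.exists()]


if __name__ == "__main__":
    # Assumed data root; adjust to wherever the dataset was unpacked.
    for path in missing_entries("./data"):
        print("missing:", path)
```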

Step 4: Execute the script to perform the dataset split for images and masks related to Anatomical Landmarks or Lung Lesions.

The script requires the following input parameters:

  • Labels JSON file (labels_Lung_lesions_final.json or labels_Anatomical_landmarks_final.json)
  • Folder containing cancer images (./Lung_cancer/imgs)
  • Folder containing cancer masks (./Lung_cancer/masks_Lung_lesions or ./Lung_cancer/masks_Anatomical_landmarks)
  • Folder containing non-cancer images (./Non_lung_cancer/imgs)
  • Folder containing non-cancer masks (./Non_lung_cancer/masks_Lung_lesions or ./Non_lung_cancer/masks_Anatomical_landmarks)

The dataset will be split into training, validation, and test sets. The code organizes the outputs into a "dataset" folder, which includes subfolders for train, val, and test. Each of these subfolders contains two subdirectories: one for images and another for masks.

Example scripts:

python --label_json_path ./data/labels_Lung_lesions_final.json --path_cancer_imgs ./data/Lung_cancer/imgs --path_non_cancer_imgs ./data/Non_lung_cancer/imgs --path_cancer_masks ./data/Lung_cancer/masks_Lung_lesions --path_non_cancer_masks ./data/Non_lung_cancer/masks_Lung_lesions
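
The split described above can be sketched as follows. This only illustrates the idea. The README does not state the split ratios or whether a fixed seed is used, so the 80/10/10 split and `seed=0` are assumptions, and `split_names` and `target_paths` are hypothetical names. The imgs and masks subfolder names follow the test paths used in the inference commands.

```python
import random


def split_names(names, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle file names reproducibly and divide them into train/val/test lists."""
    names = sorted(names)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(names)
    n_test = int(len(names) * test_frac)
    n_val = int(len(names) * val_frac)
    return {
        "test": names[:n_test],
        "val": names[n_test:n_test + n_val],
        "train": names[n_test + n_val:],
    }


def target_paths(split, name, root="dataset/Lung_lesions"):
    """Image and mask destinations for one file in the <root>/<split>/{imgs,masks} layout."""
    return f"{root}/{split}/imgs/{name}", f"{root}/{split}/masks/{name}"
```

Each image and its mask are assigned to the same subset because the split is computed once on the file names and then applied to both folders.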

Training with ESFPNet models

Pretrained Model

  • Download the pretrained Mixtransformer from this link: Pretrained Model
  • Put the pretrained models under the "Pretrained" folder


Multitask models

Use the train_joint_{Anatomical_Landmarks or Lung_lesions} file to train the joint model of the ESFPNet baseline.
The script requires the following input parameters:

  • Label JSON file (labels_Lung_lesions_final.json or labels_Anatomical_landmarks_final.json)
  • Folder containing the split dataset (./dataset/Anatomical_landmarks or ./dataset/Lung_lesions)

Example scripts:

python --label_json_path ./data/labels_Anatomical_landmarks_final.json --dataset ./dataset/Anatomical_landmarks
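
For orientation, a command-line interface with these two flags could be declared as in the sketch below. The flag names come from the example command. The help texts, the `required=True` setting, and the `build_parser` name are assumptions, and the actual training scripts may define further options.

```python
import argparse


def build_parser():
    """Parser for the two flags shown in the example command."""
    parser = argparse.ArgumentParser(description="Train a joint (multitask) model")
    parser.add_argument("--label_json_path", required=True,
                        help="combined label file, e.g. labels_Anatomical_landmarks_final.json")
    parser.add_argument("--dataset", required=True,
                        help="folder holding the train/val/test split")
    return parser


# Parse the arguments from the example command, passed here as an explicit list.
args = build_parser().parse_args([
    "--label_json_path", "./data/labels_Anatomical_landmarks_final.json",
    "--dataset", "./dataset/Anatomical_landmarks",
])
```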

Unitask models


Use file to train segmentation model of ESFPNet baseline.
The script requires the following input parameters:

  • Label JSON file (labels_Lung_lesions_final.json or labels_Anatomical_landmarks_final.json)
  • Folder containing the split dataset (./dataset/Anatomical_landmarks or ./dataset/Lung_lesions)
  • Task for saved model (Anatomical_landmarks or Lung_lesions)

Example scripts:

python --dataset ./dataset/Anatomical_landmarks --task Anatomical_landmarks


Use the train_clf_{Anatomical_Landmarks or Lung_lesions} file to train the classification model of the ESFPNet baseline.
The script requires the following input parameters:

  • Label JSON file (labels_Lung_lesions_final.json or labels_Anatomical_landmarks_final.json)
  • Folder containing the split dataset (./dataset/Anatomical_landmarks or ./dataset/Lung_lesions)

Example scripts:

python --label_json_path ./data/labels_Anatomical_landmarks_final.json --dataset ./dataset/Anatomical_landmarks


Multitask models (inference)

Use the infer_joint_{Anatomical_Landmarks or Lung_lesions}.py file to perform inference with the joint model of the ESFPNet baseline.
The script requires the following input parameters:

  • Label JSON file (labels_Lung_lesions_final.json or labels_Anatomical_landmarks_final.json)
  • Folder containing the images of the test dataset (./dataset/Anatomical_landmarks/test/imgs or ./dataset/Lung_lesions/test/imgs)
  • Folder containing the masks of the test dataset (./dataset/Anatomical_landmarks/test/masks or ./dataset/Lung_lesions/test/masks)
  • Path to the saved model (./SaveModel/Anatomical_Landmarks_multimodel/, ...)
  • Path to save output images (./output_dir)

Example scripts:

 python --label_json_path ./data/labels_Anatomical_landmarks_final.json --path_imgs_test ./dataset/Anatomical_landmarks/test/imgs --path_masks_test ./dataset/Anatomical_landmarks/test/masks --saved_model ./SaveModel/Anatomical_Landmarks_multimodel/ --log_dir ./output_dir

Unitask models (inference)


Use file to perform inference on segmentation model of ESFPNet baseline.
The script requires the following input parameters:

  • Folder containing the images of the test dataset (./dataset/Anatomical_landmarks/test/imgs or ./dataset/Lung_lesions/test/imgs)
  • Folder containing the masks of the test dataset (./dataset/Anatomical_landmarks/test/masks or ./dataset/Lung_lesions/test/masks)
  • Path to the saved model (./SaveModel/Anatomical_landmarks/, ...)
  • Path to save output images (./output_dir)

Example scripts:

 python --path_imgs_test ./dataset/Anatomical_landmarks/test/imgs --path_masks_test ./dataset/Anatomical_landmarks/test/masks --saved_model ./SaveModel/Anatomical_landmarks/ --log_dir ./output_dir


Use the infer_clf_{Anatomical_Landmarks or Lung_lesions}.py file to perform inference with the classification model of the ESFPNet baseline.
The script requires the following input parameters:

  • Label JSON file (labels_Lung_lesions_final.json or labels_Anatomical_landmarks_final.json)
  • Folder containing the images of the test dataset (./dataset/Anatomical_landmarks/test/imgs or ./dataset/Lung_lesions/test/imgs)
  • Folder containing the masks of the test dataset (./dataset/Anatomical_landmarks/test/masks or ./dataset/Lung_lesions/test/masks)
  • Path to the saved model (./SaveModel/Anatomical_landmarks/, ...)

Example scripts:

 python --label_json_path ./data/labels_Anatomical_landmarks_final.json --path_imgs_test ./dataset/Anatomical_landmarks/test/imgs --path_masks_test ./dataset/Anatomical_landmarks/test/masks --saved_model ./SaveModel/Anatomical_landmarks/ 

Training with Unet2+ models

Use the same scripts as for the ESFPNet-based models.