<a href="https://colab.research.google.com/github/AIM-Harvard/alpha_aime/blob/main/aime/nnunet_pancreas/notebooks/idc_nnunet_pancreas_mwe.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **nnU-Net for Pancreas and Pancreatic Cancer Segmentation**

Minimal Working Example for cloud-based analysis of data using the nnU-Net pancreas and pancreatic cancer segmentation model.

Note: some of the code comments and docstrings in this notebook are out of date and still need to be updated.

Also, the test repo can be found at: [https://github.com/AIM-Harvard/alpha_aime](https://github.com/AIM-Harvard/alpha_aime)

## **Environment Setup**

This demo notebook is intended to be run using a GPU.

To access a free GPU on Colab, open `Edit > Notebook settings`.

From the dropdown menu under `Hardware accelerator`, select `GPU`. The following cell prints some basic information about the Colab instance we are connected to.

In [None]:
import os
import sys

import yaml

import time
import tqdm


# useful information
curr_dir = !pwd
curr_droid = !hostname
curr_pilot = !whoami

print(time.asctime(time.localtime()))

print("\nCurrent directory :", curr_dir[-1])
print("Hostname          :", curr_droid[-1])
print("Username          :", curr_pilot[-1])

print("Python version    :", sys.version.split('\n')[0])

Fri Sep  9 15:14:40 2022

Current directory : /content
Hostname          : 7e43c1fd07cf
Username          : root
Python version    : 3.7.13 (default, Apr 24 2022, 01:04:09) 
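
The cell above reports general information about the session, but it does not query the GPU itself. Below is a minimal sketch of such a check, assuming the NVIDIA command-line tool `nvidia-smi` is on the `PATH` whenever a GPU runtime is attached. `gpu_summary` is a hypothetical helper name, not part of the notebook or of any library.

```python
import shutil
import subprocess


def gpu_summary():
  """Return the GPUs listed by `nvidia-smi -L`, or an empty list if none is found."""
  # `nvidia-smi` is only available when the NVIDIA driver is installed
  if shutil.which("nvidia-smi") is None:
    return []

  result = subprocess.run(["nvidia-smi", "-L"], capture_output = True, text = True)

  # a non-zero return code usually means no GPU is attached to the runtime
  if result.returncode != 0:
    return []

  return [line for line in result.stdout.splitlines() if line.strip()]


gpus = gpu_summary()
print("GPU(s) found:" if gpus else "No GPU found - check the runtime settings.")
for gpu in gpus:
  print(" ", gpu)
```

If the list comes back empty, the hardware accelerator setting described above is the first thing to check.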


Authenticating with Google is necessary to run BigQuery queries.

Every operation in this notebook (BigQuery queries, fetching data from the IDC buckets) is free of charge. The only prerequisite is a Google Cloud project: for the notebook to work as intended, specify the name of your project in the cell after the authentication one.

In [None]:
from google.colab import auth
auth.authenticate_user()

In [None]:
from google.cloud import storage
from google.cloud import bigquery as bq

# INSERT THE NAME OF YOUR PROJECT HERE!
#project_name = "idc-sandbox-000"
project_name = "sage-buttress-323909"

Throughout this Colab notebook, for image pre-processing we will use [Plastimatch](https://plastimatch.org), a reliable, open-source software for image computation. We will be running Plastimatch using the simple [PyPlastimatch](https://github.com/AIM-Harvard/pyplastimatch/tree/main/pyplastimatch) Python wrapper.

In [None]:
%%capture
!apt install plastimatch

In [None]:
# check plastimatch was correctly installed
!plastimatch --version

plastimatch version 1.7.0


We will use subversion to clone only a few subdirectories of a repository (this is still not simple to do using the git CLI).

In [None]:
%%capture
!apt install subversion

In [None]:
# check subversion was correctly installed
!svn --version | head -n 2

svn, version 1.9.7 (r1800392)
   compiled May 21 2022, 07:24:25 on x86_64-pc-linux-gnu


Clone only the subfolders of `ImagingDataCommons/ai_medima_misc` we need to run this notebook.

In [None]:
!svn checkout https://github.com/ImagingDataCommons/ai_medima_misc/trunk/nnunet/src
!svn checkout https://github.com/ImagingDataCommons/ai_medima_misc/trunk/nnunet/data

A    src/README.md
A    src/utils
A    src/utils/eval.py
A    src/utils/gcs.py
A    src/utils/postprocessing.py
A    src/utils/preprocessing.py
A    src/utils/processing.py
Checked out revision 51.
A    data/dicomseg_base_metadata.json
A    data/dicomseg_metadata.json
A    data/features_SR.json
A    data/nnunet_segments_code_mapping.csv
A    data/nnunet_shape_features_code_mapping.csv
Checked out revision 51.


Furthermore, to organise the DICOM data in a more common (and human-understandable) fashion after downloading them from the buckets, we will make use of [DICOMSort](https://github.com/pieper/dicomsort).

DICOMSort is an open-source tool for custom sorting and renaming of DICOM files based on their DICOM tags. In our case, we will use DICOMSort to organise the DICOM data by `PatientID` and `Modality`, so that the final directory looks like the following:

```
data/raw/pancreas_ct/dicom/$PatientID
 └─── CT
       ├─── $SOPInstanceUID_slice0.dcm
       ├─── $SOPInstanceUID_slice1.dcm
       ├───  ...
       │
      RTSTRUCT 
       ├─── $SOPInstanceUID_RTSTRUCT.dcm
      SEG
       └─── $SOPInstanceUID_RTSEG.dcm

```
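
The layout above can be read as a path template. Here is a small sketch of that template, assuming the sorting keys are exactly `PatientID`, `Modality` and `SOPInstanceUID`. `sorted_dicom_path` is a hypothetical helper written for illustration only; it is not part of the DICOMSort API.

```python
import os


def sorted_dicom_path(base_dir, patient_id, modality, sop_instance_uid):
  """Compose the destination path of one DICOM file in the sorted tree."""
  return os.path.join(base_dir, patient_id, modality, sop_instance_uid + ".dcm")


# toy values - the UID below is made up for the example
example_path = sorted_dicom_path("data/raw/pancreas_ct/dicom",
                                 "PANCREAS_0001", "CT", "1.2.3.4")
print(example_path)
```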

In [None]:
!mkdir -p src

!git clone https://github.com/pieper/dicomsort src/dicomsort

Cloning into 'src/dicomsort'...
remote: Enumerating objects: 130, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 130 (delta 0), reused 1 (delta 0), pack-reused 126
Receiving objects: 100% (130/130), 44.12 KiB | 2.59 MiB/s, done.
Resolving deltas: 100% (63/63), done.


Finally, we will use DCMQI for converting the resulting segmentation into standard DICOM SEG objects.

In [None]:
dcmqi_release_url = "https://github.com/QIICR/dcmqi/releases/download/v1.2.4/dcmqi-1.2.4-linux.tar.gz"
dcmqi_download_path = "/content/dcmqi-1.2.4-linux.tar.gz"
dcmqi_path = "/content/dcmqi-1.2.4-linux"

!wget -O $dcmqi_download_path $dcmqi_release_url

!tar -xvf $dcmqi_download_path

!mv $dcmqi_path/bin/* /bin

--2022-09-09 15:15:22--  https://github.com/QIICR/dcmqi/releases/download/v1.2.4/dcmqi-1.2.4-linux.tar.gz
Resolving github.com (github.com)... 140.82.112.3
Connecting to github.com (github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/50675718/04f07880-81ee-11eb-92ec-30c7426dae5d?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20220909%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220909T151522Z&X-Amz-Expires=300&X-Amz-Signature=c33e8c03197777c9ff4023e288151207a2a8b6b6b49fa2dc81698c1a787593b1&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=50675718&response-content-disposition=attachment%3B%20filename%3Ddcmqi-1.2.4-linux.tar.gz&response-content-type=application%2Foctet-stream [following]
--2022-09-09 15:15:22--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/50675718/04f07880-81ee-11eb-92ec-30c7426dae5

---

In [None]:
%%capture
!pip install pyplastimatch nnunet ipywidgets

In [None]:
import shutil
import random

import json
import pprint
import numpy as np
import pandas as pd

import pydicom
import nibabel as nib
import SimpleITK as sitk
import pyplastimatch as pypla

print("Python version               : ", sys.version.split('\n')[0])
print("Numpy version                : ", np.__version__)

# ----------------------------------------

#everything that has to do with plotting goes here below
import matplotlib
matplotlib.use("agg")

import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from matplotlib.patches import Patch

%matplotlib inline
%config InlineBackend.figure_format = "png"

import ipywidgets as ipyw

## ----------------------------------------

# create new colormap appending the alpha channel to the selected one
# (so that we don't get a "color overlay" when plotting the segmask superimposed to the CT)
cmap = plt.cm.Reds
my_reds = cmap(np.arange(cmap.N))
my_reds[:, -1] = np.linspace(0, 1, cmap.N)
my_reds = ListedColormap(my_reds)

cmap = plt.cm.Greens
my_greens = cmap(np.arange(cmap.N))
my_greens[:, -1] = np.linspace(0, 1, cmap.N)
my_greens = ListedColormap(my_greens)

cmap = plt.cm.Blues
my_blues = cmap(np.arange(cmap.N))
my_blues[:, -1] = np.linspace(0, 1, cmap.N)
my_blues = ListedColormap(my_blues)

cmap = plt.cm.spring
my_spring = cmap(np.arange(cmap.N))
my_spring[:, -1] = np.linspace(0, 1, cmap.N)
my_spring = ListedColormap(my_spring)
## ----------------------------------------

import seaborn as sns

Python version               :  3.7.13 (default, Apr 24 2022, 01:04:09) 
Numpy version                :  1.21.6


Provided everything was set up correctly, we can run the BigQuery query and get all the information we need to download the testing data from the IDC platform.

For this specific use case, we are going to be working with the Pancreas-CT collection (`pancreas_ct` on IDC, abdominal CT scans), as selected by the `collection_id` filter in the query below.

In [None]:
%%bigquery --project=$project_name cohort_df

SELECT
  dicom_pivot_v10.PatientID,
  dicom_pivot_v10.collection_id,
  dicom_pivot_v10.source_DOI,
  dicom_pivot_v10.StudyInstanceUID,
  dicom_pivot_v10.SeriesInstanceUID,
  dicom_pivot_v10.SOPInstanceUID,
  dicom_pivot_v10.gcs_url
FROM
  `bigquery-public-data.idc_v10.dicom_pivot_v10` dicom_pivot_v10
WHERE
  StudyInstanceUID IN (
    SELECT
      StudyInstanceUID
    FROM
      `bigquery-public-data.idc_v10.dicom_pivot_v10` dicom_pivot_v10
    WHERE
      (
        LOWER(dicom_pivot_v10.collection_id) LIKE LOWER('pancreas_ct')
      )
    GROUP BY
      StudyInstanceUID
  )
GROUP BY
  dicom_pivot_v10.PatientID,
  dicom_pivot_v10.collection_id,
  dicom_pivot_v10.source_DOI,
  dicom_pivot_v10.StudyInstanceUID,
  dicom_pivot_v10.SeriesInstanceUID,
  dicom_pivot_v10.SOPInstanceUID,
  dicom_pivot_v10.gcs_url
ORDER BY
  dicom_pivot_v10.PatientID ASC,
  dicom_pivot_v10.collection_id ASC,
  dicom_pivot_v10.source_DOI ASC,
  dicom_pivot_v10.StudyInstanceUID ASC,
  dicom_pivot_v10.SeriesInstanceUID ASC,
  dicom_pivot_v10.SOPInstanceUID ASC,
  dicom_pivot_v10.gcs_url ASC

In [None]:
# this works as intended only if the BQ query parses data from a single dataset
# if not, feel free to set the name manually!
dataset_name = cohort_df["collection_id"].values[0]
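
The comment above assumes that the query returned rows from a single collection. This is one way to make that assumption explicit before picking the name. `single_collection_id` is a hypothetical helper; in the notebook it would be called on the values of `cohort_df["collection_id"]`.

```python
def single_collection_id(collection_ids):
  """Return the only collection ID in `collection_ids`, or raise if there are several."""
  unique_ids = sorted(set(collection_ids))

  if len(unique_ids) != 1:
    raise ValueError("Expected exactly one collection, found: %s"%(unique_ids))

  return unique_ids[0]


# toy input standing in for the `collection_id` column of the cohort table
print(single_collection_id(["pancreas_ct", "pancreas_ct", "pancreas_ct"]))
```

Failing loudly here avoids silently writing data from several collections under one folder name.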

In [None]:
# create the directory tree
!mkdir -p data models output

!mkdir -p data/raw 
!mkdir -p data/raw/tmp data/raw/$dataset_name
!mkdir -p data/raw/$dataset_name/dicom

!mkdir -p data/processed
!mkdir -p data/processed/$dataset_name
!mkdir -p data/processed/$dataset_name/nrrd
!mkdir -p data/processed/$dataset_name/nii
!mkdir -p data/processed/$dataset_name/dicomseg

!mkdir -p data/model_input/
!mkdir -p data/nnunet_output/
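
For reference, the same tree can be created from Python rather than with shell commands. The sketch below mirrors the `mkdir -p` calls above, assuming everything lives under a single base directory. `make_directory_tree` is a hypothetical helper name.

```python
import os
import tempfile


def make_directory_tree(base_dir, dataset_name):
  """Create the data, model and output folders under `base_dir`."""
  subdirs = [
    "models",
    "output",
    os.path.join("data", "raw", "tmp"),
    os.path.join("data", "raw", dataset_name, "dicom"),
    os.path.join("data", "processed", dataset_name, "nrrd"),
    os.path.join("data", "processed", dataset_name, "nii"),
    os.path.join("data", "processed", dataset_name, "dicomseg"),
    os.path.join("data", "model_input"),
    os.path.join("data", "nnunet_output"),
  ]

  created = []
  for subdir in subdirs:
    path = os.path.join(base_dir, subdir)
    # exist_ok mirrors `mkdir -p`: no error if the folder already exists
    os.makedirs(path, exist_ok = True)
    created.append(path)

  return created


# try it out in a temporary folder, so nothing is left behind
with tempfile.TemporaryDirectory() as tmp_dir:
  created_paths = make_directory_tree(tmp_dir, "pancreas_ct")
  all_created = all(os.path.isdir(path) for path in created_paths)
  print("All folders created:", all_created)
```

In the notebook, the equivalent call would be `make_directory_tree("/content", dataset_name)`.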

Download the segmentation model from Zenodo. This can be either very fast (two minutes or less) or very slow (up to ten minutes), probably depending on the traffic on Zenodo's end and other factors.

If the download is taking a long time, consider interrupting the cell execution and running the cell again.

In [None]:
seg_model_url = "https://zenodo.org/record/4003545/files/Task007_Pancreas.zip?download=1"
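# note: the local file name is a leftover from the SegTHOR notebook - the archive holds the Task007_Pancreas model
model_download_path = "/content/models/Task055_SegTHOR.zip"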

!wget -O $model_download_path $seg_model_url

--2022-09-09 15:16:01--  https://zenodo.org/record/4003545/files/Task007_Pancreas.zip?download=1
Resolving zenodo.org (zenodo.org)... 188.184.117.155
Connecting to zenodo.org (zenodo.org)|188.184.117.155|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5002701016 (4.7G) [application/octet-stream]
Saving to: ‘/content/models/Task055_SegTHOR.zip’


2022-09-09 15:19:23 (23.8 MB/s) - ‘/content/models/Task055_SegTHOR.zip’ saved [5002701016/5002701016]
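
If a download is interrupted, the archive on disk will be smaller than the size reported by the server (`Length: 5002701016` in the log above). Below is a sketch of a size check that could be run before installing the model. `download_is_complete` is a hypothetical helper, and the expected size is taken from that log line.

```python
import os


def download_is_complete(file_path, expected_size):
  """Check that `file_path` exists and is exactly `expected_size` bytes long."""
  return os.path.isfile(file_path) and os.path.getsize(file_path) == expected_size


# size in bytes, as reported by wget for the model archive
expected_model_size = 5002701016

print(download_is_complete("/content/models/Task055_SegTHOR.zip", expected_model_size))
```

A size match does not prove the archive is intact, but it catches the common case of a truncated download.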



Initialize a few environment variables [...]

In [None]:
os.environ["RESULTS_FOLDER"] = "/content/data/nnunet_output/"
os.environ["WEIGHTS_FOLDER"] = "/content/data/nnunet_output/nnUNet"

In [None]:
%%capture
!nnUNet_install_pretrained_model_from_zip $model_download_path

## **Parsing Cohort Information from BigQuery Tables**

We can check the various fields of the table we populated by running the BigQuery query.

This table will store one entry for each DICOM file in the dataset (therefore, expect thousands of rows!)
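
To make the "one row per DICOM file" structure concrete, the sketch below counts rows per patient and series on a few toy records with made-up UIDs. `count_files_per_series` is a hypothetical helper; on the actual DataFrame, `cohort_df.groupby(["PatientID", "SeriesInstanceUID"]).size()` gives the same kind of summary.

```python
from collections import Counter


def count_files_per_series(rows):
  """Count rows (one per DICOM file) for each (PatientID, SeriesInstanceUID) pair."""
  return Counter((row["PatientID"], row["SeriesInstanceUID"]) for row in rows)


# toy rows with made-up UIDs, standing in for the records of the cohort table
toy_rows = [
  {"PatientID": "PANCREAS_0001", "SeriesInstanceUID": "1.1", "SOPInstanceUID": "1.1.1"},
  {"PatientID": "PANCREAS_0001", "SeriesInstanceUID": "1.1", "SOPInstanceUID": "1.1.2"},
  {"PatientID": "PANCREAS_0002", "SeriesInstanceUID": "2.1", "SOPInstanceUID": "2.1.1"},
]

files_per_series = count_files_per_series(toy_rows)

for (patient, series_uid), n_files in sorted(files_per_series.items()):
  print(patient, series_uid, n_files)
```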

In [None]:
pat_id_list = sorted(list(set(cohort_df["PatientID"].values)))

print("Total number of unique Patient IDs:", len(pat_id_list))

display(cohort_df.info())

display(cohort_df.head())

Total number of unique Patient IDs: 80
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18942 entries, 0 to 18941
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   PatientID          18942 non-null  object
 1   collection_id      18942 non-null  object
 2   source_DOI         18942 non-null  object
 3   StudyInstanceUID   18942 non-null  object
 4   SeriesInstanceUID  18942 non-null  object
 5   SOPInstanceUID     18942 non-null  object
 6   gcs_url            18942 non-null  object
dtypes: object(7)
memory usage: 1.0+ MB


None

|   | PatientID | collection_id | source_DOI | StudyInstanceUID | SeriesInstanceUID | SOPInstanceUID | gcs_url |
|---|---|---|---|---|---|---|---|
| 0 | PANCREAS_0001 | pancreas_ct | 10.7937/K9/TCIA.2016.tNB1kqBU | 1.2.826.0.1.3680043.2.1125.1.38381854871216336... | 1.2.826.0.1.3680043.2.1125.1.68878959984837726... | 1.2.826.0.1.3680043.2.1125.1.12371411096899351... | gs://public-datasets-idc/bb705d97-6ae7-4265-81... |
| 1 | PANCREAS_0001 | pancreas_ct | 10.7937/K9/TCIA.2016.tNB1kqBU | 1.2.826.0.1.3680043.2.1125.1.38381854871216336... | 1.2.826.0.1.3680043.2.1125.1.68878959984837726... | 1.2.826.0.1.3680043.2.1125.1.12376389450984884... | gs://public-datasets-idc/65bd9a85-72ae-49b8-9c... |
| 2 | PANCREAS_0001 | pancreas_ct | 10.7937/K9/TCIA.2016.tNB1kqBU | 1.2.826.0.1.3680043.2.1125.1.38381854871216336... | 1.2.826.0.1.3680043.2.1125.1.68878959984837726... | 1.2.826.0.1.3680043.2.1125.1.13438245366954035... | gs://public-datasets-idc/8dfc4047-afa7-4993-95... |
| 3 | PANCREAS_0001 | pancreas_ct | 10.7937/K9/TCIA.2016.tNB1kqBU | 1.2.826.0.1.3680043.2.1125.1.38381854871216336... | 1.2.826.0.1.3680043.2.1125.1.68878959984837726... | 1.2.826.0.1.3680043.2.1125.1.14285033913934517... | gs://public-datasets-idc/e6fe842b-790d-4877-b3... |
| 4 | PANCREAS_0001 | pancreas_ct | 10.7937/K9/TCIA.2016.tNB1kqBU | 1.2.826.0.1.3680043.2.1125.1.38381854871216336... | 1.2.826.0.1.3680043.2.1125.1.68878959984837726... | 1.2.826.0.1.3680043.2.1125.1.15108061926999795... | gs://public-datasets-idc/a84bafa7-5b62-4803-b6... |


---

## **Set Run Parameters**

From this cell, we can configure the nnU-Net inference step - specifying, for instance, the type of model we want to run (among the four different models the framework provides), whether we want to use test time augmentation, or whether we want to export the soft probability maps of the segmentation masks.


In [None]:
# FIXED PARAMETERS
data_base_path = "/content/data"
raw_base_path = "/content/data/raw/tmp"
sorted_base_path = os.path.join("/content/data/raw/", dataset_name, "dicom")

processed_base_path = os.path.join("/content/data/processed/", dataset_name)
processed_nrrd_path = os.path.join(processed_base_path, "nrrd")
processed_nifti_path = os.path.join(processed_base_path, "nii")

processed_dicomseg_path = os.path.join(processed_base_path, "dicomseg")

model_input_folder = "/content/data/model_input/"
model_output_folder = "/content/data/nnunet_output/"

dicomseg_json_path = "/content/data/dicomseg_base_metadata.json"

# -----------------
# nnU-Net pipeline parameters

# choose from: "2d", "3d_lowres", "3d_fullres", "3d_cascade_fullres"
nnunet_model = "2d"
use_tta = False
export_prob_maps = False

---

## **Functions We Can Push to a Module Later**

After cloning the repo, the general utilities modules will be imported using the following syntax:

```
# make sure the __init__.py are properly set-up for this

import aime.general_utils as aime_utils

aime_utils.download_patient_data(...)

```


The model-specific modules will be imported using the following syntax:

```
# make sure the __init__.py are properly set-up for this

import aime.nnunet_pancreas as aime_model

aime_model.process_patient(...)
aime_model.pypla_postprocess(...)
```

In [None]:
def numpy_to_nrrd(model_output_folder, processed_nrrd_path, pat_id,
                  output_folder_name = "pred_softmax", output_dtype = "uint8",
                  structure_list = ["Background", "Pancreas",
                                    "Pancreatic_cancer"]):

  """
  Convert softmax probability maps to NRRD. For simplicity, the probability maps
  are converted to UInt8 by default.
  Arguments:
    model_output_folder : required - path to the folder where the inferred segmentation masks are stored.
    processed_nrrd_path : required - path to the folder where the preprocessed NRRD data are stored.
    pat_id              : required - patient ID (used for naming purposes).
    output_folder_name  : optional - name of the subfolder under the patient directory 
                                     (under `processed_nrrd_path`) where the softmax NRRD
                                     files will be saved. Defaults to "pred_softmax".
    output_dtype        : optional - output data type. Data type float16 is not supported by the NRRD standard,
                                     so the choice should be between uint8, uint16 or float32. Please note this
                                     will greatly impact the size of the DICOM PM file that will be generated.
    structure_list      : optional - list of the structures whose probability maps are stored along the
                                     first axis of the `softmax` array in the `.npz` file (output from the
                                     nnU-Net pipeline when `export_prob_maps` is set to True). Defaults to
                                     the structure list of the pancreas task (background = 0 included).
  Outputs:
    This function [...]
  """

  pred_softmax_fn = pat_id + ".npz"
  pred_softmax_path = os.path.join(model_output_folder, pred_softmax_fn)

  # parse the CT NRRD file - we will make use of it to populate the header of the
  # NRRD mask we are going to get from the inferred segmentation mask
  ct_nrrd_path = os.path.join(processed_nrrd_path, pat_id, pat_id + "_CT.nrrd")
  sitk_ct = sitk.ReadImage(ct_nrrd_path)

  output_folder_path = os.path.join(processed_nrrd_path, pat_id, output_folder_name)
  
  if not os.path.exists(output_folder_path):
    os.mkdir(output_folder_path)

  pred_softmax_all = np.load(pred_softmax_path)["softmax"]

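  # check if the model managed to segment the pancreatic cancer as well
  # (i.e., if the cancer class, channel 2, is the most likely one for any voxel);
  # counting unique values does not work here, as softmax maps are continuous
  # if not, exclude it from the list
  has_cancer_seg = bool(np.any(np.argmax(pred_softmax_all, axis = 0) == 2))

  if not has_cancer_seg:
    structure_list = ["Background", "Pancreas"]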

  for channel, structure in enumerate(structure_list):

    pred_softmax_segmask = pred_softmax_all[channel].astype(dtype = np.float32)

    assert(output_dtype in ["uint8", "uint16", "float32"])      

    if output_dtype == "float32":
      # no rescale needed - the values will be between 0 and 1
      # set SITK image dtype to Float32
      sitk_dtype = sitk.sitkFloat32

    elif output_dtype == "uint8":
      # rescale between 0 and 255, quantize
      pred_softmax_segmask = (255*pred_softmax_segmask).astype(np.int32)
      # set SITK image dtype to UInt8
      sitk_dtype = sitk.sitkUInt8

    elif output_dtype == "uint16":
      # rescale between 0 and 65535 (the largest UInt16 value), quantize
      pred_softmax_segmask = (65535*pred_softmax_segmask).astype(np.int32)
      # set SITK image dtype to UInt16
      sitk_dtype = sitk.sitkUInt16
    
    pred_softmax_segmask_sitk = sitk.GetImageFromArray(pred_softmax_segmask)
    pred_softmax_segmask_sitk.CopyInformation(sitk_ct)
    pred_softmax_segmask_sitk = sitk.Cast(pred_softmax_segmask_sitk, sitk_dtype)

    output_fn = "%s.nrrd"%(structure)
    output_path = os.path.join(output_folder_path, output_fn)

    writer = sitk.ImageFileWriter()

    writer.UseCompressionOn()
    writer.SetFileName(output_path)
    writer.Execute(pred_softmax_segmask_sitk)
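
The rescaling step in the function above can be illustrated on single values. This is a minimal sketch, assuming the scale factor is the largest value the integer type can hold (255 for UInt8, 65535 for UInt16), so that a probability of 1.0 still fits in the target type. `quantize_probability` is a hypothetical helper, not part of the notebook.

```python
def quantize_probability(prob, output_dtype = "uint8"):
  """Map a probability in [0, 1] to the integer range of `output_dtype`."""
  max_value = {"uint8" : 255, "uint16" : 65535}[output_dtype]

  if not 0.0 <= prob <= 1.0:
    raise ValueError("Expected a probability between 0 and 1, got %g"%(prob))

  # truncation (not rounding) matches a plain cast to an integer type
  return int(max_value*prob)


for dtype in ["uint8", "uint16"]:
  print(dtype, [quantize_probability(p, dtype) for p in (0.0, 0.5, 1.0)])
```

UInt8 keeps 256 distinct probability levels and UInt16 keeps 65536, at twice the storage cost per voxel.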

In [None]:
def pypla_nifti_to_nrrd(pred_nifti_path, processed_nrrd_path,
                        pat_id, verbose = True):
  
  """
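  NIfTI to NRRD file conversion using PyPlastimatch.
  Arguments:
    pred_nifti_path     : required - path to the NIfTI segmentation mask inferred by the model.
    processed_nrrd_path : required - path to the folder where the preprocessed NRRD data are stored.
    pat_id              : required - patient ID (used for naming purposes).
    verbose             : optional - verbosity flag passed on to PyPlastimatch. Defaults to True.
  
  Returns:
    pred_nrrd_path - path to the NRRD segmentation mask written by the conversion.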
  Outputs:
    This function [...]
  """

  pred_nrrd_path = os.path.join(processed_nrrd_path, pat_id, pat_id + "_pred_pancreas.nrrd")
  log_file_path = os.path.join(processed_nrrd_path, pat_id, pat_id + "_pypla.log")
  
  # Inferred NIfTI segmask to NRRD
  convert_args_pred = {"input" : pred_nifti_path, 
                       "output-img" : pred_nrrd_path}

  pypla.convert(verbose = verbose,
                path_to_log_file = log_file_path,
                **convert_args_pred)
  
  return pred_nrrd_path

In [None]:
def pypla_postprocess(processed_nrrd_path, model_output_folder, pat_id):

  """
  Wrapper for NIfTI to NRRD file conversion using PyPlastimatch.
  Arguments:
    processed_nrrd_path  : required - path to the folder where the preprocessed NRRD data are stored.
    model_output_folder  : required - path to the folder where the inferred segmentation masks are stored.
    pat_id               : required - patient ID (used for naming purposes). 
  Outputs:
    This function [...]
  """

  pred_nifti_fn = pat_id + ".nii.gz"
  pred_nifti_path = os.path.join(model_output_folder, pred_nifti_fn)

  pred_nrrd_path = pypla_nifti_to_nrrd(pred_nifti_path = pred_nifti_path,
                                       processed_nrrd_path = processed_nrrd_path,
                                       pat_id = pat_id, verbose = True)

## **Running the Analysis for a Single Patient**

In [None]:
import src.utils.gcs as gcs
import src.utils.preprocessing as preprocessing
import src.utils.processing as processing
import src.utils.postprocessing as postprocessing

The following cells run the whole processing pipeline, from pre-processing to post-processing.

For the sake of simplicity, all the extra code was organised in scripts that are fully documented and can be found at [this GitHub repository](https://github.com/ImagingDataCommons/ai_medima_misc/tree/main/nnunet/src).

In [None]:
import subprocess

In [None]:
# maybe make some arguments kwargs and keep others the same?
def process_patient_nnunet(model_input_folder, model_output_folder, nnunet_model,
                           use_tta = False, export_prob_maps = False):

  """
  Infer the pancreas and pancreatic cancer segmentation maps using one of the nnU-Net models.
  Arguments:
    model_input_folder  : required - path to the folder where the data to be inferred should be stored.
    model_output_folder : required - path to the folder where the inferred segmentation masks will be stored.
    nnunet_model        : required - pre-trained nnU-Net model to use during the inference phase.
    use_tta             : optional - whether to use or not test time augmentation (TTA). Defaults to False.
    export_prob_maps    : optional - whether to export or not softmax probabilities. Defaults to False.
  Outputs:
    This function [...]
  """
  
  assert(nnunet_model in ["2d", "3d_lowres", "3d_fullres", "3d_cascade_fullres"])

  start_time = time.time()

  print("Running `nnUNet_predict` with `%s` model..."%(nnunet_model))

  pat_fn_list = sorted([f for f in os.listdir(model_input_folder) if ".nii.gz" in f])
  pat_fn_path = os.path.join(model_input_folder, pat_fn_list[-1])

  print("Processing file at %s..."%(pat_fn_path))

  # run the inference phase
  # note: this could also be done in a pythonic fashion by running
  #       `nnUNet/nnunet/inference/predict.py` - but it would require
  #       to set manually all the arguments that the user is not intended
  #       to fiddle with; so stick with the bash executable

  bash_command = list()
  bash_command += ["nnUNet_predict"]
  bash_command += ["--input_folder", "%s"%model_input_folder]
  bash_command += ["--output_folder", "%s"%model_output_folder]
  bash_command += ["--task_name", "Task007_Pancreas"]
  bash_command += ["--model", "%s"%nnunet_model]
  
  if use_tta == False:
    bash_command += ["--disable_tta"]
  
  if export_prob_maps == True:
    bash_command += ["--save_npz"]

  bash_return = subprocess.run(bash_command, check = True, text = True)

  elapsed = time.time() - start_time

  print("Done in %g seconds."%elapsed)
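
To see which command a given set of run parameters translates to, without launching the inference, the argument list can be assembled separately. The sketch below reuses the flags passed in the function above. `build_nnunet_command` and its `task_name` parameter are hypothetical names introduced for illustration.

```python
import shlex


def build_nnunet_command(model_input_folder, model_output_folder, nnunet_model,
                         use_tta = False, export_prob_maps = False,
                         task_name = "Task007_Pancreas"):
  """Assemble the `nnUNet_predict` argument list without running it."""
  valid_models = ["2d", "3d_lowres", "3d_fullres", "3d_cascade_fullres"]

  if nnunet_model not in valid_models:
    raise ValueError("Unknown model '%s', choose from: %s"%(nnunet_model, valid_models))

  command = ["nnUNet_predict",
             "--input_folder", model_input_folder,
             "--output_folder", model_output_folder,
             "--task_name", task_name,
             "--model", nnunet_model]

  # test time augmentation is on by default, so it has to be switched off explicitly
  if not use_tta:
    command += ["--disable_tta"]

  # the softmax probabilities are saved only on request
  if export_prob_maps:
    command += ["--save_npz"]

  return command


command = build_nnunet_command("/content/data/model_input/",
                               "/content/data/nnunet_output/", "2d")

# print the command as it could be typed in a shell
print(" ".join(shlex.quote(arg) for arg in command))
```

Keeping the command as a list and passing it to `subprocess.run` avoids shell quoting issues with paths that contain spaces.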

In [None]:
# sample patient - feel free to choose randomly!
pat_id = "PANCREAS_0001"

# -----------------
# init

print("Processing patient: %s"%(pat_id))

patient_df = cohort_df[cohort_df["PatientID"] == pat_id]

dicomseg_fn = pat_id + "_SEG.dcm"

input_nifti_fn = pat_id + "_0000.nii.gz"
input_nifti_path = os.path.join(model_input_folder, input_nifti_fn)

pred_nifti_fn = pat_id + ".nii.gz"
pred_nifti_path = os.path.join(model_output_folder, pred_nifti_fn)

pred_softmax_folder_name = "pred_softmax"
pred_softmax_folder_path = os.path.join(processed_nrrd_path, pat_id, pred_softmax_folder_name)

# -----------------
# cross-load the CT data from the IDC buckets, run the preprocessing

# data cross-loading
gcs.download_patient_data(raw_base_path = raw_base_path,
                          sorted_base_path = sorted_base_path,
                          patient_df = patient_df,
                          remove_raw = True)


# DICOM CT to NRRD - good to have for a number of reasons
preprocessing.pypla_dicom_ct_to_nrrd(sorted_base_path = sorted_base_path,
                                     processed_nrrd_path = processed_nrrd_path,
                                     pat_id = pat_id, verbose = True)

# -----------------
# DL-inference

# DICOM CT to NIfTI - required for the processing
preprocessing.pypla_dicom_ct_to_nifti(sorted_base_path = sorted_base_path,
                                      processed_nifti_path = processed_nifti_path,
                                      pat_id = pat_id, verbose = True)

# prepare the `model_input` folder for the inference phase
preprocessing.prep_input_data(processed_nifti_path = processed_nifti_path,
                              model_input_folder = model_input_folder,
                              pat_id = pat_id)

# run the DL-based prediction
process_patient_nnunet(model_input_folder = model_input_folder,
                       model_output_folder = model_output_folder, 
                       nnunet_model = nnunet_model, use_tta = use_tta,
                       export_prob_maps = export_prob_maps)

# the softmax `.npz` file is written only when `export_prob_maps` is True
if export_prob_maps:
  numpy_to_nrrd(model_output_folder = model_output_folder,
                processed_nrrd_path = processed_nrrd_path,
                pat_id = pat_id,
                output_folder_name = pred_softmax_folder_name)

# remove the NIfTI file the prediction was computed from
!rm $input_nifti_path

# use the pancreas-specific wrapper defined earlier in this notebook
pypla_postprocess(processed_nrrd_path = processed_nrrd_path,
                  model_output_folder = model_output_folder,
                  pat_id = pat_id)

# FIXME: generate JSON for the new task
# will do asap - prioritizing setting up the repository and code structure right now!
"""
postprocessing.nrrd_to_dicomseg(sorted_base_path = sorted_base_path,
                                processed_base_path = processed_base_path,
                                dicomseg_json_path = dicomseg_json_path,
                                pat_id = pat_id)
"""

Processing patient: PANCREAS_0001
Copying files from IDC buckets to /content/data/raw/tmp/PANCREAS_0001...
Done in 16.759 seconds.

Sorting DICOM files...
Done in 1.21641 seconds.
Sorted DICOM data saved at: /content/data/raw/pancreas_ct/dicom/PANCREAS_0001
Removing un-sorted data at /content/data/raw/tmp/PANCREAS_0001...
... Done.

Running 'plastimatch convert' with the specified arguments:
  --input /content/data/raw/pancreas_ct/dicom/PANCREAS_0001/CT
  --output-img /content/data/processed/pancreas_ct/nrrd/PANCREAS_0001/PANCREAS_0001_CT.nrrd
... Done.

Running 'plastimatch convert' with the specified arguments:
  --input /content/data/raw/pancreas_ct/dicom/PANCREAS_0001/CT
  --output-img /content/data/processed/pancreas_ct/nii/PANCREAS_0001/PANCREAS_0001_CT.nii.gz
... Done.
Copying /content/data/processed/pancreas_ct/nii/PANCREAS_0001/PANCREAS_0001_CT.nii.gz
to /content/data/model_input/PANCREAS_0001_0000.nii.gz...
... Done.
Running `nnUNet_predict` with `2d` model...
Processing file

KeyboardInterrupt: ignored

---

In [None]:
from pyplastimatch.utils import viz as viz_utils

In [None]:
ct_nii_path = os.path.join("/content/data/processed/pancreas_ct/nii/", pat_id, pat_id + "_CT.nii.gz")
seg_nii_path = os.path.join("/content/data/nnunet_output/", pat_id + ".nii.gz")
softseg_nii_path = os.path.join("/content/data/nnunet_output/", pat_id + ".npz")

"""
alternative way of loading the resulting NIfTI files
nibabel can sometimes take better care of the orientation of the
converted/segmented images, but will orient the data differently by default
"""

#ct_nii = nib.load(ct_nii_path).dataobj
#seg_nii = nib.load(seg_nii_path).dataobj

ct_nii = sitk.GetArrayFromImage(sitk.ReadImage(ct_nii_path))
seg_nii = sitk.GetArrayFromImage(sitk.ReadImage(seg_nii_path))

In some cases, the nnU-Net model will fail to segment the pancreatic cancer from the CT scan. The voxels of the NIfTI volume resulting from the pipeline will therefore take only two values: 0 for background and 1 for pancreas.

In the next cell, we visualise the result using a simple widget, after handling the case described above.

In [None]:
# in some patients, no pancreatic cancer will be segmented
has_cancer_seg = True if len(np.unique(seg_nii)) > 2 else False

if has_cancer_seg:
  # class #1
  pancreas_nii = np.copy(seg_nii)
  pancreas_nii[pancreas_nii > 1] = 0

  # class #2
  cancer_nii = np.copy(seg_nii)
  cancer_nii[cancer_nii < 2] = 0
  cancer_nii[cancer_nii == 2] = 1

  _ = viz_utils.AxialSliceSegmaskViz(ct_volume = ct_nii,
                                     segmask_dict = {"pancreas" : pancreas_nii,
                                                     "cancer" : cancer_nii},
                                     segmask_cmap_dict = {"pancreas" : my_greens,
                                                          "cancer" : my_reds},
                                     dpi = 100, figsize = (8, 8),
                                     min_hu = -1024, max_hu = 1024)
else:
  _ = viz_utils.AxialSliceSegmaskViz(ct_volume = ct_nii,
                                     segmask_dict = {"pancreas" : seg_nii},
                                     segmask_cmap_dict = {"pancreas" : my_greens},
                                     dpi = 100, figsize = (8, 8),
                                     min_hu = -1024, max_hu = 1024)

---

## **Data Download**