<a href="https://colab.research.google.com/github/bamf-health/aimi-prostate-mr/blob/idc-colab/prostate_mr_run_on_idc_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Run AI segmentation on the ProstateX collection.

Be sure to run this notebook in a runtime with an attached GPU.
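
As a rough way to confirm that a GPU is attached before continuing, the sketch below looks for the `nvidia-smi` tool and asks it to list devices. The helper name `gpu_available` is hypothetical, and the check is only a heuristic: it assumes an NVIDIA GPU whose driver ships `nvidia-smi`.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Heuristic: True if nvidia-smi exists and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA driver tools on PATH, so assume no GPU is attached.
        return False
    # "nvidia-smi -L" prints one line per detected GPU.
    result = subprocess.run([exe, "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout


if not gpu_available():
    print("No GPU detected; switch to a GPU runtime before running inference.")
```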

Querying and downloading the MR scans is based on the [IDC tutorial cookbook](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/cookbook.ipynb).

## Prerequisites
Please complete the prerequisites as described in this [documentation page](https://learn.canceridc.dev/introduction/getting-started-with-gcp).

Insert your Google Cloud project ID in the cell below.

In [7]:
#@title Enter your Project ID and authenticate with GCP
# initialize this variable with your Google Cloud Project ID!
my_ProjectID = ''

import os
os.environ["GCP_PROJECT_ID"] = my_ProjectID

from google.colab import auth
auth.authenticate_user()

import pandas as pd
import subprocess
from pathlib import Path
from tempfile import TemporaryDirectory
import shutil
from tqdm.auto import tqdm
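
An empty project ID only surfaces later, when the BigQuery query fails. A minimal sketch of an early check is shown below; the helper name `require_project_id` is hypothetical and not part of the original notebook.

```python
def require_project_id(project_id: str) -> str:
    """Return the stripped project ID, or raise if it was left empty."""
    cleaned = project_id.strip()
    if not cleaned:
        raise ValueError(
            "Set my_ProjectID to your Google Cloud project ID before continuing."
        )
    return cleaned


# Example with a placeholder value; in the notebook you would pass my_ProjectID.
print(require_project_id("  my-gcp-project  "))
```

Calling `require_project_id(my_ProjectID)` right after the assignment would stop the notebook at the first cell instead of at the query.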

Install the `s5cmd` tool for efficient manifest downloads

In [8]:
%%shell
VERSION="s5cmd_2.2.2_Linux-64bit"
wget -N https://github.com/peak/s5cmd/releases/download/v2.2.2/${VERSION}.tar.gz
tar zxf ${VERSION}.tar.gz
mv s5cmd /usr/bin

--2023-09-29 03:44:18--  https://github.com/peak/s5cmd/releases/download/v2.2.2/s5cmd_2.2.2_Linux-64bit.tar.gz
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/73909333/e095ae85-9acf-4dcc-b744-128b3311849c?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230929%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230929T034419Z&X-Amz-Expires=300&X-Amz-Signature=d487369aee880e1c2c92369537e3d6793c9241bcb0187dc7f81a049ed665afca&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=73909333&response-content-disposition=attachment%3B%20filename%3Ds5cmd_2.2.2_Linux-64bit.tar.gz&response-content-type=application%2Foctet-stream [following]
--2023-09-29 03:44:19--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/73909333/e095ae85-9acf-4dcc-b744-



Install `dcm2niix` for DICOM conversion. Use the prebuilt version because it has JPEG support.

In [9]:
%%shell
curl -fLO https://github.com/rordenlab/dcm2niix/releases/latest/download/dcm2niix_lnx.zip
unzip -o dcm2niix_lnx.zip -d /usr/bin

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  855k  100  855k    0     0  1281k      0 --:--:-- --:--:-- --:--:-- 1281k
Archive:  dcm2niix_lnx.zip
  inflating: /usr/bin/dcm2niix       




Next, build a download manifest. To reproduce our results, get the list of SeriesInstanceUIDs from the `qa-results.csv` file. This CSV is in the prostate-mr.zip file at https://zenodo.org/record/8352041. Alternatively, we can get a copy directly from the GitHub repo.

In [10]:
%%shell
wget -N https://github.com/bamf-health/aimi-prostate-mr/raw/main/qa-results/qa-results.csv

--2023-09-29 03:44:20--  https://github.com/bamf-health/aimi-prostate-mr/raw/main/qa-results/qa-results.csv
Resolving github.com (github.com)... 140.82.112.4
Connecting to github.com (github.com)|140.82.112.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/bamf-health/aimi-prostate-mr/main/qa-results/qa-results.csv [following]
--2023-09-29 03:44:20--  https://raw.githubusercontent.com/bamf-health/aimi-prostate-mr/main/qa-results/qa-results.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 104271 (102K) [text/plain]
Saving to: ‘qa-results.csv.2’


2023-09-29 03:44:20 (4.26 MB/s) - ‘qa-results.csv.2’ saved [104271/104271]





In [11]:
qa_df = pd.read_csv('qa-results.csv')
series_uids = qa_df.SeriesInstanceUID.tolist()
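
Because the UIDs are later interpolated into a SQL string, it can help to drop duplicates and anything that does not look like a DICOM UID first. The sketch below is one way to do that, assuming the usual UID form (digits separated by dots, at most 64 characters); the helper name `valid_uids` is hypothetical.

```python
import re

# DICOM UIDs consist of numeric components separated by single dots.
UID_PATTERN = re.compile(r"^[0-9]+(\.[0-9]+)*$")


def valid_uids(uids):
    """Keep unique, well-formed UIDs in their original order."""
    seen = set()
    kept = []
    for uid in uids:
        uid = str(uid).strip()
        if len(uid) <= 64 and UID_PATTERN.match(uid) and uid not in seen:
            seen.add(uid)
            kept.append(uid)
    return kept


# Made-up values for illustration; in the notebook you would pass series_uids.
print(valid_uids(["1.2.3", "1.2.3", "bad'uid", "1.2.840.1"]))
```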

In [12]:
# python API is the most flexible way to query IDC BigQuery metadata tables
from google.cloud import bigquery
bq_client = bigquery.Client(my_ProjectID)

# enclose series_uids in quotes for use in the SQL query;
# a separate name keeps this cell safe to re-run without double-quoting
quoted_uids = [f"'{x}'" for x in series_uids]

selection_query = f"""
SELECT
  # Organize the files in-place right after downloading
  ANY_VALUE(CONCAT("cp s3",REGEXP_SUBSTR(gcs_url, "(://.*)/"),"/* ",collection_id,"/",PatientID,"/",StudyInstanceUID,"/",SeriesInstanceUID)) AS s5cmd_command
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  SeriesInstanceUID IN ({','.join(quoted_uids)})
GROUP BY
  SeriesInstanceUID
"""

selection_result = bq_client.query(selection_query)
selection_df = selection_result.result().to_dataframe()

selection_df.to_csv("/content/s5cmd_gcp_manifest.txt", header=False, index=False)

NotFound: ignored
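
To make the query's `CONCAT`/`REGEXP_SUBSTR` expression easier to follow, the sketch below rebuilds one manifest line in plain Python. The function name `s5cmd_command` and the example URL are made up for illustration; only the string logic mirrors the SQL above.

```python
import re


def s5cmd_command(gcs_url, collection_id, patient_id, study_uid, series_uid):
    """Mirror the SQL: keep '://bucket/folder' and build an s5cmd 'cp' line."""
    # The greedy '.*' runs to the last '/', so the file name is dropped.
    match = re.search(r"(://.*)/", gcs_url)
    if match is None:
        raise ValueError(f"unexpected URL: {gcs_url}")
    prefix = match.group(1)
    target = f"{collection_id}/{patient_id}/{study_uid}/{series_uid}"
    return f"cp s3{prefix}/* {target}"


# Hypothetical URL and shortened UIDs, not real IDC values.
print(
    s5cmd_command(
        "gs://example-bucket/series-folder/instance.dcm",
        "prostatex",
        "ProstateX-0000",
        "1.2.3",
        "1.2.3.4",
    )
)
# prints: cp s3://example-bucket/series-folder/* prostatex/ProstateX-0000/1.2.3/1.2.3.4
```

Each manifest line therefore copies every object in one series folder into a `collection/patient/study/series` directory.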

Download the files listed in the manifest with the `s5cmd` tool.

In [7]:
%%shell
# check if dicoms have already been downloaded
if test -n "$(find dcms -name '*.dcm' -print -quit)"
then
    echo "dicoms already downloaded"
else
  mkdir -p dcms
  cd dcms && s5cmd --no-sign-request --endpoint-url https://storage.googleapis.com run ../s5cmd_gcp_manifest.txt
  cd -
fi

dicoms already downloaded
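
The same "already downloaded" check can be done from Python if you prefer to stay out of the shell. This is a minimal sketch; the helper name `dicoms_present` is hypothetical.

```python
from pathlib import Path


def dicoms_present(root) -> bool:
    """True if at least one .dcm file exists anywhere under root."""
    root = Path(root)
    # next(..., None) stops at the first hit instead of walking the whole tree.
    return root.is_dir() and next(root.rglob("*.dcm"), None) is not None


print(dicoms_present("dcms"))
```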




# Run model on ProstateX
You can run the model on the downloaded scans with the code below.

>If you want to run the model locally, this code is containerized in the project. Refer to the [readme](https://github.com/bamf-health/aimi-prostate-mr/tree/main#running-inference) for instructions on running the container locally.

Install the `nnunet` Python package.

In [8]:
%%capture
!pip install nnunet

In [13]:
# set up nnunet paths
os.environ["nnUNet_raw_data_base"] = "/nnunet_data/nnUNet_raw_data_base/"
os.environ["nnUNet_preprocessed"] = "/nnunet_data/nnUNet_preprocessed/"
os.environ["RESULTS_FOLDER"] = "/nnunet_data/nnUNet_trained_models/"


Download the model weights from Zenodo.

In [14]:
%%shell
# check if weights exist before starting a large download
if [ ! -f ${RESULTS_FOLDER}nnUNet/3d_fullres/Task788_ProstateX/nnUNetTrainerV2__nnUNetPlansv2.1/fold_0/model_final_checkpoint.model ]
then
    mkdir -p ${nnUNet_raw_data_base}
    mkdir -p ${nnUNet_preprocessed}
    mkdir -p ${RESULTS_FOLDER}nnUNet/
    echo "Downloading model weights"
    wget -N https://zenodo.org/record/8290093/files/Task788_Prostate.zip
    unzip Task788_Prostate.zip -d ${RESULTS_FOLDER}nnUNet/
else
    echo "model weights already downloaded"
fi

model weights already downloaded
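
The checkpoint location tested in the shell cell can also be assembled in Python, which is handy for verifying the weights from a later cell. The sketch below reuses the path components from that cell; the helper name `checkpoint_path` is hypothetical.

```python
import os
from pathlib import Path


def checkpoint_path(results_folder=None) -> Path:
    """Build the fold_0 checkpoint path that the shell check looks for."""
    # Fall back to the RESULTS_FOLDER environment variable set earlier.
    base = Path(results_folder or os.environ["RESULTS_FOLDER"])
    return (
        base
        / "nnUNet"
        / "3d_fullres"
        / "Task788_ProstateX"
        / "nnUNetTrainerV2__nnUNetPlansv2.1"
        / "fold_0"
        / "model_final_checkpoint.model"
    )


ckpt = checkpoint_path("/nnunet_data/nnUNet_trained_models/")
print(ckpt, "exists" if ckpt.is_file() else "is missing")
```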




Install the dcmqi binaries, which provide the `itkimage2segimage` executable.

In [11]:
%%shell
# Install binaries for itkimage2segimage package
mkdir -p /app
PACKAGE_TAR="dcmqi-1.2.5-linux.tar.gz"
ITKIMAGE2SEGIMAGE_URL=https://github.com/QIICR/dcmqi/releases/download/v1.2.5/${PACKAGE_TAR}
wget -N ${ITKIMAGE2SEGIMAGE_URL} --no-check-certificate
tar -zxvf ${PACKAGE_TAR} -C /app
rm ${PACKAGE_TAR}

--2023-09-29 03:39:51--  https://github.com/QIICR/dcmqi/releases/download/v1.2.5/dcmqi-1.2.5-linux.tar.gz
Resolving github.com (github.com)... 140.82.113.3
Connecting to github.com (github.com)|140.82.113.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/50675718/79d3ad95-9f0c-42a4-a1c5-bf5a63461894?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230929%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230929T033951Z&X-Amz-Expires=300&X-Amz-Signature=247db87ead7687edc5fcbce880572eb0706aadfcae6fcdd99b77a96f601458bf&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=50675718&response-content-disposition=attachment%3B%20filename%3Ddcmqi-1.2.5-linux.tar.gz&response-content-type=application%2Foctet-stream [following]
--2023-09-29 03:39:51--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/50675718/79d3ad95-9f0c-42a4-a1c5-bf5a6346189



Run inference on scans.

First, download `run.py` and `ai-dicom-seg-meta.json` from the GitHub repo.

In [15]:
%%shell
wget -N https://github.com/bamf-health/aimi-prostate-mr/raw/main/container/app/src/run.py
wget -N https://github.com/bamf-health/aimi-prostate-mr/raw/main/container/app/dcm-meta/ai-dicom-seg-meta.json

--2023-09-29 03:44:46--  https://github.com/bamf-health/aimi-prostate-mr/raw/main/container/app/src/run.py
Resolving github.com (github.com)... 192.30.255.113
Connecting to github.com (github.com)|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/bamf-health/aimi-prostate-mr/main/container/app/src/run.py [following]
--2023-09-29 03:44:46--  https://raw.githubusercontent.com/bamf-health/aimi-prostate-mr/main/container/app/src/run.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7415 (7.2K) [text/plain]
Saving to: ‘run.py’


2023-09-29 03:44:46 (77.6 MB/s) - ‘run.py’ saved [7415/7415]

--2023-09-29 03:44:46--  https://github.com/bamf-health/aimi-prostate-mr/raw/main/container/app/d



Then run the model on all downloaded DICOM series.

In [16]:
from run import main_dicom

In [17]:
dcm_dir = Path('dcms')
seg_dir = Path('preds')
seg_meta = Path('ai-dicom-seg-meta.json')

In [18]:
# for testing, just select a single series
testing = True
if testing:
  study_uid = "1.3.6.1.4.1.14519.5.2.1.7311.5101.158323547117540061132729905711"
  series_uid = "1.3.6.1.4.1.14519.5.2.1.7311.5101.160028252338004527274326500702"
  test_dcm_dir = Path('test_dcms')
  test_dcm_dir.mkdir(exist_ok=True)
  test_series_dir = test_dcm_dir/series_uid
  if not test_series_dir.exists():
    # copy one ProstateX-0000 series out of the downloaded tree
    shutil.copytree(Path("dcms")/"prostatex"/"ProstateX-0000"/study_uid/series_uid, test_series_dir)
  dcm_dir = test_dcm_dir
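
If you want to pick a different test series, you can list the series folders that were actually downloaded instead of hard-coding the UIDs. This is a sketch that assumes the `collection/patient/study/series` layout produced by the manifest; the helper name `find_series_dirs` is hypothetical.

```python
from pathlib import Path


def find_series_dirs(root):
    """Return sorted directories that directly contain at least one .dcm file."""
    root = Path(root)
    # A set removes duplicates, since every file in a series shares one parent.
    return sorted({p.parent for p in root.rglob("*.dcm")})


for series_dir in find_series_dirs("dcms")[:5]:
    print(series_dir)
```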

In [19]:
main_dicom(dcm_dir, seg_dir, seg_meta)

KeyboardInterrupt: ignored

Download the segmentations.

In [None]:
%%shell
zip -r preds.zip preds

In [None]:
from google.colab import files
files.download('preds.zip')
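
Outside Colab, or when the `zip` command is unavailable, the archive can be built with the standard library instead. The sketch below uses `shutil.make_archive`; the helper name `zip_predictions` is hypothetical.

```python
import shutil
from pathlib import Path


def zip_predictions(pred_dir, archive_stem="preds") -> Path:
    """Zip pred_dir (including its top-level folder) into <archive_stem>.zip."""
    pred_dir = Path(pred_dir)
    archive = shutil.make_archive(
        archive_stem,
        "zip",
        root_dir=str(pred_dir.parent),  # paths in the archive are relative to this
        base_dir=pred_dir.name,  # only this folder is added
    )
    return Path(archive)


if Path("preds").is_dir():
    print(zip_predictions("preds"))
```

Like `zip -r preds.zip preds`, this keeps the `preds/` prefix on every entry in the archive.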