<a href="https://colab.research.google.com/github/AIM-Harvard/aimi_alpha/blob/main/aimi/contrast_detection/notebooks/contrast_detection_mwe.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **ModelHub DeepContrast - Deep Learning–based Detection of Intravenous Contrast Enhancement on CT Scans**

Minimal working example for cloud-based detection of contrast enhancement on CT scans, starting from DICOM or NRRD files.

Please cite the following article if you use this code or pre-trained models:

Ye, Z., Qian, J.M., Hosny, A., Zeleznik, R., Plana, D., Likitlersuang, J., Zhang, Z., Mak, R.H., Aerts, H.J. and Kann, B.H., 2022. Deep Learning–based Detection of Intravenous Contrast Enhancement on CT Scans. Radiology: Artificial Intelligence, 4(3), p.e210285.

[https://doi.org/10.1148/ryai.210285](https://doi.org/10.1148/ryai.210285)

Original code:
[GitHub](https://github.com/AIM-Harvard/DeepContrast)

## **Environment Setup**

This demo notebook is intended to be run on a GPU.

To access a free GPU on Colab, open `Edit > Notebook settings`.

From the dropdown menu under `Hardware accelerator`, select `GPU`. The following cell prints basic information about the Colab instance (working directory, hostname, user, and Python version).

In [None]:
import os
import sys
import shutil
import yaml
import time
import tqdm
import glob


# useful information
curr_dir = !pwd
curr_droid = !hostname
curr_pilot = !whoami

print(time.asctime(time.localtime()))

print("\nCurrent directory :", curr_dir[-1])
print("Hostname          :", curr_droid[-1])
print("Username          :", curr_pilot[-1])

print("Python version    :", sys.version.split('\n')[0])

Mon Sep 26 16:32:46 2022

Current directory : /content
Hostname          : 91659f39d74c
Username          : root
Python version    : 3.7.14 (default, Sep  8 2022, 00:06:44) 
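The cell above reports the host, user, and Python version but does not query the GPU itself. A minimal sketch of such a check follows, assuming the NVIDIA driver tool `nvidia-smi` is on the `PATH` (as is typically the case on Colab GPU runtimes); `gpu_summary` is a hypothetical helper name, not part of the AIMI code.

```python
import shutil
import subprocess


def gpu_summary():
    """Return the `nvidia-smi -L` listing, or None if no NVIDIA GPU is visible."""
    # nvidia-smi is only present when the NVIDIA driver tools are installed
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    # a non-zero exit code usually means the driver cannot reach a GPU
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


summary = gpu_summary()
print(summary if summary else "No GPU detected: check the hardware accelerator setting.")
```

If nothing is detected, the notebook may still run, but inference is likely to be slower.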


Authenticating with Google is required to run BigQuery queries.

Every operation in this notebook (BigQuery queries, fetching data from the IDC buckets) is free of charge. The only prerequisite is a Google Cloud project: to make the notebook work as intended, specify the ID of your project in the cell that follows the authentication cell.

In [None]:
from google.colab import auth
auth.authenticate_user()

In [None]:
from google.colab import files
from google.cloud import storage
from google.cloud import bigquery as bq

# INSERT THE ID OF YOUR PROJECT HERE!
project_id = "aimihub-362516"

Throughout this Colab notebook, we will use [Plastimatch](https://plastimatch.org), a reliable open source software package for image computation, for image pre-processing. We will run Plastimatch through the simple [PyPlastimatch](https://github.com/AIM-Harvard/pyplastimatch/tree/main/pyplastimatch) Python wrapper.

In [None]:
%%capture
!apt install plastimatch

In [None]:
# Check plastimatch was correctly installed
!plastimatch --version

plastimatch version 1.7.0


---

Start by cloning the AIMI hub repository on the Colab instance.

The AIMI hub repository stores all the code we will use for pulling, preprocessing, processing, and postprocessing the data for this use case, as well as the code for the other use cases shared through AIMI hub.

In [None]:
!git clone https://github.com/AIM-Harvard/aimi_alpha.git aimi

Cloning into 'aimi'...
remote: Enumerating objects: 386, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 386 (delta 9), reused 12 (delta 5), pack-reused 362
Receiving objects: 100% (386/386), 3.23 MiB | 25.04 MiB/s, done.
Resolving deltas: 100% (203/203), done.


To organise the DICOM data in a more common (and human-readable) fashion after downloading it from the buckets, we will use [DICOMSort](https://github.com/pieper/dicomsort).

DICOMSort is an open source tool for custom sorting and renaming of DICOM files based on their DICOM tags. Here, we use DICOMSort to organise the DICOM data by `PatientID` and `Modality`, so that the final directory looks like the following:

```
data/raw/nsclc_radiomics/dicom/$PatientID
 ├─── CT
 │     ├─── $SOPInstanceUID_slice0.dcm
 │     ├─── $SOPInstanceUID_slice1.dcm
 │     └─── ...
 ├─── RTSTRUCT
 │     └─── $SOPInstanceUID_RTSTRUCT.dcm
 └─── SEG
       └─── $SOPInstanceUID_RTSEG.dcm

```
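To make the target layout concrete, here is a small sketch of how a destination path could be derived from the two sorting tags. This is only an illustration of the directory convention, not DICOMSort's implementation; `sorted_dicom_path`, the example patient ID, and the example UID are all hypothetical, and the exact file naming is an assumption.

```python
import os


def sorted_dicom_path(root, patient_id, modality, sop_instance_uid):
    """Build <root>/<PatientID>/<Modality>/<SOPInstanceUID>.dcm."""
    return os.path.join(root, patient_id, modality, sop_instance_uid + ".dcm")


# made-up values, for illustration only
example = sorted_dicom_path("data/raw/nsclc_radiomics/dicom", "LUNG1-001", "CT", "1.2.3.4")
print(example)
```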

In [None]:
!git clone https://github.com/pieper/dicomsort dicomsort

Cloning into 'dicomsort'...
remote: Enumerating objects: 130, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (4/4), done.

In [None]:
%%capture
!pip install git+https://github.com/pyplati/platipy.git
!pip install pyplastimatch nnunet ipywidgets

In [None]:
import shutil
import random

import json
import pprint
import numpy as np
import pandas as pd

import pydicom
import nibabel as nib
import seaborn as sns
import SimpleITK as sitk
import pyplastimatch as pypla

print("Python version               : ", sys.version.split('\n')[0])
print("Numpy version                : ", np.__version__)

Python version               :  3.7.14 (default, Sep  8 2022, 00:06:44) 
Numpy version                :  1.21.6


Provided everything was set up correctly, we can run the BigQuery query and get all the information we need to download the testing data from the IDC platform.

For this specific use case, we are going to be working with the NSCLC-Radiomics collection (Chest CT scans of lung cancer patients, with manual delineation of various organs at risk).

In [None]:
%%bigquery --project=$project_id cohort_df

SELECT DISTINCT
  PatientID,
  collection_id,
  source_DOI,
  StudyInstanceUID,
  SeriesInstanceUID,
  SOPInstanceUID,
  gcs_url
FROM
  `bigquery-public-data.idc_v11.dicom_all` dicom_all
WHERE
  collection_id = "nsclc_radiomics"

In [None]:
# this works as intended only if the BQ query parses data from a single dataset
# if not, feel free to set the name manually!
dataset_name = cohort_df["collection_id"].values[0]

dataset_name

'nsclc_radiomics'
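Outside of a notebook cell, the same cohort query can be issued through the BigQuery Python client instead of the `%%bigquery` magic. The sketch below assumes the `google-cloud-bigquery` package is installed and that authentication has already been done; `fetch_cohort` is a hypothetical helper, and the collection is passed as a query parameter rather than hard-coded.

```python
COHORT_SQL = """
SELECT DISTINCT
  PatientID, collection_id, source_DOI,
  StudyInstanceUID, SeriesInstanceUID, SOPInstanceUID, gcs_url
FROM `bigquery-public-data.idc_v11.dicom_all`
WHERE collection_id = @collection_id
"""


def fetch_cohort(project_id, collection_id):
    """Run the cohort query and return the result as a pandas DataFrame."""
    # imported lazily so the module can be loaded without the client installed
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("collection_id", "STRING", collection_id)
        ]
    )
    return client.query(COHORT_SQL, job_config=job_config).to_dataframe()
```

A call such as `cohort_df = fetch_cohort(project_id, "nsclc_radiomics")` would then play the role of the magic cell.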

In [None]:
# create the directory tree
!mkdir -p models

!mkdir -p data/raw 
!mkdir -p data/raw/tmp data/raw/$dataset_name
!mkdir -p data/raw/$dataset_name/dicom

!mkdir -p data/processed
!mkdir -p data/processed/$dataset_name
!mkdir -p data/processed/$dataset_name/nrrd

!mkdir -p data/models/
!mkdir -p data/model_input/
!mkdir -p data/deepcontrast_output/
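For readers who prefer to stay in Python, the same directory tree can be created with `os.makedirs`. This is a sketch under the assumption that all folders live under a single root; `make_tree` is a hypothetical helper, and the demo writes into a temporary directory rather than `/content`.

```python
import os
import tempfile


def make_tree(root, dataset_name):
    """Create the notebook's directory tree under `root` and return the paths."""
    subdirs = [
        "models",
        os.path.join("data", "raw", "tmp"),
        os.path.join("data", "raw", dataset_name, "dicom"),
        os.path.join("data", "processed", dataset_name, "nrrd"),
        os.path.join("data", "models"),
        os.path.join("data", "model_input"),
        os.path.join("data", "deepcontrast_output"),
    ]
    created = []
    for sub in subdirs:
        path = os.path.join(root, sub)
        # exist_ok=True mirrors `mkdir -p`: no error if the folder already exists
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created


demo_root = tempfile.mkdtemp()
for path in make_tree(demo_root, "nsclc_radiomics"):
    print(path)
```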

## **Parsing Cohort Information from BigQuery Tables**

We can check the various fields of the table we populated by running the BigQuery query.

This table will store one entry for each DICOM file in the dataset (therefore, expect thousands of rows!)
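Because each row is a single DICOM file, per-patient counts come from grouping rather than from the row count. The sketch below uses a toy table with made-up IDs (not IDC data) to show the idea; the column names match those selected by the query.

```python
import pandas as pd

# toy stand-in for cohort_df: one row per DICOM file
toy_df = pd.DataFrame({
    "PatientID": ["P1", "P1", "P1", "P2"],
    "SeriesInstanceUID": ["s1", "s1", "s2", "s3"],
    "SOPInstanceUID": ["a", "b", "c", "d"],
})

# number of distinct series and files for each patient
summary = toy_df.groupby("PatientID").agg(
    n_series=("SeriesInstanceUID", "nunique"),
    n_files=("SOPInstanceUID", "nunique"),
)
print(summary)
```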

In [None]:
pat_id_list = sorted(list(set(cohort_df["PatientID"].values)))

print("Total number of unique Patient IDs:", len(pat_id_list))

display(cohort_df.info())

display(cohort_df.head())

Total number of unique Patient IDs: 127
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15678 entries, 0 to 15677
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   PatientID          15678 non-null  object
 1   collection_id      15678 non-null  object
 2   source_DOI         15678 non-null  object
 3   StudyInstanceUID   15678 non-null  object
 4   SeriesInstanceUID  15678 non-null  object
 5   SOPInstanceUID     15678 non-null  object
 6   gcs_url            15678 non-null  object
dtypes: object(7)
memory usage: 857.5+ KB


None

|   | PatientID | collection_id | source_DOI | StudyInstanceUID | SeriesInstanceUID | SOPInstanceUID | gcs_url |
|---|---|---|---|---|---|---|---|
| 0 | LUNG1-002 | nsclc_radiomics | 10.7937/K9/TCIA.2015.PF0M9REI | 1.3.6.1.4.1.32722.99.99.2037150038059966416957... | 1.2.276.0.7230010.3.1.3.2323910823.11504.15972... | 1.2.276.0.7230010.3.1.4.2323910823.11504.15972... | gs://idc-open-cr/eff917af-8a2a-42fe-9e12-22bce... |
| 1 | LUNG1-002 | nsclc_radiomics | 10.7937/K9/TCIA.2015.PF0M9REI | 1.3.6.1.4.1.32722.99.99.2037150038059966416957... | 1.3.6.1.4.1.32722.99.99.2329880015517990803358... | 1.3.6.1.4.1.32722.99.99.1004190115743500844746... | gs://idc-open-cr/f8cbf725-621d-4e18-8326-41789... |
| 2 | LUNG1-002 | nsclc_radiomics | 10.7937/K9/TCIA.2015.PF0M9REI | 1.3.6.1.4.1.32722.99.99.2037150038059966416957... | 1.3.6.1.4.1.32722.99.99.2329880015517990803358... | 1.3.6.1.4.1.32722.99.99.1031280376053401623619... | gs://idc-open-cr/c73b3d12-78b1-4456-9a88-91ba2... |
| 3 | LUNG1-002 | nsclc_radiomics | 10.7937/K9/TCIA.2015.PF0M9REI | 1.3.6.1.4.1.32722.99.99.2037150038059966416957... | 1.3.6.1.4.1.32722.99.99.2329880015517990803358... | 1.3.6.1.4.1.32722.99.99.1075071405629330534974... | gs://idc-open-cr/48b4ae0a-6936-44b4-a6bd-27c92... |
| 4 | LUNG1-002 | nsclc_radiomics | 10.7937/K9/TCIA.2015.PF0M9REI | 1.3.6.1.4.1.32722.99.99.2037150038059966416957... | 1.3.6.1.4.1.32722.99.99.2329880015517990803358... | 1.3.6.1.4.1.32722.99.99.1125363119759695902111... | gs://idc-open-cr/3c36a30a-630b-4183-b87d-8a238... |


---

## **Set paths**

In [None]:
# FIXED PARAMETERS
data_base_path = "/content/data"
raw_base_path = "/content/data/raw/tmp"
sorted_base_path = os.path.join("/content/data/raw/", dataset_name, "dicom")
models_path = "/content/models"
model_output_folder = "/content/data/deepcontrast_output/"
processed_base_path = os.path.join("/content/data/processed/", dataset_name)
processed_nrrd_path = os.path.join(processed_base_path, "nrrd")

## **Download and pre-process a DICOM for a Single Patient**

In [None]:
import aimi.aimi as aimi
from aimi import general_utils as aimi_utils
from aimi import contrast_detection as aimi_model

The following cells run the whole processing pipeline, from pre-processing to post-processing.

In [None]:
pat_id = random.choice(cohort_df["PatientID"].values)
pat_df = cohort_df[cohort_df["PatientID"] == pat_id].reset_index(drop = True)

display(pat_df.info())
#display(pat_df.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 133 entries, 0 to 132
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   PatientID          133 non-null    object
 1   collection_id      133 non-null    object
 2   source_DOI         133 non-null    object
 3   StudyInstanceUID   133 non-null    object
 4   SeriesInstanceUID  133 non-null    object
 5   SOPInstanceUID     133 non-null    object
 6   gcs_url            133 non-null    object
dtypes: object(7)
memory usage: 7.4+ KB


None

In [None]:
# init
print("Processing patient: %s"%(pat_id))
patient_df = cohort_df[cohort_df["PatientID"] == pat_id]

Processing patient: LUNG1-388


In [None]:
# data cross-loading
aimi_utils.gcs.download_patient_data(raw_base_path = raw_base_path,
                                     sorted_base_path = sorted_base_path,
                                     patient_df = patient_df,
                                     remove_raw = True)

Copying files from IDC buckets to /content/data/raw/tmp/LUNG1-388...
Done in 10.0177 seconds.

Sorting DICOM files...
Done in 1.51813 seconds.
Sorted DICOM data saved at: /content/data/raw/nsclc_radiomics/dicom/LUNG1-388
Removing un-sorted data at /content/data/raw/tmp/LUNG1-388...
... Done.
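The internals of `download_patient_data` are not shown in this notebook. One plausible way to copy a patient's files from the `gcs_url` column, sketched here as an assumption rather than as the library's actual method, is to feed the URLs to `gsutil -m cp -I`, which reads the list of sources from standard input. The helper names and the example bucket URLs are hypothetical.

```python
import subprocess


def build_gsutil_copy(gcs_urls, dest_dir):
    """Return the gsutil command and the stdin payload listing the source URLs."""
    # -m enables parallel transfers, -I makes `cp` read source URLs from stdin
    cmd = ["gsutil", "-m", "cp", "-I", dest_dir]
    stdin_text = "\n".join(gcs_urls) + "\n"
    return cmd, stdin_text


def download(gcs_urls, dest_dir):
    """Copy every URL into dest_dir (requires gsutil and valid credentials)."""
    cmd, stdin_text = build_gsutil_copy(gcs_urls, dest_dir)
    subprocess.run(cmd, input=stdin_text, text=True, check=True)


# made-up URLs, shown only to illustrate the payload format
cmd, payload = build_gsutil_copy(
    ["gs://example-bucket/a.dcm", "gs://example-bucket/b.dcm"], "/tmp/raw"
)
print(" ".join(cmd))
print(payload, end="")
```

In the notebook, the URL list would come from `patient_df["gcs_url"]`.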


In [None]:
# DICOM CT to NRRD - required for the processing
aimi_utils.preprocessing.pypla_dicom_ct_to_nrrd(sorted_base_path = sorted_base_path,
                                                processed_nrrd_path = processed_nrrd_path,
                                                pat_id = pat_id, verbose = True)


Running 'plastimatch convert' with the specified arguments:
  --input /content/data/raw/nsclc_radiomics/dicom/LUNG1-388/CT
  --output-img /content/data/processed/nsclc_radiomics/nrrd/LUNG1-388/LUNG1-388_CT.nrrd
... Done.
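The log above shows that the conversion boils down to a `plastimatch convert` call with `--input` and `--output-img`. A minimal sketch of issuing the same command directly from Python is given below; the helper names are hypothetical, and running `convert_ct` assumes the `plastimatch` binary is installed.

```python
import subprocess


def plastimatch_convert_cmd(dicom_ct_dir, output_nrrd):
    """Build the command line mirrored from the log above."""
    return [
        "plastimatch", "convert",
        "--input", dicom_ct_dir,
        "--output-img", output_nrrd,
    ]


def convert_ct(dicom_ct_dir, output_nrrd):
    """Run the conversion; raises CalledProcessError if plastimatch fails."""
    subprocess.run(plastimatch_convert_cmd(dicom_ct_dir, output_nrrd), check=True)


print(" ".join(plastimatch_convert_cmd(
    "/content/data/raw/nsclc_radiomics/dicom/LUNG1-388/CT",
    "/content/data/processed/nsclc_radiomics/nrrd/LUNG1-388/LUNG1-388_CT.nrrd",
)))
```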


## **Running the Analysis for a Single Patient**

The contrast detection works for chest CTs as well as head and neck CTs, so two different models were trained. Before we proceed, we need to select which body part to process.

In [None]:
body_part = "Chest"  # "HeadNeck"

Download the corresponding pre-trained model.

In [None]:
# pick the pre-trained weights matching the selected body part
if body_part == "Chest":
  model_url = "https://github.com/AIM-Harvard/DeepContrast/blob/main/models/EffNet_Chest.h5?raw=true"
  model_download_path = os.path.join(models_path, "EffNet_Chest.h5")
elif body_part == "HeadNeck":
  model_url = "https://github.com/AIM-Harvard/DeepContrast/blob/main/models/EffNet_HeadNeck.h5?raw=true"
  model_download_path = os.path.join(models_path, "EffNet_HeadNeck.h5")
else:
  raise ValueError("Unsupported body_part: %s" % body_part)

# -O writes the downloaded file to the given path
!wget -O $model_download_path $model_url

--2022-09-26 16:25:57--  https://github.com/AIM-Harvard/DeepContrast/blob/main/models/EffNet_Chest.h5?raw=true
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/AIM-Harvard/DeepContrast/raw/main/models/EffNet_Chest.h5 [following]
--2022-09-26 16:25:57--  https://github.com/AIM-Harvard/DeepContrast/raw/main/models/EffNet_Chest.h5
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/AIM-Harvard/DeepContrast/main/models/EffNet_Chest.h5 [following]
--2022-09-26 16:25:57--  https://raw.githubusercontent.com/AIM-Harvard/DeepContrast/main/models/EffNet_Chest.h5
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.
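An alternative to branching on `body_part` is a small lookup table, which keeps the two model names in one place and fails loudly on an unsupported value. The sketch below reuses the URLs and file names from the cell above; `MODEL_FILES`, `model_source`, and `download_model` are hypothetical names, and the download uses the standard library instead of `wget`.

```python
import os
import urllib.request

# file names and base URL taken from the download cell above
MODEL_FILES = {"Chest": "EffNet_Chest.h5", "HeadNeck": "EffNet_HeadNeck.h5"}
BASE_URL = "https://github.com/AIM-Harvard/DeepContrast/blob/main/models/"


def model_source(body_part, models_path):
    """Return (download URL, local destination) for the given body part."""
    try:
        fname = MODEL_FILES[body_part]
    except KeyError:
        raise ValueError("Unsupported body_part: %r" % (body_part,)) from None
    return BASE_URL + fname + "?raw=true", os.path.join(models_path, fname)


def download_model(body_part, models_path):
    """Download the weights into models_path and return the local file path."""
    url, dest = model_source(body_part, models_path)
    os.makedirs(models_path, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest


print(model_source("Chest", "/content/models"))
```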

Get the image file paths. As we only downloaded one image file, we will only process that. The DeepContrast pipeline is set up to process as many images as there are in one folder. If you want to process more images in one run, just add them to the input folder.

In [None]:
# glob already returns full paths, so collect every NRRD file found
# one level below the processed NRRD directory (one folder per patient)
img_files = sorted(glob.glob(os.path.join(processed_nrrd_path, "*", "*.nrrd")))

print(img_files)

['/content/data/processed/nsclc_radiomics/nrrd/LUNG1-388/LUNG1-388_CT.nrrd']


Image preprocessing step

In [None]:
# Data preprocessing
df_img, img_arr = aimi_model.utils.processing.data_prepro(body_part=body_part, img_files=img_files)

LUNG1-388_CT


Contrast prediction step. The output is a table of all processed images and the corresponding predictions:

- Prediction = 0: non-contrast scan
- Prediction = 1: contrast-enhanced scan

In [None]:
# Model prediction
aimi_model.utils.processing.model_pred(
  body_part=body_part,
  save_csv=False,
  model_dir=models_path,
  out_dir=model_output_folder,
  df_img=df_img,
  img_arr=img_arr
  )

patient level pred:
          pat_id  predictions
0  LUNG1-388_CT            0
