<a href="https://colab.research.google.com/github/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/collections_demos/prostate-MRI_hiplot_experiments.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exploration of prostate MRI acquisition parameters

The goal of this notebook is to demonstrate how you can use NCI Imaging Data Commons in combination with the open source `hiplot` package to explore variations in Magnetic Resonance Imaging (MRI) protocol parameters across public collections.

[NCI Imaging Data Commons (IDC)](https://imaging.datacommons.cancer.gov) is a cloud-based environment containing publicly available cancer imaging data co-located with the analysis and exploration tools and resources. IDC is a node within the broader [NCI Cancer Research Data Commons (CRDC)](https://datacommons.cancer.gov/) infrastructure that provides secure access to a large, comprehensive, and expanding collection of cancer research data.

If you are not familiar with IDC, we recommend you first take a look at the [Getting started](https://github.com/ImagingDataCommons/IDC-Examples/blob/master/notebooks/getting_started.ipynb) notebook, which is intended as an introduction to working with IDC programmatically.

If you have any questions about this tutorial, please post your questions on the [IDC user forum](https://discourse.canceridc.dev/) (preferred) or email IDC support at support@canceridc.dev!

You can find more IDC notebooks and tutorials here: https://github.com/ImagingDataCommons/IDC-Tutorials/tree/master.

---

Authored by Andrey Fedorov and Deepa Krishnaswamy

Initial version: February 2024

Updated: August 2024

## Prerequisites

To run the cells in this notebook, you must first complete the prerequisites for setting up your Google Cloud Platform account, as described in this tutorial: https://github.com/ImagingDataCommons/IDC-Examples/blob/master/notebooks/getting_started/part1_prerequisites.ipynb.

Once you have completed the prerequisites, enter your Google Cloud Platform project ID in the cell below.

In [None]:
#@title Enter your Project ID here and authenticate with Google
# initialize this variable with your Google Cloud Project ID!
my_ProjectID = "" #@param {type:"string"}

import os
os.environ["GCP_PROJECT_ID"] = my_ProjectID

from google.colab import auth
auth.authenticate_user()

from google.cloud import bigquery
bq_client = bigquery.Client(my_ProjectID)

selection_query = """
SELECT collection_id
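FROM `bigquery-public-data.idc_current.original_collection_metadata`
LIMIT 1
"""

try:
  # query() only submits the job; result() waits for it to finish,
  # so an invalid project ID or missing permissions surface here
  selection_result = bq_client.query(selection_query).result()
except Exception as e:
  print(f"Check your project ID - it does not seem to work! ({e})")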


## Install `hiplot`

[`HiPlot`](https://github.com/facebookresearch/hiplot) is a lightweight interactive visualization tool to help AI researchers discover correlations and patterns in high-dimensional data using parallel plots and other graphical ways to represent information.

Installing this package is very easy!

In [None]:
%%capture
!pip install -U hiplot
import hiplot as hip

## Get relevant MR acquisition metadata

The query in the cell below retrieves some of the key acquisition parameters of the DICOM MR series of the prostate, identified by the value `PROSTATE` of the `BodyPartExamined` attribute.

You can learn more about the DICOM attributes that describe MR acquisition from the DICOM standard [here](https://dicom.innolitics.com/ciods/mr-image), and experiment with adding more attributes to the query!
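The query below assigns each series a coarse `series_type` with an ordered `CASE` expression on `SeriesDescription`. Because `CASE` returns the first matching branch, the order of the rules matters. As a sketch for experimenting with these rules locally before editing the SQL, the hypothetical helper `classify_series` mirrors them in plain Python: substring tests on the upper-cased description, and `'OTHER'` for missing values (in SQL, `UPPER(NULL) LIKE ...` is never true, so `NULL` falls through to `ELSE`).

```python
def classify_series(description: object) -> str:
    """Hypothetical Python mirror of the ad-hoc CASE rules in the query below.

    Rules are checked in the same order as the SQL CASE; the first match wins.
    """
    if not isinstance(description, str):
        # NULL/missing SeriesDescription: every WHEN is false in SQL -> 'OTHER'
        return "OTHER"
    d = description.upper()
    if "T1" in d:
        return "T1"
    if "T2" in d:
        return "T2"
    if any(key in d for key in ("DCE", "DYN", "GAD")):
        return "DCE"
    # 'APPA' excludes 'APPARENT DIFF...' so that ADC maps get a chance below
    if "DWI" in d or ("DIFF" in d and "APPA" not in d):
        return "DWI"
    if "ADC" in d or "APPARENT DIFF" in d:
        return "ADC"
    if "PD" in d:
        return "PD"
    return "OTHER"
```

For example, a description such as `ep2d_diff_tra_ADC` (made up for illustration) is labeled `DWI`, not `ADC`, because the `DIFF` rule is checked first. Once `hiplot_df` exists, `hiplot_df["SeriesDescription"].map(classify_series)` can be compared against the `series_type` column while you adjust the rules.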

In [None]:
%%bigquery hiplot_df --project $my_ProjectID

WITH
  interesting_mr_stuff AS (
  SELECT
    SeriesInstanceUID,
    collection_id,
    SeriesDescription,

  # basic ad-hoc rules to determine series type
  #    experiment with those!
  CASE
    WHEN UPPER(SeriesDescription) LIKE '%T1%' THEN 'T1'
    WHEN UPPER(SeriesDescription) LIKE '%T2%' THEN 'T2'
    WHEN UPPER(SeriesDescription) LIKE '%DCE%' or UPPER(SeriesDescription) LIKE '%DYN%' or UPPER(SeriesDescription) LIKE '%GAD%' THEN 'DCE'
    WHEN UPPER(SeriesDescription) LIKE '%DWI%' or (UPPER(SeriesDescription) LIKE '%DIFF%' and UPPER(SeriesDescription) NOT LIKE '%APPA%') THEN 'DWI'
    WHEN UPPER(SeriesDescription) LIKE '%ADC%' or UPPER(SeriesDescription) LIKE '%APPARENT DIFF%' THEN 'ADC'
    WHEN UPPER(SeriesDescription) LIKE '%PD%' THEN 'PD'
    ELSE 'OTHER' END AS series_type,


    PatientID,
    StudyInstanceUID,
    EchoTime,
    InversionTime,
    EchoTrainLength,
    RepetitionTime,
    TriggerTime,
    FlipAngle,
    ARRAY_TO_STRING(SequenceVariant, "/") AS SequenceVariant,
    ARRAY_TO_STRING(ScanOptions, "/") AS ScanOptions,
    ARRAY_TO_STRING(ScanningSequence, "/") AS ScanningSequence,
    MRAcquisitionType,
    ARRAY_TO_STRING(ImageType, "/") AS ImageType,
    PixelSpacing[OFFSET(0)] AS PixelSpacing,
    SliceThickness,
    PhotometricInterpretation,
    ContrastBolusAgent,
    SequenceName,
    Manufacturer,
    ManufacturerModelName
  FROM
    `bigquery-public-data.idc_current.dicom_all`
  WHERE
    #collection_id IN ("qin_prostate_repeatability",
    #  "prostatex",
    #  "prostate_diagnosis",
    #  "prostate_3t")
    BodyPartExamined = "PROSTATE"
    AND Modality = "MR")
SELECT
  SeriesInstanceUID,
  any_value(collection_id) as collection_id,
  any_value(SeriesDescription) as SeriesDescription,
  any_value(series_type) as series_type,
  string_agg(distinct(EchoTime)) as EchoTimes,
  string_agg(distinct(InversionTime)) as InversionTimes,
  string_agg(distinct(RepetitionTime)) as RepetitionTimes,
  string_agg(distinct(FlipAngle)) as FlipAngles
FROM
  interesting_mr_stuff
GROUP BY
  SeriesInstanceUID
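The query above collapses the per-instance values of each series into a single string with `STRING_AGG(DISTINCT ...)`, which separates values with a comma by default, so a series acquired with several echo times shows up as, say, `90,100`. HiPlot treats such text columns as categorical. A minimal sketch, assuming every aggregated value is a decimal string, of two hypothetical helpers that turn these strings back into numbers:

```python
import math
from typing import List


def parse_agg_values(value: object, sep: str = ",") -> List[float]:
    """Split a STRING_AGG result such as '90,100' into floats.

    Missing (NULL/NaN) or empty values yield an empty list.
    """
    if not isinstance(value, str) or not value.strip():
        return []
    return [float(v) for v in value.split(sep) if v.strip()]


def single_value_or_nan(value: object) -> float:
    """Return the value of a single-valued series, or NaN if there are 0 or 2+ values."""
    values = parse_agg_values(value)
    return values[0] if len(values) == 1 else math.nan
```

For example, `hiplot_df["EchoTime_num"] = hiplot_df["EchoTimes"].map(single_value_or_nan)` adds a numeric column (the name is arbitrary) that is `NaN` for multi-valued series, which HiPlot should then be able to show as a numeric axis. Note that `STRING_AGG` does not guarantee any particular order of the distinct values.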



## Create HiPlot!

The `%%bigquery` cell magic above stores the query result in a pandas DataFrame named `hiplot_df` (the first argument of the magic).

Below, we take that dataframe and create a HiPlot visualization.

Note that if a given column of the dataframe has too many distinct values (more than 80 by default, see [this issue](https://github.com/facebookresearch/hiplot/issues/33)), it will not be rendered in the plot.
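To see in advance which columns are likely to be left out of the plot, you can count the distinct values per column with pandas. The sketch below takes the default limit of 80 quoted in the note above as an assumption; `high_cardinality_columns` is a hypothetical helper name.

```python
from typing import List

import pandas as pd

# Default limit quoted in the note above; adjust if your HiPlot version differs.
MAX_DISTINCT = 80


def high_cardinality_columns(df: pd.DataFrame, limit: int = MAX_DISTINCT) -> List[str]:
    """Return the names of columns with more than `limit` distinct non-null values."""
    counts = df.nunique(dropna=True)
    return counts[counts > limit].index.tolist()
```

`high_cardinality_columns(hiplot_df)` will list `SeriesInstanceUID` whenever the query returns more than 80 series, since every row has its own UID. Keep that column in the dataframe anyway: you will need its values to open series in the viewer later in this notebook.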


How to use HiPlot:
* select ranges of values on individual axes, or combine selections across several axes
* reset a selection by double-clicking on the axis
* right-click a column label in the plot to use the values of that column for coloring
* the table under the plot is automatically filtered to match your selection, so you can use it to examine the values of specific rows

In [None]:
# build a HiPlot experiment from the dataframe and render it in the notebook
exp = hip.Experiment.from_dataframe(hiplot_df)
exp.display()
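Selections made interactively in the plot above are not part of the cell's code, so they are not reproduced when the cell is re-run. If you want a subset you can recreate, one option is to express the same filter in pandas. A minimal sketch for categorical selections on the `series_type` and `collection_id` columns returned by the query (the helper `select_series` is hypothetical):

```python
from typing import Iterable, Optional

import pandas as pd


def select_series(df: pd.DataFrame,
                  series_types: Optional[Iterable[str]] = None,
                  collections: Optional[Iterable[str]] = None) -> pd.DataFrame:
    """Keep rows matching the given series types and/or collections.

    A filter left as None is not applied.
    """
    mask = pd.Series(True, index=df.index)
    if series_types is not None:
        mask &= df["series_type"].isin(list(series_types))
    if collections is not None:
        mask &= df["collection_id"].isin(list(collections))
    return df[mask]
```

For instance, `hip.Experiment.from_dataframe(select_series(hiplot_df, series_types=["T2"], collections=["prostatex"])).display()` would plot only the T2-type series from the `prostatex` collection.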

## Explore your data further with `idc-index`

[`idc-index`](https://github.com/ImagingDataCommons/idc-index) is a Python package designed to simplify access to the data available from NCI Imaging Data Commons.

We will use this package to generate URLs to open individual studies/series directly in the notebook cell.

First we install the package.

In [None]:
!pip install --upgrade idc-index

Once the package is installed, we instantiate `IDCClient`, which provides a number of helper methods, such as the `get_viewer_URL()` method used below.

In [None]:
from idc_index import IDCClient

c = IDCClient()

## Examine individual MR series

In the next cell, enter the `SeriesInstanceUID` of any series from the HiPlot table above to generate IDC viewer URLs and open either the whole study or just that series directly in this notebook.

In [None]:
#@title Enter any `SeriesInstanceUID` from the HiPlot table above
series_instance_uid = "1.3.6.1.4.1.14519.5.2.1.3983.4006.185971477634236436836567638064" #@param {type:"string"}

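# viewer URL for the selected series
series_url = c.get_viewer_URL(seriesInstanceUID=series_instance_uid)

# look up the parent study in the idc-index metadata table (c.index) and request
# a study-level URL, instead of assuming a particular layout of series_url
study_instance_uid = c.index.loc[
    c.index["SeriesInstanceUID"] == series_instance_uid, "StudyInstanceUID"
].iloc[0]
study_url = c.get_viewer_URL(studyInstanceUID=study_instance_uid)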

# view entire study
from IPython.display import IFrame
IFrame(study_url, width=1600, height=900)

In [None]:
# view selected series only
from IPython.display import IFrame
IFrame(series_url, width=1600, height=900)
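`SeriesInstanceUID` values copied from the HiPlot table can pick up surrounding spaces or quotes. Below is a sketch of a hypothetical helper, `clean_uid`, that strips these and checks the UID syntax rules of DICOM PS3.5 Section 9.1 (dot-separated numeric components without leading zeros, at most 64 characters) before the value is passed to `get_viewer_URL`:

```python
import re

# DICOM PS3.5 Section 9.1: numeric components separated by ".", no leading
# zeros (a component may be a single "0"), at most 64 characters in total.
_UID_COMPONENT = r"(0|[1-9][0-9]*)"
_UID_PATTERN = re.compile(rf"{_UID_COMPONENT}(\.{_UID_COMPONENT})*")


def clean_uid(text: str) -> str:
    """Strip surrounding whitespace/quotes and validate DICOM UID syntax."""
    uid = text.strip().strip("'\"").strip()
    if len(uid) > 64 or not _UID_PATTERN.fullmatch(uid):
        raise ValueError(f"not a valid DICOM UID: {text!r}")
    return uid
```

Adding `series_instance_uid = clean_uid(series_instance_uid)` right after the form field makes a malformed paste fail early with a clear error. The check covers syntax only; it does not confirm that the series exists in IDC.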

## What's next?

You can find more IDC notebooks and tutorials here: https://github.com/ImagingDataCommons/IDC-Tutorials/tree/master.

You can contact IDC support by sending email to support@canceridc.dev or posting your question on [IDC User forum](https://discourse.canceridc.dev).

## Acknowledgments

Imaging Data Commons has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

If you use IDC in your research, please cite the following publication:

> Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. _National cancer institute imaging data commons: Toward transparency, reproducibility, and scalability in imaging artificial intelligence_. Radiographics 43, (2023). http://dx.doi.org/10.1148/rg.230180