<a href="https://colab.research.google.com/github/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/collections_demos/rms_mutation_prediction/RMS-Mutation-Prediction-Expert-Annotations_exploration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exploration of RMS-Mutation-Prediction-Expert-Annotations collection: Navigating annotations using BigQuery

This tutorial is shared as part of the tutorials prepared by the Imaging Data Commons team and available at https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks.


**This is an advanced tutorial**. To follow it, you will need to complete the Google Cloud prerequisites discussed in the following cells. This tutorial assumes you understand how DICOM SR annotations are organized, and gives examples of how to select images and annotations using the Google BigQuery SQL interface. For a more basic tutorial about this collection that does not require Google Cloud prerequisites, please see [this tutorial](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/collections_demos/rms_mutation_prediction/RMS-Mutation-Prediction-Expert-Annotations_exploration.ipynb).

---

`RMS-Mutation-Prediction-Expert-Annotations` is a collection available in the [NCI Imaging Data Commons (IDC)](https://portal.imaging.datacommons.cancer.gov) that contains expert annotations of tissue types for the digital pathology slide images of 95 patients in the `RMS-Mutation-Prediction` collection released earlier. You can learn more about this collection in the following dataset record:

> Bridge, C., Brown, G. T., Jung, H., Lisle, C., Clunie, D., Milewski, D., Liu, Y., Collins, J., Linardic, C. M., Hawkins, D. S., Venkatramani, R., Fedorov, A., & Khan, J. (2024). Expert annotations of the tissue types for the `RMS-Mutation-Prediction` microscopy images [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10462858

You can access this annotations collection in the IDC Portal using [this link](https://portal.imaging.datacommons.cancer.gov/explore/filters/?analysis_results_id=RMS-Mutation-Prediction-Expert-Annotations), or you can explore its content using this [custom Google Looker dashboard](https://tinyurl.com/idc-rms-annotations).

As is the case with all of the content of IDC, both the images and annotations are publicly available and are free to download!

In this notebook we give you an overview of this collection, and demonstrate how to navigate its content programmatically.

---

If you have any questions about this tutorial, please ask them on the IDC forum: https://discourse.canceridc.dev

---

Initial version: June 2024


# Advanced topic: Querying annotations using BigQuery

IDC maintains BigQuery tables that contain all of the metadata available in IDC's DICOM files, which makes that metadata searchable using SQL. With BigQuery, you do not need to download anything if all you need to do is examine the metadata.

If you would like to use BigQuery, you will need to complete the advanced prerequisites in [part 1](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/getting_started/part1_prerequisites.ipynb) of the "Getting started" tutorial series before running the following cells. You can also check out [part 3](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/getting_started/part3_exploring_cohorts.ipynb) of that series to get started with the IDC BigQuery content.

In the following cells we first authenticate with Google Cloud, and then query DICOM metadata to get information about the ROI type for the annotations in the `RMS-Mutation-Prediction-Expert-Annotations` collection.

In [None]:
#@title Enter your Google Cloud Project ID here
import os  # needed for os.environ below

my_ProjectID = "Please enter your project ID here" #@param {type:"string"}
os.environ["GCP_PROJECT_ID"] = my_ProjectID

from google.colab import auth
auth.authenticate_user()

The following query retrieves the list of all annotated regions for each DICOM study.

In [None]:
from google.cloud import bigquery

# BigQuery client is initialized with the ID of the project we specified in the cell above!
bq_client = bigquery.Client(my_ProjectID)

selection_query = """
SELECT
  PatientID,
  StudyInstanceUID,
  contentSequenceUnnested3.ConceptCodeSequence[SAFE_OFFSET(0)].CodeMeaning
FROM
  `bigquery-public-data.idc_current.dicom_all` AS dicom_all
CROSS JOIN
  UNNEST(ContentSequence) AS contentSequenceUnnested
CROSS JOIN
  UNNEST(contentSequenceUnnested.ContentSequence) AS contentSequenceUnnested2
CROSS JOIN
  UNNEST(contentSequenceUnnested2.ContentSequence) AS contentSequenceUnnested3
WHERE
  dicom_all.analysis_result_id = "RMS-Mutation-Prediction-Expert-Annotations"
  AND contentSequenceUnnested3.ConceptNameCodeSequence[SAFE_OFFSET(0)].CodeMeaning = "Finding"
"""

selection_result = bq_client.query(selection_query)
selection_df = selection_result.result().to_dataframe()

display(selection_df)

| | PatientID | StudyInstanceUID | CodeMeaning |
| --- | --- | --- | --- |
| 0 | RMS2400 | 2.25.136698327400450893837131938791757812545 | Necrosis |
| 1 | RMS2400 | 2.25.136698327400450893837131938791757812545 | Connective tissue |
| 2 | RMS2400 | 2.25.136698327400450893837131938791757812545 | Embryonal rhabdomyosarcoma |
| 3 | RMS2270 | 2.25.124251988371010685434513158523091145860 | Connective tissue |
| 4 | RMS2270 | 2.25.124251988371010685434513158523091145860 | Connective tissue |
| ... | ... | ... | ... |
| 675 | RMS2423 | 2.25.56148336459229922868266898022297146711 | Connective tissue |
| 676 | RMS2423 | 2.25.56148336459229922868266898022297146711 | Connective tissue |
| 677 | RMS2406 | 2.25.5360555849781855019773810600059868899 | Alveolar rhabdomyosarcoma |
| 678 | RMS2406 | 2.25.5360555849781855019773810600059868899 | Alveolar rhabdomyosarcoma |

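To make the three `CROSS JOIN UNNEST` clauses in the query above more concrete, here is a minimal pure-Python sketch of the same traversal. The nested dictionary `sr_row` is a made-up stand-in for one row of `dicom_all` (its values, including the `"Tracking Identifier"` item, are illustrative and not taken from IDC), and the helper names `first_code_meaning` and `findings` are hypothetical. The sketch only mirrors the logic of the query: descend three levels of `ContentSequence` and keep the items whose concept name is "Finding".

```python
# Toy stand-in for one SR row; the structure mirrors the nested
# ContentSequence columns used in the query, the values are invented.
sr_row = {
    "PatientID": "RMS-EXAMPLE",
    "ContentSequence": [  # level 1 (contentSequenceUnnested)
        {
            "ContentSequence": [  # level 2 (contentSequenceUnnested2)
                {
                    "ContentSequence": [  # level 3 (contentSequenceUnnested3)
                        {
                            "ConceptNameCodeSequence": [{"CodeMeaning": "Finding"}],
                            "ConceptCodeSequence": [{"CodeMeaning": "Necrosis"}],
                        },
                        {
                            "ConceptNameCodeSequence": [{"CodeMeaning": "Tracking Identifier"}],
                            "ConceptCodeSequence": [],
                        },
                    ]
                }
            ]
        }
    ],
}


def first_code_meaning(sequence):
    """Mimic seq[SAFE_OFFSET(0)].CodeMeaning: None when the sequence is empty."""
    return sequence[0]["CodeMeaning"] if sequence else None


def findings(row):
    """Yield the coded value of every third-level item whose name is "Finding"."""
    for level1 in row.get("ContentSequence", []):
        for level2 in level1.get("ContentSequence", []):
            for level3 in level2.get("ContentSequence", []):
                name = first_code_meaning(level3.get("ConceptNameCodeSequence", []))
                if name == "Finding":
                    yield first_code_meaning(level3.get("ConceptCodeSequence", []))


print(list(findings(sr_row)))  # ['Necrosis']
```

Each nested `for` loop plays the role of one `UNNEST`, and the `if` corresponds to the `WHERE` condition on `ConceptNameCodeSequence`.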

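Once the result of the first query is available, a natural follow-up is to tally how often each tissue type was annotated and which tissue types appear for each patient. The sketch below does this with the standard library on the handful of rows visible in the result preview of that query, so the counts cover only those nine rows and not the full result. With the full DataFrame in hand, `selection_df["CodeMeaning"].value_counts()` should give the equivalent tally.

```python
from collections import Counter, defaultdict

# (PatientID, StudyInstanceUID, CodeMeaning) rows copied from the result preview.
rows = [
    ("RMS2400", "2.25.136698327400450893837131938791757812545", "Necrosis"),
    ("RMS2400", "2.25.136698327400450893837131938791757812545", "Connective tissue"),
    ("RMS2400", "2.25.136698327400450893837131938791757812545", "Embryonal rhabdomyosarcoma"),
    ("RMS2270", "2.25.124251988371010685434513158523091145860", "Connective tissue"),
    ("RMS2270", "2.25.124251988371010685434513158523091145860", "Connective tissue"),
    ("RMS2423", "2.25.56148336459229922868266898022297146711", "Connective tissue"),
    ("RMS2423", "2.25.56148336459229922868266898022297146711", "Connective tissue"),
    ("RMS2406", "2.25.5360555849781855019773810600059868899", "Alveolar rhabdomyosarcoma"),
    ("RMS2406", "2.25.5360555849781855019773810600059868899", "Alveolar rhabdomyosarcoma"),
]

# Number of annotated regions per tissue type.
tissue_counts = Counter(code_meaning for _, _, code_meaning in rows)

# Distinct tissue types annotated for each patient.
tissues_per_patient = defaultdict(set)
for patient_id, _, code_meaning in rows:
    tissues_per_patient[patient_id].add(code_meaning)

for tissue, count in tissue_counts.most_common():
    print(f"{tissue}: {count}")
for patient_id, tissues in sorted(tissues_per_patient.items()):
    print(patient_id, sorted(tissues))
```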
The next query generates a summary table of the annotated regions of interest, along with references to the slides they annotate and the clinical data available for this collection. The result of this query drives the Google Looker dashboard available here: https://tinyurl.com/idc-rms-dashboard.

In [None]:
from google.cloud import bigquery

bq_client = bigquery.Client(my_ProjectID)

selection_query = """
WITH
  annotations_details AS (
  SELECT
    dicom_all.SeriesInstanceUID,
    CurrentRequestedProcedureEvidenceSequence[SAFE_OFFSET(0)].ReferencedSeriesSequence[SAFE_OFFSET(0)].SeriesInstanceUID AS annotated_SeriesInstanceUID,
    contentSequenceUnnested3.ConceptCodeSequence[SAFE_OFFSET(0)].CodeMeaning as segmented_ROI
  FROM
    `bigquery-public-data.idc_current.dicom_all` AS dicom_all
  CROSS JOIN
    UNNEST(ContentSequence) AS contentSequenceUnnested
  CROSS JOIN
    UNNEST(contentSequenceUnnested.ContentSequence) AS contentSequenceUnnested2
  CROSS JOIN
    UNNEST(contentSequenceUnnested2.ContentSequence) AS contentSequenceUnnested3
  WHERE
    dicom_all.analysis_result_id = "RMS-Mutation-Prediction-Expert-Annotations"
    AND (contentSequenceUnnested3.ConceptNameCodeSequence[SAFE_OFFSET(0)].CodeMeaning = "Finding")),
  rms_slides AS (
  SELECT
    DISTINCT(sm_metadata.SeriesInstanceUID) AS SeriesInstanceUID,
    dicom_all.PatientID,
    LEFT(dicom_all.PatientAge, LENGTH(dicom_all.PatientAge) - 1) as PatientAge,
    dicom_all.StudyInstanceUID,
    sm_metadata.* EXCEPT (SeriesInstanceUID)
  FROM
    `bigquery-public-data.idc_current.dicom_metadata_curated_series_level` AS sm_metadata
  INNER JOIN
    `bigquery-public-data.idc_current.dicom_all` AS dicom_all
  ON
    sm_metadata.SeriesInstanceUID = dicom_all.SeriesInstanceUID
  WHERE
    dicom_all.collection_id = "rms_mutation_prediction"
    AND dicom_all.Modality = "SM"
  ORDER BY
    sm_metadata.SeriesInstanceUID
    )
SELECT
  rms_slides.*,
  sample.* EXCEPT (dicom_patient_id,
    participantparticipant_id),
  diagnosis.* EXCEPT (dicom_patient_id,
    participantparticipant_id),
  demographics.* EXCEPT (dicom_patient_id),
  annotations_details.* EXCEPT (SeriesInstanceUID),
  annotations_details.SeriesInstanceUID AS annotation_SeriesInstanceUID
FROM
  rms_slides
LEFT OUTER JOIN
  annotations_details
ON
  rms_slides.SeriesInstanceUID = annotations_details.annotated_SeriesInstanceUID
JOIN
  `bigquery-public-data.idc_current_clinical.rms_mutation_prediction_sample` AS sample
ON
  rms_slides.PatientID = sample.dicom_patient_id
JOIN
  `bigquery-public-data.idc_current_clinical.rms_mutation_prediction_diagnosis` AS diagnosis
ON
  rms_slides.PatientID = diagnosis.dicom_patient_id
JOIN
  `bigquery-public-data.idc_current_clinical.rms_mutation_prediction_demographics` AS demographics
ON
  rms_slides.PatientID = demographics.dicom_patient_id
ORDER BY
  rms_slides.SeriesInstanceUID
"""

selection_result = bq_client.query(selection_query)
selection_df = selection_result.result().to_dataframe()

display(selection_df)

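The summary query joins `rms_slides` to `annotations_details` with a `LEFT OUTER JOIN`, so a slide that has no annotations still appears in the result, with an empty `segmented_ROI`. The sketch below reproduces that behavior in plain Python. The slide identifiers and the `left_outer_join` helper are hypothetical and only serve to illustrate the join semantics; they are not IDC data.

```python
from collections import defaultdict

# Hypothetical identifiers standing in for SeriesInstanceUID values.
slides = ["slide-A", "slide-B", "slide-C"]
annotations = [
    {"annotated_SeriesInstanceUID": "slide-A", "segmented_ROI": "Necrosis"},
    {"annotated_SeriesInstanceUID": "slide-A", "segmented_ROI": "Connective tissue"},
    {"annotated_SeriesInstanceUID": "slide-C", "segmented_ROI": "Alveolar rhabdomyosarcoma"},
]


def left_outer_join(slides, annotations):
    """Return (slide, ROI) pairs; slides without annotations get ROI None."""
    rois_by_slide = defaultdict(list)
    for annotation in annotations:
        rois_by_slide[annotation["annotated_SeriesInstanceUID"]].append(
            annotation["segmented_ROI"]
        )
    rows = []
    for slide in slides:
        # .get() avoids creating empty entries for slides with no annotations.
        for roi in rois_by_slide.get(slide) or [None]:
            rows.append((slide, roi))
    return rows


rows = left_outer_join(slides, annotations)
unannotated = [slide for slide, roi in rows if roi is None]

print(rows)
print(unannotated)  # ['slide-B']
```

On the actual result, assuming the column names produced by the query above, `selection_df[selection_df["segmented_ROI"].isna()]` should select the slides that have no annotations.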
## Next steps

Share your feedback or ask questions about this notebook on the IDC forum: https://discourse.canceridc.dev.