diff --git a/notebooks/collections_demos/bmdeep_annotations_example.png b/notebooks/collections_demos/bmdeep_annotations_example.png new file mode 100644 index 0000000..6c21526 Binary files /dev/null and b/notebooks/collections_demos/bmdeep_annotations_example.png differ diff --git a/notebooks/collections_demos/bonemarrowwsi_pediatricleukemia.ipynb b/notebooks/collections_demos/bonemarrowwsi_pediatricleukemia.ipynb new file mode 100644 index 0000000..8b267dc --- /dev/null +++ b/notebooks/collections_demos/bonemarrowwsi_pediatricleukemia.ipynb @@ -0,0 +1,2077 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "view-in-github" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vBC86EhVsKql" + }, + "source": [ + "# BoneMarrowWSI-PediatricLeukemia\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jbLfOJ2OsRHU" + }, + "source": [ + "## Background\n", + "\n", + "This notebook introduces the `BoneMarrowWSI-PediatricLeukemia` collection, which is presented in [this preprint](https://www.arxiv.org/pdf/2509.15895) and was recently added to IDC.\n", + "\n", + "- **Images**: The `BoneMarrowWSI-PediatricLeukemia` dataset comprises bone marrow aspirate smear WSIs for 246 pediatric cases of leukemia, including acute lymphoid leukemia (ALL), acute myeloid leukemia (AML), and chronic myeloid leukemia (CML). The smears were prepared for the initial diagnosis (i.e., without prior treatment), stained in accordance with the Pappenheim method, and scanned at 40x magnification.\n", + "- **Annotations**: The images have been annotated with rectangular regions of interest (ROI) of the evaluable monolayer area and a total of 45176 cell bounding box annotations have been placed (with few exceptions) within the ROIs. 
For a subset of 232 ROIs, all cells and other haematological structures have been labelled by multiple experts in a consensus labelling approach with 49 distinct (cell type) classes. The consensus labelling approach worked as follows: each bounding box was successively labelled by different experts in so-called \"annotation sessions\" until (a) the bounding box had been labelled by at least two experts, and (b) the most frequent label constituted at least half of all labels given to that bounding box (this label is then termed the \"consensus class\"). In summary, the following annotations are available:\n", + "\n", + " - For each slide: ROI annotations of the monolayer area\n", + " - For some slides: Unlabeled cell bounding boxes\n", + " - For some slides: Cell bounding boxes with cell type labels for each annotation session, plus the final consensus label.\n", + "\n", + "This notebook concentrates on **how to access and work with the annotation data**, which are made available in **DICOM Microscopy Bulk Simple Annotation format (ANNs)**. As a general introduction to this format, we recommend having a look at [this tutorial notebook](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/pathomics/microscopy_dicom_ann_intro.ipynb).\n", + "\n", + "\n", + "\"Example\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "w7A7-5CY8Mmu" + }, + "source": [ + "## Prerequisites\n", + "**Installations**\n", + "* **Install highdicom:** [highdicom](https://highdicom.readthedocs.io/en/latest/introduction.html) was specifically designed to work with DICOM objects holding image-derived information, e.g. annotations and measurements. 
Detailed information on highdicom's functionality can be found in its [user guide](https://highdicom.readthedocs.io/en/latest/usage.html).\n", + "* **Install wsidicom:** The [wsidicom](https://pypi.org/project/wsidicom/) Python package provides functionality to open WSIs and extract image data or metadata from them.\n", + "* **Install idc-index:** The Python package [idc-index](https://pypi.org/project/idc-index/) facilitates querying the basic metadata and downloading DICOM files hosted by the IDC." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "tacxH-AusHPT" + }, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install highdicom\n", + "!pip install wsidicom\n", + "!pip install idc-index --upgrade" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZuAozr8WDtuM" + }, + "source": [ + "## Imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "dgtRNVatzl2s" + }, + "outputs": [], + "source": [ + "import os\n", + "import highdicom as hd\n", + "from idc_index import index\n", + "import pandas as pd\n", + "from google.cloud import storage\n", + "from pathlib import Path\n", + "from typing import List" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1W3-Mp_eDwF7" + }, + "source": [ + "## Finding the `BoneMarrowWSI-PediatricLeukemia` dataset on IDC\n", + "To access and download image and ANN files, we use the Python package [idc-index](https://github.com/ImagingDataCommons/idc-index)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "D6XmpjYFsaJy" + }, + "outputs": [], + "source": [ + "idc_client = index.IDCClient() # set up the IDC client\n", + "idc_client.fetch_index('sm_instance_index')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0HEq03G4EFtv" + }, + "source": [ + "First, we verify that there are indeed 246 WSIs (= distinct StudyInstanceUIDs) in the `BoneMarrowWSI-PediatricLeukemia` collection:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "kXr1V3q_s4x0", + "outputId": "d299017f-742b-4138-ace9-45def5f7f309" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " count(DISTINCT StudyInstanceUID)\n", + "0 246\n" + ] + } + ], + "source": [ + "query_slide_count = '''\n", + "SELECT COUNT(DISTINCT StudyInstanceUID)\n", + "FROM\n", + " index\n", + "WHERE\n", + " collection_id = 'bonemarrowwsi_pediatricleukemia' AND Modality='SM'\n", + "'''\n", + "print(idc_client.sql_query(query_slide_count))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1-ZR8MkhFqKX" + }, + "source": [ + "Next, let's have a look at the available annotation (ANN) files:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "id": "_jSB4Bixi_fR", + "outputId": "a84b028b-d578-45cf-aae5-36e5299573c1" + }, + "outputs": [ + { + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "summary": "{\n \"name\": \"annotations\",\n \"rows\": 1033,\n \"fields\": [\n {\n \"column\": \"SeriesDescription\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 9,\n \"samples\": [\n \"Session 4: Cell bounding boxes with cell type labels\",\n \"Unlabeled cell bounding boxes\",\n \"Session 2: Cell bounding boxes with cell type labels\"\n ],\n \"semantic_type\": \"\",\n 
\"description\": \"\"\n }\n },\n {\n \"column\": \"SeriesInstanceUID\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1033,\n \"samples\": [\n \"1.2.826.0.1.3680043.10.511.3.88235134132501978389580220127369133\",\n \"1.2.826.0.1.3680043.10.511.3.29207928547860358086730027085292511\",\n \"1.2.826.0.1.3680043.10.511.3.20191098733655337578747685878946077\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"StudyInstanceUID\",\n \"properties\": {\n \"dtype\": \"object\",\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Modality\",\n \"properties\": {\n \"dtype\": \"object\",\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}", + "type": "dataframe", + "variable_name": "annotations" + }, + "text/html": [ + "\n", + "
\n" + ], + "text/plain": [ + " SeriesDescription \\\n", + "0 Monolayer regions of interest for cell classif... \n", + "1 Monolayer regions of interest for cell classif... \n", + "2 Unlabeled cell bounding boxes \n", + "3 Consensus: cell bounding boxes with cell type ... \n", + "4 Monolayer regions of interest for cell classif... \n", + "... ... \n", + "1028 Session 2: Cell bounding boxes with cell type ... \n", + "1029 Session 3: Cell bounding boxes with cell type ... \n", + "1030 Session 4: Cell bounding boxes with cell type ... \n", + "1031 Monolayer regions of interest for cell classif... \n", + "1032 Unlabeled cell bounding boxes \n", + "\n", + " SeriesInstanceUID \\\n", + "0 1.2.826.0.1.3680043.10.511.3.76434139437749586... \n", + "1 1.2.826.0.1.3680043.10.511.3.76035111849294113... \n", + "2 1.2.826.0.1.3680043.10.511.3.51699668688633439... \n", + "3 1.2.826.0.1.3680043.10.511.3.18476424701131582... \n", + "4 1.2.826.0.1.3680043.10.511.3.57387082213597634... \n", + "... ... \n", + "1028 1.2.826.0.1.3680043.10.511.3.39451636835490582... \n", + "1029 1.2.826.0.1.3680043.10.511.3.67965106709643031... \n", + "1030 1.2.826.0.1.3680043.10.511.3.98267820174458043... \n", + "1031 1.2.826.0.1.3680043.10.511.3.86763164155160463... \n", + "1032 1.2.826.0.1.3680043.10.511.3.70615242101987162... \n", + "\n", + " StudyInstanceUID Modality \n", + "0 [1.2.826.0.1.3680043.8.498.1074763298775112063... [ANN] \n", + "1 [1.2.826.0.1.3680043.8.498.1110250475182573623... [ANN] \n", + "2 [1.2.826.0.1.3680043.8.498.1110250475182573623... [ANN] \n", + "3 [1.2.826.0.1.3680043.8.498.1162778434880422268... [ANN] \n", + "4 [1.2.826.0.1.3680043.8.498.1162778434880422268... [ANN] \n", + "... ... ... \n", + "1028 [1.2.826.0.1.3680043.8.498.9975397932428013130... [ANN] \n", + "1029 [1.2.826.0.1.3680043.8.498.9975397932428013130... [ANN] \n", + "1030 [1.2.826.0.1.3680043.8.498.9975397932428013130... [ANN] \n", + "1031 [1.2.826.0.1.3680043.8.498.9996452406228816651... 
[ANN] \n", + "1032 [1.2.826.0.1.3680043.8.498.9996452406228816651... [ANN] \n", + "\n", + "[1033 rows x 4 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "query_anns = '''\n", + "SELECT\n", + " SeriesDescription,\n", + " SeriesInstanceUID,\n", + " ARRAY_AGG(StudyInstanceUID) AS StudyInstanceUID,\n", + " ARRAY_AGG(Modality) AS Modality\n", + "FROM\n", + " index\n", + "WHERE\n", + " collection_id = 'bonemarrowwsi_pediatricleukemia' AND Modality='ANN'\n", + "GROUP BY\n", + " SeriesInstanceUID,\n", + " SeriesDescription\n", + "ORDER BY\n", + " StudyInstanceUID,\n", + " SeriesDescription\n", + "'''\n", + "annotations = idc_client.sql_query(query_anns)\n", + "display(annotations)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kBWUmv6qHZ48" + }, + "source": [ + "We can see that for each slide (i.e. DICOM Study) there are multiple ANN Series. The SeriesDescription values confirm what is described in the [Background](#Background) section of this notebook.\n", + "\n", + "\n", + "* Each slide has \"Monolayer regions of interest for cell classification\" annotations.\n", + "* For some slides, there is one ANN Series with \"Unlabeled cell bounding boxes\", while for others, there are multiple ANN Series containing \"Cell bounding boxes with cell type labels\" for the different annotation sessions and the consensus labels.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bd3j23XK7Ren" + }, + "source": [ + "## Viewing annotations\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a_-BRDWSF7A2" + }, + "source": [ + "Annotations can be viewed and explored in detail on their respective slide using the Slim viewer. At the bottom of the right sidebar of the Slim viewer's interface, select the ANN Series of interest from the drop-down menu, then click on `Annotation Groups` and toggle the slider(s) to make the annotations visible."
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 921 + }, + "id": "AhO13xdLw3p-", + "outputId": "2ba34c68-b1e4-4997-dad8-73f778058ce6" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " \n", + " " + ], + "text/plain": [ + "" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "viewer_url = idc_client.get_viewer_URL(studyInstanceUID=annotations['StudyInstanceUID'].iloc[3][0], viewer_selector='slim')\n", + "from IPython.display import IFrame\n", + "IFrame(viewer_url, width=1500, height=900)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KLpFrRBj7f2v" + }, + "source": [ + "## Accessing annotations" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SY9wu5004-dt" + }, + "source": [ + "### Download complete annotation collection for local access\n", + "Since the annotation dataset is of moderate size, it can be downloaded completely using `idc_index`, as shown below, and then accessed from the local disk using `highdicom`."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "sd8Z20qh5SFs", + "outputId": "f184bd83-2986-41c0-f6d9-b34778da7880" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Downloading data: 93%|█████████▎| 17.1M/18.4M [00:04<00:00, 3.69MB/s]\n" + ] + } + ], + "source": [ + "dcm_ann_dir = Path('/content/dicom_ann_annotations')\n", + "os.makedirs(dcm_ann_dir, exist_ok=True)\n", + "\n", + "idc_client.download_from_selection(downloadDir=dcm_ann_dir,\n", + " seriesInstanceUID=annotations['SeriesInstanceUID'].tolist(), dirTemplate=None)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bkGaAD9t6-Dg" + }, + "source": [ + "For guidance on how to read the downloaded annotation files, see the section \"Reading DICOM ANNs\" of [this tutorial notebook](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/pathomics/microscopy_dicom_ann_intro.ipynb)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WHvZRtVE5ByF" + }, + "source": [ + "### Access annotations directly from the Cloud\n", + "\n", + "A more desirable approach, especially for larger datasets, is to extract the relevant information directly from the objects in the cloud. The functions `get_roi_annotations()` and `get_cell_annotations()` can be used for this. They extract ROI and cell annotations, respectively, and summarize them in an easy-to-use pandas DataFrame.\n", + "Note that the relevant annotation files (those containing ROI annotations, or labeled or unlabeled cell annotations) are selected by filtering on the SeriesDescription."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XKzx9B1fU9ed" + }, + "source": [ + "The following two code cells define and use `get_roi_annotations()` to select all DICOM ANNs in the `BoneMarrowWSI-PediatricLeukemia` collection that contain ROI annotations of the monolayer area.\n", + "The resulting pandas DataFrame contains:\n", + "- **'SeriesInstanceUID'**: the SeriesInstanceUID of the DICOM ANN Series containing the ROI annotation\n", + "- **'roi_id'**: the ID of the ROI\n", + "- **'roi_label'**: the label of the ROI\n", + "- **'roi_coordinates'**: the 2D coordinates in the image coordinate system of the referenced slide level\n", + "- **'reference_SeriesInstanceUID'** and **'reference_SOPInstanceUID'**: the SeriesInstanceUID and SOPInstanceUID of the slide level the annotations refer to\n" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "id": "IpLUim520o5B" + }, + "outputs": [], + "source": [ + "def get_roi_annotations(demo: bool = False):\n", + "    query_roi_anns = '''\n", + "    SELECT\n", + "        SeriesInstanceUID\n", + "    FROM\n", + "        index\n", + "    WHERE\n", + "        collection_id = 'bonemarrowwsi_pediatricleukemia'\n", + "        AND Modality='ANN'\n", + "        AND LOWER(SeriesDescription) LIKE '%monolayer%'\n", + "    ORDER BY\n", + "        StudyInstanceUID,\n", + "        SeriesDescription\n", + "    '''\n", + "    roi_series = idc_client.sql_query(query_roi_anns)\n", + "    if demo:\n", + "        # demo mode: only process the first 10 ANN Series\n", + "        roi_series_to_extract = roi_series['SeriesInstanceUID'].tolist()[:10]\n", + "    else:\n", + "        roi_series_to_extract = roi_series['SeriesInstanceUID'].tolist()\n", + "    rois = extract_rois(roi_series_to_extract)\n", + "    return rois\n", + "\n", + "\n", + "def extract_rois(series_uids: List[str]) -> pd.DataFrame:\n", + "    gcs_client = storage.Client.create_anonymous_client()\n", + "    rows = []\n", + "    for series_uid in series_uids:\n", + "        file_urls = idc_client.get_series_file_URLs(seriesInstanceUID=series_uid, source_bucket_location='gcs')\n", + "        for file_url in file_urls:\n", + "            # split scheme://bucket/folder/file into its parts\n", + "            (_, _, bucket_name, folder_name, file_name) = file_url.split('/')\n", + "            bucket = gcs_client.bucket(bucket_name)\n", + "            blob = bucket.blob(f'{folder_name}/{file_name}')\n", + "\n", + "            with blob.open('rb') as file_obj:\n", + "                ann = hd.ann.annread(file_obj)\n", + "                for ann_group in ann.get_annotation_groups():\n", + "                    coords = ann_group.get_graphic_data(coordinate_type='2D')\n", + "                    m_names, m_values, m_units = ann_group.get_measurements()\n", + "                    for c, m in zip(coords, m_values):\n", + "                        rows.append({\n", + "                            'SeriesInstanceUID': ann.SeriesInstanceUID,\n", + "                            'roi_id': int(m[0]),  # the first measurement value is used as the ROI ID\n", + "                            'roi_label': ann_group.label,\n", + "                            'roi_coordinates': c,\n", + "                            'reference_SeriesInstanceUID': ann.ReferencedSeriesSequence[0].SeriesInstanceUID,\n", + "                            'reference_SOPInstanceUID': ann.ReferencedImageSequence[0].ReferencedSOPInstanceUID,\n", + "                        })\n", + "    rois = pd.DataFrame(rows)\n", + "    return rois" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "25MV1-Mb2CYt", + "outputId": "e7ce16a4-86dd-4228-bdd6-2fd029a15687" + }, + "outputs": [ + { + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "summary": "{\n \"name\": \"rois\",\n \"rows\": 37,\n \"fields\": [\n {\n \"column\": \"SeriesInstanceUID\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"1.2.826.0.1.3680043.10.511.3.3275541283208652655099150083320563\",\n \"1.2.826.0.1.3680043.10.511.3.76035111849294113669615696032482122\",\n \"1.2.826.0.1.3680043.10.511.3.34033373592687248628388944428645243\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"roi_id\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 915,\n \"min\": 51,\n \"max\": 2323,\n \"num_unique_values\": 37,\n \"samples\": [\n 1173,\n 261,\n 290\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n 
\"column\": \"roi_label\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"region_of_interest\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"roi_coordinates\",\n \"properties\": {\n \"dtype\": \"object\",\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"reference_SeriesInstanceUID\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"1.2.826.0.1.3680043.8.498.70799816019966502082886199431450776619\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"reference_SOPInstanceUID\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"1.2.826.0.1.3680043.8.498.2872748076660048301793242630856089972\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}", + "type": "dataframe", + "variable_name": "rois" + }, + "text/html": [ + "\n", + "
SeriesInstanceUIDroi_idroi_labelroi_coordinatesreference_SeriesInstanceUIDreference_SOPInstanceUID
01.2.826.0.1.3680043.10.511.3.76434139437749586...2271region_of_interest[[72032.0, 160247.0], [74080.0, 160247.0], [74...1.2.826.0.1.3680043.8.498.98377665788926698337...1.2.826.0.1.3680043.8.498.70616662305497812223...
11.2.826.0.1.3680043.10.511.3.76434139437749586...2272region_of_interest[[92857.0, 163260.0], [94905.0, 163260.0], [94...1.2.826.0.1.3680043.8.498.98377665788926698337...1.2.826.0.1.3680043.8.498.70616662305497812223...
21.2.826.0.1.3680043.10.511.3.76035111849294113...1070region_of_interest[[48995.0, 80570.0], [51043.0, 80570.0], [5104...1.2.826.0.1.3680043.8.498.99045734331130228562...1.2.826.0.1.3680043.8.498.52239720641745361153...
31.2.826.0.1.3680043.10.511.3.76035111849294113...1071region_of_interest[[99451.0, 126518.0], [101499.0, 126518.0], [1...1.2.826.0.1.3680043.8.498.99045734331130228562...1.2.826.0.1.3680043.8.498.52239720641745361153...
41.2.826.0.1.3680043.10.511.3.57387082213597634...290region_of_interest[[21001.0, 175192.0], [23049.0, 175192.0], [23...1.2.826.0.1.3680043.8.498.36810224044030831386...1.2.826.0.1.3680043.8.498.20301403784060697253...
51.2.826.0.1.3680043.10.511.3.57387082213597634...291region_of_interest[[85616.0, 154208.0], [87664.0, 154208.0], [87...1.2.826.0.1.3680043.8.498.36810224044030831386...1.2.826.0.1.3680043.8.498.20301403784060697253...
61.2.826.0.1.3680043.10.511.3.57387082213597634...2323region_of_interest[[27146.0, 122398.0], [47628.0, 122398.0], [47...1.2.826.0.1.3680043.8.498.36810224044030831386...1.2.826.0.1.3680043.8.498.20301403784060697253...
71.2.826.0.1.3680043.10.511.3.11224067190602751...322region_of_interest[[40807.0, 31720.0], [42855.0, 31720.0], [4285...1.2.826.0.1.3680043.8.498.82223767803353692585...1.2.826.0.1.3680043.8.498.72082594196695068782...
81.2.826.0.1.3680043.10.511.3.11224067190602751...323region_of_interest[[41251.0, 26908.0], [43299.0, 26908.0], [4329...1.2.826.0.1.3680043.8.498.82223767803353692585...1.2.826.0.1.3680043.8.498.72082594196695068782...
91.2.826.0.1.3680043.10.511.3.11224067190602751...2017region_of_interest[[25157.0, 41317.0], [30531.0, 41317.0], [3053...1.2.826.0.1.3680043.8.498.82223767803353692585...1.2.826.0.1.3680043.8.498.72082594196695068782...
101.2.826.0.1.3680043.10.511.3.11224067190602751...2018region_of_interest[[83712.0, 75137.0], [88720.0, 75137.0], [8872...1.2.826.0.1.3680043.8.498.82223767803353692585...1.2.826.0.1.3680043.8.498.72082594196695068782...
111.2.826.0.1.3680043.10.511.3.11224067190602751...2019region_of_interest[[70548.0, 93491.0], [73903.0, 93491.0], [7390...1.2.826.0.1.3680043.8.498.82223767803353692585...1.2.826.0.1.3680043.8.498.72082594196695068782...
121.2.826.0.1.3680043.10.511.3.11224067190602751...2020region_of_interest[[79658.0, 95782.0], [84910.0, 95782.0], [8491...1.2.826.0.1.3680043.8.498.82223767803353692585...1.2.826.0.1.3680043.8.498.72082594196695068782...
131.2.826.0.1.3680043.10.511.3.32342080546985181...261region_of_interest[[63457.0, 85889.0], [65505.0, 85889.0], [6550...1.2.826.0.1.3680043.8.498.28621295652678350702...1.2.826.0.1.3680043.8.498.35634115629470368866...
141.2.826.0.1.3680043.10.511.3.32342080546985181...262region_of_interest[[119087.0, 146686.0], [121135.0, 146686.0], [...1.2.826.0.1.3680043.8.498.28621295652678350702...1.2.826.0.1.3680043.8.498.35634115629470368866...
151.2.826.0.1.3680043.10.511.3.32342080546985181...2199region_of_interest[[51992.0, 60697.0], [97259.0, 60697.0], [9725...1.2.826.0.1.3680043.8.498.28621295652678350702...1.2.826.0.1.3680043.8.498.35634115629470368866...
161.2.826.0.1.3680043.10.511.3.34033373592687248...1172region_of_interest[[41898.0, 45076.0], [43946.0, 45076.0], [4394...1.2.826.0.1.3680043.8.498.87409625199169121538...1.2.826.0.1.3680043.8.498.35712192446749686729...
171.2.826.0.1.3680043.10.511.3.34033373592687248...1173region_of_interest[[104405.0, 43133.0], [106453.0, 43133.0], [10...1.2.826.0.1.3680043.8.498.87409625199169121538...1.2.826.0.1.3680043.8.498.35712192446749686729...
181.2.826.0.1.3680043.10.511.3.34033373592687248...2085region_of_interest[[51095.0, 46221.0], [57375.0, 46221.0], [5737...1.2.826.0.1.3680043.8.498.87409625199169121538...1.2.826.0.1.3680043.8.498.35712192446749686729...
191.2.826.0.1.3680043.10.511.3.34033373592687248...2086region_of_interest[[90154.0, 51503.0], [96260.0, 51503.0], [9626...1.2.826.0.1.3680043.8.498.87409625199169121538...1.2.826.0.1.3680043.8.498.35712192446749686729...
20 | 1.2.826.0.1.3680043.10.511.3.34033373592687248... | 2087 | region_of_interest | [[54861.0, 63831.0], [59836.0, 63831.0], [5983... | 1.2.826.0.1.3680043.8.498.87409625199169121538... | 1.2.826.0.1.3680043.8.498.35712192446749686729...
21 | 1.2.826.0.1.3680043.10.511.3.34033373592687248... | 2088 | region_of_interest | [[111385.0, 70592.0], [116651.0, 70592.0], [11... | 1.2.826.0.1.3680043.8.498.87409625199169121538... | 1.2.826.0.1.3680043.8.498.35712192446749686729...
22 | 1.2.826.0.1.3680043.10.511.3.97795071042004815... | 51 | region_of_interest | [[126985.0, 267335.0], [129033.0, 267335.0], [... | 1.2.826.0.1.3680043.8.498.99096006395418595522... | 1.2.826.0.1.3680043.8.498.94915091369473739710...
23 | 1.2.826.0.1.3680043.10.511.3.97795071042004815... | 52 | region_of_interest | [[72610.0, 264357.0], [74658.0, 264357.0], [74... | 1.2.826.0.1.3680043.8.498.99096006395418595522... | 1.2.826.0.1.3680043.8.498.94915091369473739710...
24 | 1.2.826.0.1.3680043.10.511.3.97795071042004815... | 2106 | region_of_interest | [[117409.0, 274298.0], [146737.0, 274298.0], [... | 1.2.826.0.1.3680043.8.498.99096006395418595522... | 1.2.826.0.1.3680043.8.498.94915091369473739710...
25 | 1.2.826.0.1.3680043.10.511.3.57010110707541598... | 366 | region_of_interest | [[93444.0, 82991.0], [95492.0, 82991.0], [9549... | 1.2.826.0.1.3680043.8.498.63551739745624702403... | 1.2.826.0.1.3680043.8.498.25876685401930579513...
26 | 1.2.826.0.1.3680043.10.511.3.57010110707541598... | 367 | region_of_interest | [[92590.0, 112474.0], [94638.0, 112474.0], [94... | 1.2.826.0.1.3680043.8.498.63551739745624702403... | 1.2.826.0.1.3680043.8.498.25876685401930579513...
27 | 1.2.826.0.1.3680043.10.511.3.57010110707541598... | 2290 | region_of_interest | [[45570.0, 105204.0], [47298.0, 105204.0], [47... | 1.2.826.0.1.3680043.8.498.63551739745624702403... | 1.2.826.0.1.3680043.8.498.25876685401930579513...
28 | 1.2.826.0.1.3680043.10.511.3.57010110707541598... | 2291 | region_of_interest | [[36609.0, 126384.0], [38646.0, 126384.0], [38... | 1.2.826.0.1.3680043.8.498.63551739745624702403... | 1.2.826.0.1.3680043.8.498.25876685401930579513...
29 | 1.2.826.0.1.3680043.10.511.3.57010110707541598... | 2292 | region_of_interest | [[135629.0, 174205.0], [142382.0, 174205.0], [... | 1.2.826.0.1.3680043.8.498.63551739745624702403... | 1.2.826.0.1.3680043.8.498.25876685401930579513...
30 | 1.2.826.0.1.3680043.10.511.3.57010110707541598... | 2293 | region_of_interest | [[133120.0, 215687.0], [143659.0, 215687.0], [... | 1.2.826.0.1.3680043.8.498.63551739745624702403... | 1.2.826.0.1.3680043.8.498.25876685401930579513...
31 | 1.2.826.0.1.3680043.10.511.3.32755412832086526... | 184 | region_of_interest | [[66993.0, 99737.0], [69041.0, 99737.0], [6904... | 1.2.826.0.1.3680043.8.498.70799816019966502082... | 1.2.826.0.1.3680043.8.498.28727480766600483017...
32 | 1.2.826.0.1.3680043.10.511.3.32755412832086526... | 185 | region_of_interest | [[120896.0, 139706.0], [122944.0, 139706.0], [... | 1.2.826.0.1.3680043.8.498.70799816019966502082... | 1.2.826.0.1.3680043.8.498.28727480766600483017...
33 | 1.2.826.0.1.3680043.10.511.3.42344126050021596... | 240 | region_of_interest | [[99586.0, 194216.0], [101699.0, 194216.0], [1... | 1.2.826.0.1.3680043.8.498.81530114654037744426... | 1.2.826.0.1.3680043.8.498.11930057970062427065...
34 | 1.2.826.0.1.3680043.10.511.3.42344126050021596... | 241 | region_of_interest | [[33185.0, 172513.0], [35233.0, 172513.0], [35... | 1.2.826.0.1.3680043.8.498.81530114654037744426... | 1.2.826.0.1.3680043.8.498.11930057970062427065...
35 | 1.2.826.0.1.3680043.10.511.3.42344126050021596... | 2154 | region_of_interest | [[53973.0, 164172.0], [58823.0, 164172.0], [58... | 1.2.826.0.1.3680043.8.498.81530114654037744426... | 1.2.826.0.1.3680043.8.498.11930057970062427065...
36 | 1.2.826.0.1.3680043.10.511.3.42344126050021596... | 2155 | region_of_interest | [[136416.0, 206608.0], [139550.0, 206608.0], [... | 1.2.826.0.1.3680043.8.498.81530114654037744426... | 1.2.826.0.1.3680043.8.498.11930057970062427065...
\n" + ], + "text/plain": [ + " SeriesInstanceUID roi_id \\\n", + "0 1.2.826.0.1.3680043.10.511.3.76434139437749586... 2271 \n", + "1 1.2.826.0.1.3680043.10.511.3.76434139437749586... 2272 \n", + "2 1.2.826.0.1.3680043.10.511.3.76035111849294113... 1070 \n", + "3 1.2.826.0.1.3680043.10.511.3.76035111849294113... 1071 \n", + "4 1.2.826.0.1.3680043.10.511.3.57387082213597634... 290 \n", + "5 1.2.826.0.1.3680043.10.511.3.57387082213597634... 291 \n", + "6 1.2.826.0.1.3680043.10.511.3.57387082213597634... 2323 \n", + "7 1.2.826.0.1.3680043.10.511.3.11224067190602751... 322 \n", + "8 1.2.826.0.1.3680043.10.511.3.11224067190602751... 323 \n", + "9 1.2.826.0.1.3680043.10.511.3.11224067190602751... 2017 \n", + "10 1.2.826.0.1.3680043.10.511.3.11224067190602751... 2018 \n", + "11 1.2.826.0.1.3680043.10.511.3.11224067190602751... 2019 \n", + "12 1.2.826.0.1.3680043.10.511.3.11224067190602751... 2020 \n", + "13 1.2.826.0.1.3680043.10.511.3.32342080546985181... 261 \n", + "14 1.2.826.0.1.3680043.10.511.3.32342080546985181... 262 \n", + "15 1.2.826.0.1.3680043.10.511.3.32342080546985181... 2199 \n", + "16 1.2.826.0.1.3680043.10.511.3.34033373592687248... 1172 \n", + "17 1.2.826.0.1.3680043.10.511.3.34033373592687248... 1173 \n", + "18 1.2.826.0.1.3680043.10.511.3.34033373592687248... 2085 \n", + "19 1.2.826.0.1.3680043.10.511.3.34033373592687248... 2086 \n", + "20 1.2.826.0.1.3680043.10.511.3.34033373592687248... 2087 \n", + "21 1.2.826.0.1.3680043.10.511.3.34033373592687248... 2088 \n", + "22 1.2.826.0.1.3680043.10.511.3.97795071042004815... 51 \n", + "23 1.2.826.0.1.3680043.10.511.3.97795071042004815... 52 \n", + "24 1.2.826.0.1.3680043.10.511.3.97795071042004815... 2106 \n", + "25 1.2.826.0.1.3680043.10.511.3.57010110707541598... 366 \n", + "26 1.2.826.0.1.3680043.10.511.3.57010110707541598... 367 \n", + "27 1.2.826.0.1.3680043.10.511.3.57010110707541598... 2290 \n", + "28 1.2.826.0.1.3680043.10.511.3.57010110707541598... 
2291 \n", + "29 1.2.826.0.1.3680043.10.511.3.57010110707541598... 2292 \n", + "30 1.2.826.0.1.3680043.10.511.3.57010110707541598... 2293 \n", + "31 1.2.826.0.1.3680043.10.511.3.32755412832086526... 184 \n", + "32 1.2.826.0.1.3680043.10.511.3.32755412832086526... 185 \n", + "33 1.2.826.0.1.3680043.10.511.3.42344126050021596... 240 \n", + "34 1.2.826.0.1.3680043.10.511.3.42344126050021596... 241 \n", + "35 1.2.826.0.1.3680043.10.511.3.42344126050021596... 2154 \n", + "36 1.2.826.0.1.3680043.10.511.3.42344126050021596... 2155 \n", + "\n", + " roi_label roi_coordinates \\\n", + "0 region_of_interest [[72032.0, 160247.0], [74080.0, 160247.0], [74... \n", + "1 region_of_interest [[92857.0, 163260.0], [94905.0, 163260.0], [94... \n", + "2 region_of_interest [[48995.0, 80570.0], [51043.0, 80570.0], [5104... \n", + "3 region_of_interest [[99451.0, 126518.0], [101499.0, 126518.0], [1... \n", + "4 region_of_interest [[21001.0, 175192.0], [23049.0, 175192.0], [23... \n", + "5 region_of_interest [[85616.0, 154208.0], [87664.0, 154208.0], [87... \n", + "6 region_of_interest [[27146.0, 122398.0], [47628.0, 122398.0], [47... \n", + "7 region_of_interest [[40807.0, 31720.0], [42855.0, 31720.0], [4285... \n", + "8 region_of_interest [[41251.0, 26908.0], [43299.0, 26908.0], [4329... \n", + "9 region_of_interest [[25157.0, 41317.0], [30531.0, 41317.0], [3053... \n", + "10 region_of_interest [[83712.0, 75137.0], [88720.0, 75137.0], [8872... \n", + "11 region_of_interest [[70548.0, 93491.0], [73903.0, 93491.0], [7390... \n", + "12 region_of_interest [[79658.0, 95782.0], [84910.0, 95782.0], [8491... \n", + "13 region_of_interest [[63457.0, 85889.0], [65505.0, 85889.0], [6550... \n", + "14 region_of_interest [[119087.0, 146686.0], [121135.0, 146686.0], [... \n", + "15 region_of_interest [[51992.0, 60697.0], [97259.0, 60697.0], [9725... \n", + "16 region_of_interest [[41898.0, 45076.0], [43946.0, 45076.0], [4394... 
\n", + "17 region_of_interest [[104405.0, 43133.0], [106453.0, 43133.0], [10... \n", + "18 region_of_interest [[51095.0, 46221.0], [57375.0, 46221.0], [5737... \n", + "19 region_of_interest [[90154.0, 51503.0], [96260.0, 51503.0], [9626... \n", + "20 region_of_interest [[54861.0, 63831.0], [59836.0, 63831.0], [5983... \n", + "21 region_of_interest [[111385.0, 70592.0], [116651.0, 70592.0], [11... \n", + "22 region_of_interest [[126985.0, 267335.0], [129033.0, 267335.0], [... \n", + "23 region_of_interest [[72610.0, 264357.0], [74658.0, 264357.0], [74... \n", + "24 region_of_interest [[117409.0, 274298.0], [146737.0, 274298.0], [... \n", + "25 region_of_interest [[93444.0, 82991.0], [95492.0, 82991.0], [9549... \n", + "26 region_of_interest [[92590.0, 112474.0], [94638.0, 112474.0], [94... \n", + "27 region_of_interest [[45570.0, 105204.0], [47298.0, 105204.0], [47... \n", + "28 region_of_interest [[36609.0, 126384.0], [38646.0, 126384.0], [38... \n", + "29 region_of_interest [[135629.0, 174205.0], [142382.0, 174205.0], [... \n", + "30 region_of_interest [[133120.0, 215687.0], [143659.0, 215687.0], [... \n", + "31 region_of_interest [[66993.0, 99737.0], [69041.0, 99737.0], [6904... \n", + "32 region_of_interest [[120896.0, 139706.0], [122944.0, 139706.0], [... \n", + "33 region_of_interest [[99586.0, 194216.0], [101699.0, 194216.0], [1... \n", + "34 region_of_interest [[33185.0, 172513.0], [35233.0, 172513.0], [35... \n", + "35 region_of_interest [[53973.0, 164172.0], [58823.0, 164172.0], [58... \n", + "36 region_of_interest [[136416.0, 206608.0], [139550.0, 206608.0], [... \n", + "\n", + " reference_SeriesInstanceUID \\\n", + "0 1.2.826.0.1.3680043.8.498.98377665788926698337... \n", + "1 1.2.826.0.1.3680043.8.498.98377665788926698337... \n", + "2 1.2.826.0.1.3680043.8.498.99045734331130228562... \n", + "3 1.2.826.0.1.3680043.8.498.99045734331130228562... \n", + "4 1.2.826.0.1.3680043.8.498.36810224044030831386... 
\n", + "5 1.2.826.0.1.3680043.8.498.36810224044030831386... \n", + "6 1.2.826.0.1.3680043.8.498.36810224044030831386... \n", + "7 1.2.826.0.1.3680043.8.498.82223767803353692585... \n", + "8 1.2.826.0.1.3680043.8.498.82223767803353692585... \n", + "9 1.2.826.0.1.3680043.8.498.82223767803353692585... \n", + "10 1.2.826.0.1.3680043.8.498.82223767803353692585... \n", + "11 1.2.826.0.1.3680043.8.498.82223767803353692585... \n", + "12 1.2.826.0.1.3680043.8.498.82223767803353692585... \n", + "13 1.2.826.0.1.3680043.8.498.28621295652678350702... \n", + "14 1.2.826.0.1.3680043.8.498.28621295652678350702... \n", + "15 1.2.826.0.1.3680043.8.498.28621295652678350702... \n", + "16 1.2.826.0.1.3680043.8.498.87409625199169121538... \n", + "17 1.2.826.0.1.3680043.8.498.87409625199169121538... \n", + "18 1.2.826.0.1.3680043.8.498.87409625199169121538... \n", + "19 1.2.826.0.1.3680043.8.498.87409625199169121538... \n", + "20 1.2.826.0.1.3680043.8.498.87409625199169121538... \n", + "21 1.2.826.0.1.3680043.8.498.87409625199169121538... \n", + "22 1.2.826.0.1.3680043.8.498.99096006395418595522... \n", + "23 1.2.826.0.1.3680043.8.498.99096006395418595522... \n", + "24 1.2.826.0.1.3680043.8.498.99096006395418595522... \n", + "25 1.2.826.0.1.3680043.8.498.63551739745624702403... \n", + "26 1.2.826.0.1.3680043.8.498.63551739745624702403... \n", + "27 1.2.826.0.1.3680043.8.498.63551739745624702403... \n", + "28 1.2.826.0.1.3680043.8.498.63551739745624702403... \n", + "29 1.2.826.0.1.3680043.8.498.63551739745624702403... \n", + "30 1.2.826.0.1.3680043.8.498.63551739745624702403... \n", + "31 1.2.826.0.1.3680043.8.498.70799816019966502082... \n", + "32 1.2.826.0.1.3680043.8.498.70799816019966502082... \n", + "33 1.2.826.0.1.3680043.8.498.81530114654037744426... \n", + "34 1.2.826.0.1.3680043.8.498.81530114654037744426... \n", + "35 1.2.826.0.1.3680043.8.498.81530114654037744426... \n", + "36 1.2.826.0.1.3680043.8.498.81530114654037744426... 
\n", + "\n", + " reference_SOPInstanceUID \n", + "0 1.2.826.0.1.3680043.8.498.70616662305497812223... \n", + "1 1.2.826.0.1.3680043.8.498.70616662305497812223... \n", + "2 1.2.826.0.1.3680043.8.498.52239720641745361153... \n", + "3 1.2.826.0.1.3680043.8.498.52239720641745361153... \n", + "4 1.2.826.0.1.3680043.8.498.20301403784060697253... \n", + "5 1.2.826.0.1.3680043.8.498.20301403784060697253... \n", + "6 1.2.826.0.1.3680043.8.498.20301403784060697253... \n", + "7 1.2.826.0.1.3680043.8.498.72082594196695068782... \n", + "8 1.2.826.0.1.3680043.8.498.72082594196695068782... \n", + "9 1.2.826.0.1.3680043.8.498.72082594196695068782... \n", + "10 1.2.826.0.1.3680043.8.498.72082594196695068782... \n", + "11 1.2.826.0.1.3680043.8.498.72082594196695068782... \n", + "12 1.2.826.0.1.3680043.8.498.72082594196695068782... \n", + "13 1.2.826.0.1.3680043.8.498.35634115629470368866... \n", + "14 1.2.826.0.1.3680043.8.498.35634115629470368866... \n", + "15 1.2.826.0.1.3680043.8.498.35634115629470368866... \n", + "16 1.2.826.0.1.3680043.8.498.35712192446749686729... \n", + "17 1.2.826.0.1.3680043.8.498.35712192446749686729... \n", + "18 1.2.826.0.1.3680043.8.498.35712192446749686729... \n", + "19 1.2.826.0.1.3680043.8.498.35712192446749686729... \n", + "20 1.2.826.0.1.3680043.8.498.35712192446749686729... \n", + "21 1.2.826.0.1.3680043.8.498.35712192446749686729... \n", + "22 1.2.826.0.1.3680043.8.498.94915091369473739710... \n", + "23 1.2.826.0.1.3680043.8.498.94915091369473739710... \n", + "24 1.2.826.0.1.3680043.8.498.94915091369473739710... \n", + "25 1.2.826.0.1.3680043.8.498.25876685401930579513... \n", + "26 1.2.826.0.1.3680043.8.498.25876685401930579513... \n", + "27 1.2.826.0.1.3680043.8.498.25876685401930579513... \n", + "28 1.2.826.0.1.3680043.8.498.25876685401930579513... \n", + "29 1.2.826.0.1.3680043.8.498.25876685401930579513... \n", + "30 1.2.826.0.1.3680043.8.498.25876685401930579513... \n", + "31 1.2.826.0.1.3680043.8.498.28727480766600483017... 
\n", + "32 1.2.826.0.1.3680043.8.498.28727480766600483017... \n", + "33 1.2.826.0.1.3680043.8.498.11930057970062427065... \n", + "34 1.2.826.0.1.3680043.8.498.11930057970062427065... \n", + "35 1.2.826.0.1.3680043.8.498.11930057970062427065... \n", + "36 1.2.826.0.1.3680043.8.498.11930057970062427065... " + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Without demo mode, this cell may run for 1-2 minutes. Please be patient :)\n", + "rois = get_roi_annotations(demo=True)\n", + "display(rois)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gpDVBwjYWz-w" + }, + "source": [ + "The following code cells define and use `get_cell_annotations()` to select all DICOM ANNs in the `BoneMarrowWSI-PediatricLeukemia` collection that contain cell annotations. Setting the parameter `subset` to 'labeled', 'unlabeled' or 'both' extracts only the labeled, only the unlabeled, or all cell annotations, respectively.\n", + "The resulting pandas DataFrame contains:\n", + "- **'SeriesInstanceUID'**: SeriesInstanceUID of the DICOM ANN Series containing the cell annotation.\n", + "- **'annotation_session'**: 'n/a' for the unlabeled cells, otherwise the number of the annotation session or 'consensus' for the final consensus.\n", + "- **'cell_id'**: the ID of the cell.\n", + "- **'roi_id'**: if applicable, the ID of the monolayer ROI in which the cell is located.\n", + "- **'cell_label_code_scheme'**: Tuple of the cell label code and the coding scheme designator, e.g. (414387006, SCT), which is code 414387006 from the SNOMED CT ontology.\n", + "- **'cell_label'**: Code meaning of the cell label, e.g. 'Structure of haematological system'.\n", + "- **'cell_coordinates'**: the 2D coordinates in the image coordinate system of the referenced slide level.\n", + "- **'reference_SeriesInstanceUID'** and **'reference_SOPInstanceUID'**: the SeriesInstanceUID and SOPInstanceUID of the slide level the annotations refer to."
+ ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "id": "68Ljng4vEjGl" + }, + "outputs": [], + "source": [ + "def get_cell_annotations(subset: str = 'labeled', demo: bool = False) -> pd.DataFrame:\n", + "    assert subset in ['labeled', 'unlabeled', 'both']\n", + "    # Map the requested subset to a keyword matched against SeriesDescription\n", + "    if subset == 'labeled':\n", + "        query_word = 'labels'\n", + "    elif subset == 'unlabeled':\n", + "        query_word = 'unlabeled'\n", + "    else:\n", + "        query_word = 'cell'\n", + "\n", + "    query_cell_anns = f'''\n", + "    SELECT\n", + "        SeriesInstanceUID\n", + "    FROM\n", + "        index\n", + "    WHERE\n", + "        collection_id = 'bonemarrowwsi_pediatricleukemia'\n", + "        AND Modality='ANN'\n", + "        AND LOWER(SeriesDescription) LIKE '%{query_word}%'\n", + "    ORDER BY\n", + "        StudyInstanceUID,\n", + "        SeriesDescription\n", + "    '''\n", + "    cell_series = idc_client.sql_query(query_cell_anns)\n", + "    if demo:\n", + "        # Demo mode: only process the first 10 series\n", + "        cell_series_to_extract = cell_series['SeriesInstanceUID'].tolist()[:10]\n", + "    else:\n", + "        cell_series_to_extract = cell_series['SeriesInstanceUID'].tolist()\n", + "\n", + "    cells = extract_cells(cell_series_to_extract)\n", + "    return cells\n", + "\n", + "\n", + "def extract_cells(series_uids: List[str]) -> pd.DataFrame:\n", + "    gcs_client = storage.Client.create_anonymous_client()\n", + "    rows = []\n", + "    for series_uid in series_uids:\n", + "        file_urls = idc_client.get_series_file_URLs(seriesInstanceUID=series_uid, source_bucket_location='gcs')\n", + "        for file_url in file_urls:\n", + "            # URL has the form <scheme>://<bucket>/<folder>/<file>\n", + "            (_, _, bucket_name, folder_name, file_name) = file_url.split('/')\n", + "            bucket = gcs_client.bucket(bucket_name)\n", + "            blob = bucket.blob(f'{folder_name}/{file_name}')\n", + "\n", + "            with blob.open('rb') as file_obj:\n", + "                ann = hd.ann.annread(file_obj)\n", + "                for ann_group in ann.get_annotation_groups():\n", + "                    coords = ann_group.get_graphic_data(coordinate_type='2D')\n", + "                    m_names, m_values, m_units = ann_group.get_measurements()\n", + "                    # One row per annotation: m holds cell_id and, if present, roi_id\n", + "                    for c, m in zip(coords, m_values):\n",
+ "                        rows.append({\n", + "                            'SeriesInstanceUID': ann.SeriesInstanceUID,\n", + "                            'annotation_session': get_annotation_session(ann),\n", + "                            'cell_id': int(m[0]),\n", + "                            'roi_id': int(m[1]) if m.size > 1 else None,  # roi_id may be absent\n", + "                            'cell_label': ann_group.annotated_property_type.meaning,\n", + "                            'cell_label_code_scheme': (ann_group.annotated_property_type.value, ann_group.annotated_property_type.scheme_designator),\n", + "                            'cell_coordinates': c,\n", + "                            'reference_SeriesInstanceUID': ann.ReferencedSeriesSequence[0].SeriesInstanceUID,\n", + "                            'reference_SOPInstanceUID': ann.ReferencedImageSequence[0].ReferencedSOPInstanceUID\n", + "                        })\n", + "    cells = pd.DataFrame(rows)\n", + "    return cells\n", + "\n", + "\n", + "def get_annotation_session(ann: hd.ann.sop.MicroscopyBulkSimpleAnnotations) -> str:\n", + "    if 'unlabeled' in ann.SeriesDescription.lower():\n", + "        return 'n/a'\n", + "    # The session identifier is the part of SeriesDescription before the first colon\n", + "    return ann.SeriesDescription.split(':')[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 878 + }, + "id": "rWNAouye6DbO", + "outputId": "077cf7de-c341-44d9-b157-3a6148e315d1" + }, + "outputs": [], + "source": [ + "# Without demo mode, this cell may run for 1-2 minutes. Please be patient :)\n", + "unlabeled_cells = get_cell_annotations(subset='unlabeled', demo=True)\n", + "display(unlabeled_cells)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "sK7jUhYw6DdZ", + "outputId": "9fafb57a-e83f-481f-b445-b0b461a84197" + }, + "outputs": [], + "source": [ + "# Without demo mode, this cell may run for 2-3 minutes. Please be patient :)\n", + "labeled_cells = get_cell_annotations(subset='labeled', demo=True)\n", + "sorted_cell_labels = labeled_cells.sort_values(by=['reference_SOPInstanceUID', 'cell_id', 'annotation_session'])\n", +
"display(sorted_cell_labels.style.hide(axis='index')) # don't show row index" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eKSnU4dyXkiR" + }, + "source": [ + "## How to use the `BoneMarrowWSI-PediatricLeukemia` annotations\n", + "The `BoneMarrowWSI-PediatricLeukemia` collection stands out for the extensive amount of information contained in its annotations. More than 40000 cells are annotated with bounding boxes suitable for training **cell detection models**, and 28000 of those additionally received expert-generated class labels for **cell type classification** tasks. Particularly noteworthy is the uncertainty information embedded in the consensus labelling process, which gives insight into which cell types are particularly challenging to determine or easy to confuse with others. \n", + "The cell below picks out some of those cases, i.e. cells that received more than one distinct label: " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + ">[!CAUTION] In the current release, the labels from the individual annotation sessions are flawed; please refrain from using them. The consensus labels/classes, however, are correct.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 112 + }, + "id": "v2qI9zT4t76Z", + "outputId": "db6b43b6-0988-45b3-b1b9-bf683ff1c23e" + }, + "outputs": [ + { + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "summary": "{\n \"name\": \"test\",\n \"rows\": 1,\n \"fields\": [\n {\n \"column\": \"cell_id\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 47204,\n \"max\": 47204,\n \"num_unique_values\": 1,\n \"samples\": [\n 47204\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"cell_label\",\n \"properties\": {\n \"dtype\": \"object\",\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"cell_label_code_scheme\",\n \"properties\": {\n \"dtype\": \"object\",\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"reference_SOPInstanceUID\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"1.2.826.0.1.3680043.8.498.7208259419669506878249006382744820449\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"cell_coordinates\",\n \"properties\": {\n \"dtype\": \"object\",\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}", + "type": "dataframe", + "variable_name": "test" + }, + "text/html": [ + "\n", + "
cell_id | cell_label | cell_label_code_scheme | reference_SOPInstanceUID | cell_coordinates\n", + "47204 | [Structure of haematological system, Unusable ... | [(414387006, SCT), (111235, DCM), (414387006, ... | 1.2.826.0.1.3680043.8.498.72082594196695068782... | [[42780.0, 32381.0], [42827.0, 32381.0], [4282...
\n" + ], + "text/plain": [ + " cell_label \\\n", + "cell_id \n", + "47204 [Structure of haematological system, Unusable ... \n", + "\n", + " cell_label_code_scheme \\\n", + "cell_id \n", + "47204 [(414387006, SCT), (111235, DCM), (414387006, ... \n", + "\n", + " reference_SOPInstanceUID \\\n", + "cell_id \n", + "47204 1.2.826.0.1.3680043.8.498.72082594196695068782... \n", + "\n", + " cell_coordinates \n", + "cell_id \n", + "47204 [[42780.0, 32381.0], [42827.0, 32381.0], [4282... " + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Collect all labels given to each cell (annotation sessions and consensus)\n", + "grouped_cell_labels = sorted_cell_labels.groupby('cell_id').agg({'cell_label': list, 'cell_label_code_scheme': list,\n", + "                                                                'reference_SOPInstanceUID': 'first',\n", + "                                                                'cell_coordinates': 'first'})\n", + "# A cell counts as uncertain if it received more than one distinct label\n", + "uncertain = grouped_cell_labels['cell_label'].apply(lambda x: len(set(x)) > 1)\n", + "display(grouped_cell_labels[uncertain])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ytLhTIcF23lu" + }, + "source": [ + "# Next steps\n", + "\n", + "Share your feedback or ask questions about this notebook in the IDC Forum: https://discourse.canceridc.dev.\n", + "\n", + "If you are interested in tissue type annotations or want to learn about DICOM Structured Reporting, take a look at [this notebook](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/collections_demos/rms_mutation_prediction/RMS-Mutation-Prediction-Expert-Annotations_exploration.ipynb), which navigates expert-generated region annotations for rhabdomyosarcoma tumor slides." + ] + } + ], + "metadata": { + "colab": { + "authorship_tag": "ABX9TyOcM9RR08MUMnxFM2ViCHos", + "include_colab_link": true, + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +}