Merge pull request #269 from ImagingDataCommons/v0.22.0dev

Version 0.22.0
ImagingDataCommons · Nov 9, 2023 · aad155c · aad155c
2 parents 3bf759d + edf6198
commit aad155c
Show file tree

Hide file tree

Showing 21 changed files with 4,647 additions and 927 deletions.
diff --git a/data/test_files/seg_image_sm_control.dcm b/data/test_files/seg_image_sm_control.dcm
diff --git a/data/test_files/seg_image_sm_dots_tiled_full.dcm b/data/test_files/seg_image_sm_dots_tiled_full.dcm
diff --git a/docs/ann.rst b/docs/ann.rst
@@ -184,8 +184,20 @@ transmitted over network, etc.
 Reading Existing Bulk Annotation Objects
 ----------------------------------------
 
-You can read an existing bulk annotation object using `pydicom` and then convert
-to the `highdicom` object like this:
+You can read an existing bulk annotation object from file using the
+:func:`highdicom.ann.annread()` function:
+
+.. code-block:: python
+
+    from pydicom import dcmread
+    import highdicom as hd
+
+    ann = hd.ann.annread('data/test_files/sm_annotations.dcm')
+
+    assert isinstance(ann, hd.ann.MicroscopyBulkSimpleAnnotations)
+
+Alternatively you can converting an existing ``pydicom.Dataset`` representing a
+bulk annotation object to the `highdicom` object like this:
 
 .. code-block:: python
 
@@ -198,7 +210,7 @@ to the `highdicom` object like this:
 
     assert isinstance(ann, hd.ann.MicroscopyBulkSimpleAnnotations)
 
-Note that this example (and the following examples) uses an example file that
+Note that these examples (and the following examples) uses an example file that
 you can access from the test data in the `highdicom` repository. It was created
 using exactly the code in the construction example above.
 
@@ -298,7 +310,7 @@ passed the data in to create the annotation with `highdicom`.
     import numpy as np
 
     graphic_data = group.get_graphic_data(
-        coordinate_type=ann.AnnotationCoordinateType,
+        coordinate_type=ann.annotation_coordinate_type,
     )
     assert len(graphic_data) == 2 and isinstance(graphic_data[0], np.ndarray)
 

diff --git a/docs/seg.rst b/docs/seg.rst
@@ -486,6 +486,240 @@ segments.
         device_serial_number='1234567890',
     )
 
+Constructing SEG Images from a Total Pixel Matrix
+-------------------------------------------------
+
+Some digital pathology images are represented as "tiled" images,
+in which the full image (known as the "total pixel matrix") is divided up
+into smaller rectangular regions in the row and column dimensions and each
+region ("tile") is stored as a frame in a multiframe DICOM image.
+
+Segmentations of such images are stored as a tiled image in the same manner.
+There are a two options in `highdicom` for doing this. You can either pass each
+tile/frame individually stacked as a 1D list down the first dimension of the
+``pixel_array`` as we have already seen (with the location of each frame either
+matching that of the corresponding frame in the source image or explicitly
+specified in the ``plane_positions`` argument), or you can pass the 2D total
+pixel matrix of the segmentation and have `highdicom` automatically create the
+tiles for you.
+
+To enable this latter option, pass the ``pixel_array`` as a single frame (i.e.
+a 2D labelmap array, a 3D labelmap array with a single frame stacked down the
+first axis, or a 4D array with a single frame stacked down the first dimension
+and any number of segments stacked down the last dimension) and set the
+``tile_pixel_array`` argument to ``True``. You can optionally choose the size
+(in pixels) of each tile using the ``tile_size`` argument, or, by default, the
+tile size of the source image will be used (regardless of whether the
+segmentation is represented at the same resolution as the source image).
+
+If you need to specify the plane positions of the image explicitly, you should
+pass a single item to the ``plane_positions`` argument giving the location of
+the top left corner of the full total pixel matrix. Otherwise, all the usual
+options are available to you.
+
+.. code-block:: python
+
+    # Use an example slide microscopy image from the highdicom test data
+    # directory
+    sm_image = dcmread('data/test_files/sm_image.dcm')
+
+    # The source image has multiple frames/tiles, but here we create a mask
+    # corresponding to the entire total pixel matrix
+    mask = np.zeros(
+        (
+            sm_image.TotalPixelMatrixRows,
+            sm_image.TotalPixelMatrixColumns
+        ),
+        dtype=np.uint8,
+    )
+    mask[38:43, 5:41] = 1
+
+    property_category = hd.sr.CodedConcept("91723000", "SCT", "Anatomical Structure")
+    property_type = hd.sr.CodedConcept("84640000", "SCT", "Nucleus")
+    segment_descriptions = [
+        hd.seg.SegmentDescription(
+            segment_number=1,
+            segment_label='Segment #1',
+            segmented_property_category=property_category,
+            segmented_property_type=property_type,
+            algorithm_type=hd.seg.SegmentAlgorithmTypeValues.MANUAL,
+        ),
+    ]
+
+    seg = hd.seg.Segmentation(
+        source_images=[sm_image],
+        pixel_array=mask,
+        segmentation_type=hd.seg.SegmentationTypeValues.BINARY,
+        segment_descriptions=segment_descriptions,
+        series_instance_uid=hd.UID(),
+        series_number=1,
+        sop_instance_uid=hd.UID(),
+        instance_number=1,
+        manufacturer='Foo Corp.',
+        manufacturer_model_name='Slide Segmentation Algorithm',
+        software_versions='0.0.1',
+        device_serial_number='1234567890',
+        tile_pixel_array=True,
+    )
+
+    # The result stores the mask as a set of 10 tiles of the non-empty region of
+    # the total pixel matrix, each of size (10, 10), matching # the tile size of
+    # the source image
+    assert seg.NumberOfFrames == 10
+    assert seg.pixel_array.shape == (10, 10, 10)
+
+``"TILED_FULL"`` and ``"TILED_SPARSE"``
+---------------------------------------
+
+When the segmentation is stored as a tiled image, there are two ways in which
+the locations of each frame/tile may be specified in the resulting object.
+These are defined by the value of the
+`"DimensionOrganizationType"
+<https://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.6.17.html#table_C.7.6.17-1>`_
+attribute:
+
+- ``"TILED_SPARSE"``: The position of each tile is explicitly defined in the
+  `"PerFrameFunctionalGroupsSequence"
+  <https://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.6.16.html#table_C.7.6.16-1>`_
+  of the object. This requires a potentially very long sequence to store all
+  the per-frame metadata, but does allow for the omission of empty frames from
+  the segmentation and other irregular tiling strategies.
+- ``"TILED_FULL"``: The position of each tile is implicitly defined using a
+  predetermined order of the frames. This saves the need to store the pre-frame
+  metadata but does not allow for the omission of empty frames of the
+  segmentation and is generally less flexible. It may also be simpler for a
+  receiving application to process, since the tiles are guaranteed to be
+  regularly and consistently ordered.
+
+You can control this behavior by specifying the
+``dimension_organization_type`` parameter and passing a value of the
+:class:`highdicom.DimensionOrganizationTypeValues` enum. The default value is
+``"TILED_SPARSE"``. Generally, the ``"TILED_FULL"`` option will be used in
+combination with ``tile_pixel_array`` argument.
+
+
+.. code-block:: python
+
+    # Using the same example as above, this time as TILED_FULL
+    seg = hd.seg.Segmentation(
+        source_images=[sm_image],
+        pixel_array=mask,
+        segmentation_type=hd.seg.SegmentationTypeValues.BINARY,
+        segment_descriptions=segment_descriptions,
+        series_instance_uid=hd.UID(),
+        series_number=1,
+        sop_instance_uid=hd.UID(),
+        instance_number=1,
+        manufacturer='Foo Corp.',
+        manufacturer_model_name='Slide Segmentation Algorithm',
+        software_versions='0.0.1',
+        device_serial_number='1234567890',
+        tile_pixel_array=True,
+        omit_empty_frames=False,
+        dimeension_organization_type=hd.DimensionOrganizationTypeValues.TILED_FULL,
+    )
+
+    # The result stores the mask as a set of 25 tiles of the entire region of
+    # the total pixel matrix, each of size (10, 10), matching the tile size of
+    # the source image
+    assert seg.NumberOfFrames == 25
+    assert seg.pixel_array.shape == (25, 10, 10)
+
+Multi-resolution Pyramids
+-------------------------
+
+Whole slide digital pathology images can often be very large and as such it
+is common to represent them as *multi-resolution pyramids* of images, i.e.
+to store multiple versions of the same image at different resolutions. This
+helps viewers render the image at different zoom levels.
+
+Within DICOM, this can also extend to segmentations derived from whole slide
+images. Multiple different SEG images may be stored, each representing the
+same segmentation at a different resolution, as different instances within a
+DICOM series.
+
+*highdicom* provides the :func:`highdicom.seg.create_segmentation_pyramid`
+function to assist with this process. This function handles multiple related
+scenarios:
+
+* Constructing a segmentation of a source image pyramid given a
+  segmentation pixel array of the highest resolution source image.
+  Highdicom performs the downsampling automatically to match the
+  resolution of the other source images. For this case, pass multiple
+  ``source_images`` and a single item in ``pixel_arrays``.
+* Constructing a segmentation of a source image pyramid given user-provided
+  segmentation pixel arrays for each level in the source pyramid. For this
+  case, pass multiple ``source_images`` and a matching number of
+  ``pixel_arrays``.
+* Constructing a segmentation of a single source image given multiple
+  user-provided downsampled segmentation pixel arrays. For this case, pass
+  a single item in ``source_images``, and multiple items in
+  ``pixel_arrays``).
+* Constructing a segmentation of a single source image and a single
+  segmentation pixel array by downsampling by a given list of
+  ``downsample_factors``. For this case, pass a single item in
+  ``source_images``, a single item in ``pixel_arrays``, and a list of one
+  or more desired ``downsample_factors``.
+
+Here is a simple of example of specifying a single source image and segmentation
+array, and having *highdicom* create a multi-resolution pyramid segmentation
+series at user-specified downsample factors.
+
+.. code-block:: python
+
+    import highdicom as hd
+    from pydicom import dcmread
+    import numpy as np
+
+
+    # Use an example slide microscopy image from the highdicom test data
+    # directory
+    sm_image = dcmread('data/test_files/sm_image.dcm')
+
+    # The source image has multiple frames/tiles, but here we create a mask
+    # corresponding to the entire total pixel matrix
+    mask = np.zeros(
+        (
+            sm_image.TotalPixelMatrixRows,
+            sm_image.TotalPixelMatrixColumns
+        ),
+        dtype=np.uint8,
+    )
+    mask[38:43, 5:41] = 1
+
+    property_category = hd.sr.CodedConcept("91723000", "SCT", "Anatomical Structure")
+    property_type = hd.sr.CodedConcept("84640000", "SCT", "Nucleus")
+    segment_descriptions = [
+        hd.seg.SegmentDescription(
+            segment_number=1,
+            segment_label='Segment #1',
+            segmented_property_category=property_category,
+            segmented_property_type=property_type,
+            algorithm_type=hd.seg.SegmentAlgorithmTypeValues.MANUAL,
+        ),
+    ]
+
+    # This will create a segmentation series of three images: one at the
+    # original source image resolution (implicit), one at half the size, and
+    # another at a quarter of the original size.
+    seg_pyramid = hd.seg.create_segmentation_pyramid(
+        source_images=[sm_image],
+        pixel_arrays=[mask],
+        segmentation_type=hd.seg.SegmentationTypeValues.BINARY,
+        segment_descriptions=segment_descriptions,
+        series_instance_uid=hd.UID(),
+        series_number=1,
+        manufacturer='Foo Corp.',
+        manufacturer_model_name='Slide Segmentation Algorithm',
+        software_versions='0.0.1',
+        device_serial_number='1234567890',
+        downsample_factors=[2.0, 4.0]
+    )
+
+Note that the :func:`highdicom.seg.create_segmentation_pyramid` function always
+behaves as if the ``tile_pixel_array`` input is ``True`` within the segmentation
+constructor, i.e. it assumes that the input segmentation masks represent total
+pixel matrices.
 
 Representation of Fractional SEGs
 ---------------------------------
@@ -606,6 +840,21 @@ and 1):
 - The clear frame boundaries make retrieving individual frames from
   ``"FRACTIONAL"`` image files possible.
 
+Multiprocessing
+---------------
+
+When creating large, multiframe ``"FRACTIONAL"`` segmentations using a
+compressed transfer syntax, the time taken to compress the frames can become
+large and dominate the time taken to create the segmentation. By default,
+frames are compressed in series using the main process, however the ``workers``
+parameter allows you to specify a number of additional worker processes that
+will be used to compress frames in parallel. Setting ``workers`` to a negative
+number uses all available processes on your machine. Note that while this is
+likely to result in significantly lower creations times for segmentations with
+a very large number of frames, for segmentations with only a few frames the
+additional overhead of spawning processes may in fact slow the entire
+segmentation creation process down.
+
 Geometry of SEG Images
 ----------------------
 
@@ -1101,6 +1350,88 @@ as stored in the SEG will be returned.
     # [0.        0.2509804 0.5019608]
 
 
+Reconstructing Total Pixel Matrices from Tiled Segmentations
+------------------------------------------------------------
+
+For segmentations of digital pathology images that are stored as tiled images,
+the :meth:`highdicom.seg.Segmentation.get_pixels_by_source_frame()` method will
+return the segmentation mask as a set of frames stacked down the first
+dimension of the array. However, for such images, you typically want to work
+with the large 2D total pixel matrix that is formed by correctly arranging the
+tiles into a 2D array. `highdicom` provides the
+:meth:`highdicom.seg.Segmentation.get_total_pixel_matrix()` method for this
+purpose.
+
+Called without any parameters, it returns a 3D array containing the full total
+pixel matrix. The first two dimensions are the spatial dimensions, and the
+third is the segments dimension. Behind the scenes highdicom has stitched
+together the required frames stored in the original file for you. Like with the
+other methods described above, setting ``combine_segments`` to ``True``
+combines all the segments into, in this case, a 2D array.
+
+.. code-block:: python
+
+    import highdicom as hd
+
+    # Read in the segmentation using highdicom
+    seg = hd.seg.segread('data/test_files/seg_image_sm_control.dcm')
+
+    # Get the full total pixel matrix
+    mask = seg.get_total_pixel_matrix()
+
+    expected_shape = (
+        seg.TotalPixelMatrixRows,
+        seg.TotalPixelMatrixColumns,
+        seg.number_of_segments,
+    )
+    assert mask.shape == expected_shape
+
+    # Combine the segments into a single array
+    mask = seg.get_total_pixel_matrix(combine_segments=True)
+
+    assert mask.shape == (seg.TotalPixelMatrixRows, seg.TotalPixelMatrixColumns)
+
+Furthermore, you can request a sub-region of the full total pixel matrix by
+specifying the start and/or stop indices for the rows and/or columns within the
+total pixel matrix. Note that this method follows DICOM 1-based convention for
+indexing rows and columns, i.e. the first row and column of the total pixel
+matrix are indexed by the number 1 (not 0 as is common within Python). Negative
+indices are also supported to index relative to the last row or column, with -1
+being the index of the last row or column. Like for standard Python indexing,
+the stop indices are specified as one beyond the final row/column in the
+returned array. Note that the requested region does not have to start or stop
+at the edges of the underlying frames: `highdicom` stitches together only the
+relevant parts of the frames to create the requested image for you.
+
+.. code-block:: python
+
+    import highdicom as hd
+
+    # Read in the segmentation using highdicom
+    seg = hd.seg.segread('data/test_files/seg_image_sm_control.dcm')
+
+    # Get a region of the total pixel matrix
+    mask = seg.get_total_pixel_matrix(
+        combine_segments=True,
+        row_start=20,
+        row_end=40,
+        column_start=10,
+        column_end=20,
+    )
+
+    assert mask.shape == (20, 10)
+
+    # A further example using negative indices. Since row_end is not provided,
+    # the default behavior is to include the last row in the total pixel matrix.
+    mask = seg.get_total_pixel_matrix(
+        combine_segments=True,
+        row_start=21,
+        column_start=-30,
+        column_end=-25,
+    )
+
+    assert mask.shape == (30, 5)
+
 Viewing DICOM SEG Images
 ------------------------