Skip to content

Commit

Permalink
Merge pull request #269 from ImagingDataCommons/v0.22.0dev
Browse files Browse the repository at this point in the history
Version 0.22.0
  • Loading branch information
CPBridge committed Nov 9, 2023
2 parents 3bf759d + edf6198 commit aad155c
Show file tree
Hide file tree
Showing 21 changed files with 4,647 additions and 927 deletions.
Binary file modified data/test_files/seg_image_sm_control.dcm
Binary file not shown.
Binary file added data/test_files/seg_image_sm_dots_tiled_full.dcm
Binary file not shown.
20 changes: 16 additions & 4 deletions docs/ann.rst
Original file line number Diff line number Diff line change
Expand Up @@ -184,8 +184,20 @@ transmitted over network, etc.
Reading Existing Bulk Annotation Objects
----------------------------------------

You can read an existing bulk annotation object using `pydicom` and then convert
to the `highdicom` object like this:
You can read an existing bulk annotation object from file using the
:func:`highdicom.ann.annread()` function:

.. code-block:: python
from pydicom import dcmread
import highdicom as hd
ann = hd.ann.annread('data/test_files/sm_annotations.dcm')
assert isinstance(ann, hd.ann.MicroscopyBulkSimpleAnnotations)
Alternatively you can converting an existing ``pydicom.Dataset`` representing a
bulk annotation object to the `highdicom` object like this:

.. code-block:: python
Expand All @@ -198,7 +210,7 @@ to the `highdicom` object like this:
assert isinstance(ann, hd.ann.MicroscopyBulkSimpleAnnotations)
Note that this example (and the following examples) uses an example file that
Note that these examples (and the following examples) uses an example file that
you can access from the test data in the `highdicom` repository. It was created
using exactly the code in the construction example above.

Expand Down Expand Up @@ -298,7 +310,7 @@ passed the data in to create the annotation with `highdicom`.
import numpy as np
graphic_data = group.get_graphic_data(
coordinate_type=ann.AnnotationCoordinateType,
coordinate_type=ann.annotation_coordinate_type,
)
assert len(graphic_data) == 2 and isinstance(graphic_data[0], np.ndarray)
Expand Down
331 changes: 331 additions & 0 deletions docs/seg.rst
Original file line number Diff line number Diff line change
Expand Up @@ -486,6 +486,240 @@ segments.
device_serial_number='1234567890',
)
Constructing SEG Images from a Total Pixel Matrix
-------------------------------------------------

Some digital pathology images are represented as "tiled" images,
in which the full image (known as the "total pixel matrix") is divided up
into smaller rectangular regions in the row and column dimensions and each
region ("tile") is stored as a frame in a multiframe DICOM image.

Segmentations of such images are stored as a tiled image in the same manner.
There are a two options in `highdicom` for doing this. You can either pass each
tile/frame individually stacked as a 1D list down the first dimension of the
``pixel_array`` as we have already seen (with the location of each frame either
matching that of the corresponding frame in the source image or explicitly
specified in the ``plane_positions`` argument), or you can pass the 2D total
pixel matrix of the segmentation and have `highdicom` automatically create the
tiles for you.

To enable this latter option, pass the ``pixel_array`` as a single frame (i.e.
a 2D labelmap array, a 3D labelmap array with a single frame stacked down the
first axis, or a 4D array with a single frame stacked down the first dimension
and any number of segments stacked down the last dimension) and set the
``tile_pixel_array`` argument to ``True``. You can optionally choose the size
(in pixels) of each tile using the ``tile_size`` argument, or, by default, the
tile size of the source image will be used (regardless of whether the
segmentation is represented at the same resolution as the source image).

If you need to specify the plane positions of the image explicitly, you should
pass a single item to the ``plane_positions`` argument giving the location of
the top left corner of the full total pixel matrix. Otherwise, all the usual
options are available to you.

.. code-block:: python
# Use an example slide microscopy image from the highdicom test data
# directory
sm_image = dcmread('data/test_files/sm_image.dcm')
# The source image has multiple frames/tiles, but here we create a mask
# corresponding to the entire total pixel matrix
mask = np.zeros(
(
sm_image.TotalPixelMatrixRows,
sm_image.TotalPixelMatrixColumns
),
dtype=np.uint8,
)
mask[38:43, 5:41] = 1
property_category = hd.sr.CodedConcept("91723000", "SCT", "Anatomical Structure")
property_type = hd.sr.CodedConcept("84640000", "SCT", "Nucleus")
segment_descriptions = [
hd.seg.SegmentDescription(
segment_number=1,
segment_label='Segment #1',
segmented_property_category=property_category,
segmented_property_type=property_type,
algorithm_type=hd.seg.SegmentAlgorithmTypeValues.MANUAL,
),
]
seg = hd.seg.Segmentation(
source_images=[sm_image],
pixel_array=mask,
segmentation_type=hd.seg.SegmentationTypeValues.BINARY,
segment_descriptions=segment_descriptions,
series_instance_uid=hd.UID(),
series_number=1,
sop_instance_uid=hd.UID(),
instance_number=1,
manufacturer='Foo Corp.',
manufacturer_model_name='Slide Segmentation Algorithm',
software_versions='0.0.1',
device_serial_number='1234567890',
tile_pixel_array=True,
)
# The result stores the mask as a set of 10 tiles of the non-empty region of
# the total pixel matrix, each of size (10, 10), matching # the tile size of
# the source image
assert seg.NumberOfFrames == 10
assert seg.pixel_array.shape == (10, 10, 10)
``"TILED_FULL"`` and ``"TILED_SPARSE"``
---------------------------------------

When the segmentation is stored as a tiled image, there are two ways in which
the locations of each frame/tile may be specified in the resulting object.
These are defined by the value of the
`"DimensionOrganizationType"
<https://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.6.17.html#table_C.7.6.17-1>`_
attribute:

- ``"TILED_SPARSE"``: The position of each tile is explicitly defined in the
`"PerFrameFunctionalGroupsSequence"
<https://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.6.16.html#table_C.7.6.16-1>`_
of the object. This requires a potentially very long sequence to store all
the per-frame metadata, but does allow for the omission of empty frames from
the segmentation and other irregular tiling strategies.
- ``"TILED_FULL"``: The position of each tile is implicitly defined using a
predetermined order of the frames. This saves the need to store the pre-frame
metadata but does not allow for the omission of empty frames of the
segmentation and is generally less flexible. It may also be simpler for a
receiving application to process, since the tiles are guaranteed to be
regularly and consistently ordered.

You can control this behavior by specifying the
``dimension_organization_type`` parameter and passing a value of the
:class:`highdicom.DimensionOrganizationTypeValues` enum. The default value is
``"TILED_SPARSE"``. Generally, the ``"TILED_FULL"`` option will be used in
combination with ``tile_pixel_array`` argument.


.. code-block:: python
# Using the same example as above, this time as TILED_FULL
seg = hd.seg.Segmentation(
source_images=[sm_image],
pixel_array=mask,
segmentation_type=hd.seg.SegmentationTypeValues.BINARY,
segment_descriptions=segment_descriptions,
series_instance_uid=hd.UID(),
series_number=1,
sop_instance_uid=hd.UID(),
instance_number=1,
manufacturer='Foo Corp.',
manufacturer_model_name='Slide Segmentation Algorithm',
software_versions='0.0.1',
device_serial_number='1234567890',
tile_pixel_array=True,
omit_empty_frames=False,
dimeension_organization_type=hd.DimensionOrganizationTypeValues.TILED_FULL,
)
# The result stores the mask as a set of 25 tiles of the entire region of
# the total pixel matrix, each of size (10, 10), matching the tile size of
# the source image
assert seg.NumberOfFrames == 25
assert seg.pixel_array.shape == (25, 10, 10)
Multi-resolution Pyramids
-------------------------

Whole slide digital pathology images can often be very large and as such it
is common to represent them as *multi-resolution pyramids* of images, i.e.
to store multiple versions of the same image at different resolutions. This
helps viewers render the image at different zoom levels.

Within DICOM, this can also extend to segmentations derived from whole slide
images. Multiple different SEG images may be stored, each representing the
same segmentation at a different resolution, as different instances within a
DICOM series.

*highdicom* provides the :func:`highdicom.seg.create_segmentation_pyramid`
function to assist with this process. This function handles multiple related
scenarios:

* Constructing a segmentation of a source image pyramid given a
segmentation pixel array of the highest resolution source image.
Highdicom performs the downsampling automatically to match the
resolution of the other source images. For this case, pass multiple
``source_images`` and a single item in ``pixel_arrays``.
* Constructing a segmentation of a source image pyramid given user-provided
segmentation pixel arrays for each level in the source pyramid. For this
case, pass multiple ``source_images`` and a matching number of
``pixel_arrays``.
* Constructing a segmentation of a single source image given multiple
user-provided downsampled segmentation pixel arrays. For this case, pass
a single item in ``source_images``, and multiple items in
``pixel_arrays``).
* Constructing a segmentation of a single source image and a single
segmentation pixel array by downsampling by a given list of
``downsample_factors``. For this case, pass a single item in
``source_images``, a single item in ``pixel_arrays``, and a list of one
or more desired ``downsample_factors``.

Here is a simple of example of specifying a single source image and segmentation
array, and having *highdicom* create a multi-resolution pyramid segmentation
series at user-specified downsample factors.

.. code-block:: python
import highdicom as hd
from pydicom import dcmread
import numpy as np
# Use an example slide microscopy image from the highdicom test data
# directory
sm_image = dcmread('data/test_files/sm_image.dcm')
# The source image has multiple frames/tiles, but here we create a mask
# corresponding to the entire total pixel matrix
mask = np.zeros(
(
sm_image.TotalPixelMatrixRows,
sm_image.TotalPixelMatrixColumns
),
dtype=np.uint8,
)
mask[38:43, 5:41] = 1
property_category = hd.sr.CodedConcept("91723000", "SCT", "Anatomical Structure")
property_type = hd.sr.CodedConcept("84640000", "SCT", "Nucleus")
segment_descriptions = [
hd.seg.SegmentDescription(
segment_number=1,
segment_label='Segment #1',
segmented_property_category=property_category,
segmented_property_type=property_type,
algorithm_type=hd.seg.SegmentAlgorithmTypeValues.MANUAL,
),
]
# This will create a segmentation series of three images: one at the
# original source image resolution (implicit), one at half the size, and
# another at a quarter of the original size.
seg_pyramid = hd.seg.create_segmentation_pyramid(
source_images=[sm_image],
pixel_arrays=[mask],
segmentation_type=hd.seg.SegmentationTypeValues.BINARY,
segment_descriptions=segment_descriptions,
series_instance_uid=hd.UID(),
series_number=1,
manufacturer='Foo Corp.',
manufacturer_model_name='Slide Segmentation Algorithm',
software_versions='0.0.1',
device_serial_number='1234567890',
downsample_factors=[2.0, 4.0]
)
Note that the :func:`highdicom.seg.create_segmentation_pyramid` function always
behaves as if the ``tile_pixel_array`` input is ``True`` within the segmentation
constructor, i.e. it assumes that the input segmentation masks represent total
pixel matrices.

Representation of Fractional SEGs
---------------------------------
Expand Down Expand Up @@ -606,6 +840,21 @@ and 1):
- The clear frame boundaries make retrieving individual frames from
``"FRACTIONAL"`` image files possible.

Multiprocessing
---------------

When creating large, multiframe ``"FRACTIONAL"`` segmentations using a
compressed transfer syntax, the time taken to compress the frames can become
large and dominate the time taken to create the segmentation. By default,
frames are compressed in series using the main process, however the ``workers``
parameter allows you to specify a number of additional worker processes that
will be used to compress frames in parallel. Setting ``workers`` to a negative
number uses all available processes on your machine. Note that while this is
likely to result in significantly lower creations times for segmentations with
a very large number of frames, for segmentations with only a few frames the
additional overhead of spawning processes may in fact slow the entire
segmentation creation process down.

Geometry of SEG Images
----------------------

Expand Down Expand Up @@ -1101,6 +1350,88 @@ as stored in the SEG will be returned.
# [0. 0.2509804 0.5019608]
Reconstructing Total Pixel Matrices from Tiled Segmentations
------------------------------------------------------------

For segmentations of digital pathology images that are stored as tiled images,
the :meth:`highdicom.seg.Segmentation.get_pixels_by_source_frame()` method will
return the segmentation mask as a set of frames stacked down the first
dimension of the array. However, for such images, you typically want to work
with the large 2D total pixel matrix that is formed by correctly arranging the
tiles into a 2D array. `highdicom` provides the
:meth:`highdicom.seg.Segmentation.get_total_pixel_matrix()` method for this
purpose.

Called without any parameters, it returns a 3D array containing the full total
pixel matrix. The first two dimensions are the spatial dimensions, and the
third is the segments dimension. Behind the scenes highdicom has stitched
together the required frames stored in the original file for you. Like with the
other methods described above, setting ``combine_segments`` to ``True``
combines all the segments into, in this case, a 2D array.

.. code-block:: python
import highdicom as hd
# Read in the segmentation using highdicom
seg = hd.seg.segread('data/test_files/seg_image_sm_control.dcm')
# Get the full total pixel matrix
mask = seg.get_total_pixel_matrix()
expected_shape = (
seg.TotalPixelMatrixRows,
seg.TotalPixelMatrixColumns,
seg.number_of_segments,
)
assert mask.shape == expected_shape
# Combine the segments into a single array
mask = seg.get_total_pixel_matrix(combine_segments=True)
assert mask.shape == (seg.TotalPixelMatrixRows, seg.TotalPixelMatrixColumns)
Furthermore, you can request a sub-region of the full total pixel matrix by
specifying the start and/or stop indices for the rows and/or columns within the
total pixel matrix. Note that this method follows DICOM 1-based convention for
indexing rows and columns, i.e. the first row and column of the total pixel
matrix are indexed by the number 1 (not 0 as is common within Python). Negative
indices are also supported to index relative to the last row or column, with -1
being the index of the last row or column. Like for standard Python indexing,
the stop indices are specified as one beyond the final row/column in the
returned array. Note that the requested region does not have to start or stop
at the edges of the underlying frames: `highdicom` stitches together only the
relevant parts of the frames to create the requested image for you.

.. code-block:: python
import highdicom as hd
# Read in the segmentation using highdicom
seg = hd.seg.segread('data/test_files/seg_image_sm_control.dcm')
# Get a region of the total pixel matrix
mask = seg.get_total_pixel_matrix(
combine_segments=True,
row_start=20,
row_end=40,
column_start=10,
column_end=20,
)
assert mask.shape == (20, 10)
# A further example using negative indices. Since row_end is not provided,
# the default behavior is to include the last row in the total pixel matrix.
mask = seg.get_total_pixel_matrix(
combine_segments=True,
row_start=21,
column_start=-30,
column_end=-25,
)
assert mask.shape == (30, 5)
Viewing DICOM SEG Images
------------------------

Expand Down

0 comments on commit aad155c

Please sign in to comment.