Improve Segmentation Creation Efficiency #227

CPBridge · 2023-05-15T13:59:21Z

Segmentation creation is slow when the number of frames is large. This PR contains several changes to improve this:

Do not append directly to the PerFrameFunctionalGroupsSequence during construction. Surprisingly, this append operation was taking up a significant proportion of the time in each loop iteration. I changed this to append the Dataset to a Python list (rather than a pydicom.Sequence) and then cast the list to a pydicom.Sequence after the loop. The effect was a very dramatic speed-up.
There are several unnecessary dtype cast operations in the previous implementation. These are very expensive with large arrays. In this PR they are removed, and the original dtype is retained for longer, with the final cast to uint8 at the end.
There are several checks on the input array, such as that the segment numbers match and that the entire array is not empty, that become very slow when the input array is large. I was able to speed some of these up by using alternative numpy operations, and combining intermediate results for some of the checks.
The previous implementation of _omit_empty_frames actually altered the input array. While this makes things conceptually a bit simpler, it is an unnecessary operation on a potentially very large array. I changed this to leave the input array unaltered and changed the later indexing logic to select the correct frame, so that empty plane positions are ignored.
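As a rough illustration of the points about input checks and dtype casts, the sketch below derives both the emptiness check and the segment-number check from a single `np.unique` call and defers the cast to `uint8` until the very end. The function name `check_label_array` and its arguments are hypothetical and are not taken from the highdicom code base.

```python
import numpy as np


def check_label_array(
    pixel_array: np.ndarray,
    described_segments: np.ndarray,
) -> np.ndarray:
    """Validate a label array and return it as uint8 (hypothetical helper)."""
    # One pass over the (potentially very large) array. The small result is
    # reused by every check below instead of re-scanning the full array.
    found = np.unique(pixel_array)

    # Emptiness check: the array is empty if no non-zero value is present.
    nonzero = found[found != 0]
    if nonzero.size == 0:
        raise ValueError('Pixel array contains no segments.')

    # Segment number check, performed on the unique values only.
    missing = np.setdiff1d(nonzero, described_segments)
    if missing.size > 0:
        raise ValueError(f'Undescribed segment numbers: {missing.tolist()}')

    # Single cast at the end rather than repeated intermediate casts.
    return pixel_array.astype(np.uint8)


labels = np.zeros((4, 8, 8), dtype=np.int64)
labels[1, 2:4, 2:4] = 1
labels[3, 5:7, 5:7] = 2
out = check_label_array(labels, np.array([1, 2]))
```

The input array keeps its original dtype throughout the checks, and only the returned copy is `uint8`.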

In the course of working on this, I discovered two bugs that I fixed:

The previous implementation would not correctly apply the Maximum Fractional Value scaling in some situations. It seems that the tests had been fudged a little to let this through but I'm sure it's wrong. The tests have now been changed to enforce the correct behaviour.
In the previous implementation, it was possible that a user would pass two or more distinct source images with the same plane position (according to the relevant dimension indices). In this situation, the segmentation frames would only be recorded for the first of the frames. In other words the implementation would silently do the wrong thing. While I suppose in principle it may be possible to allow multiple segmentation frames with the same plane position but different SourceImageSequences, I imagine this use case is not common and not a high priority to support correctly. Therefore, in this PR I simply added a check that will raise an exception when this situation is encountered such that the library no longer silently fails as before.
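A check of this kind could look roughly like the following sketch, which assumes the plane positions have already been reduced to a 2D integer array with one row per source plane and one column per dimension index. The helper name `check_unique_plane_positions` is hypothetical, and the actual check in the PR may be implemented differently.

```python
import numpy as np


def check_unique_plane_positions(dimension_index_values: np.ndarray) -> None:
    """Raise if two planes share the same index values (hypothetical helper).

    Rows are planes, columns are dimension indices.
    """
    # Deduplicate whole rows; fewer unique rows than input rows means that
    # at least two planes map to the same position.
    unique_rows = np.unique(dimension_index_values, axis=0)
    if unique_rows.shape[0] != dimension_index_values.shape[0]:
        raise ValueError(
            'Found multiple source images with the same plane position.'
        )


distinct = np.array([[1, 1], [1, 2], [1, 3]])
duplicated = np.array([[1, 1], [1, 2], [1, 2]])

check_unique_plane_positions(distinct)  # passes silently
try:
    check_unique_plane_positions(duplicated)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True
```

Raising early makes the failure visible to the caller instead of silently dropping frames.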

I also factored out several parts of the complicated constructor to make it more readable and modular.

A further change I made was to demote some of the logger messages about frame omission etc from INFO to DEBUG. For large segmentations these messages become overwhelming and in my opinion do not belong at the INFO level.
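The effect of the demotion can be sketched with the standard `logging` module. The logger name `example.seg` and the function `log_omitted_frames` are made up for this example and do not correspond to highdicom's own logger or methods.

```python
import logging

# Hypothetical logger name, chosen for this example only.
logger = logging.getLogger('example.seg')
logger.setLevel(logging.INFO)


def log_omitted_frames(is_empty_flags):
    """Report each omitted frame at DEBUG level and return the count."""
    n_omitted = 0
    for index, is_empty in enumerate(is_empty_flags):
        if is_empty:
            # One message per frame: useful when debugging, noisy otherwise.
            logger.debug('Omitting empty frame %d', index)
            n_omitted += 1
    return n_omitted


# With the logger at INFO, the per-frame messages are suppressed.
n_omitted = log_omitted_frames([True, False, True])
```

A user who wants the per-frame messages back can set the relevant logger to `logging.DEBUG`.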

I tested this new implementation on the large multi-segment CT from #202 (around 650 frames with 98 segments) and saw a speed-up of around 10x in creation time(!). Furthermore, I verified that each individual change improved efficiency on its own (and not simply that the net effect of all changes did).

Note that when creating FRACTIONAL segmentations with a transfer syntax that requires frame compression, the efficiency gains found above are much less significant relative to the time required to compress the frames. I intend to add an option to parallelise frame compression in a future PR.
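Purely as a sketch of what parallel frame compression might look like, and not a description of the planned implementation, the example below compresses frames concurrently with a thread pool. `zlib` is used only as a stand-in for a real image codec, and the names `compress_frame` and `compress_frames` are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compress_frame(frame: bytes) -> bytes:
    """Compress one frame (zlib is a stand-in for a real image codec)."""
    return zlib.compress(frame)


def compress_frames(frames: List[bytes], workers: int = 4) -> List[bytes]:
    """Compress all frames concurrently, preserving the frame order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(compress_frame, frames))


frames = [bytes([i]) * 1024 for i in range(8)]
compressed = compress_frames(frames)
```

Whether threads or processes are the better fit would depend on the codec in use.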


hackermd

I really like the refactoring effort and the breaking down of the large constructor method into smaller focused helper methods. I have only a few minor comments and suggestions.


hackermd · 2023-06-22T04:25:34Z

src/highdicom/seg/sop.py

+    @staticmethod
+    def _get_pixel_measures(
+        source_image: Dataset,
+        has_ref_frame_uid: bool,


Is this parameter really necessary? Could this not readily be determined in the function body from the source_image?

frame_of_reference_uid = getattr(source_image, 'FrameOfReferenceUID', None)
if frame_of_reference_uid is None:
    ...
else:
    ...

CPBridge · 2023-07-05

No, it isn't necessary. This was just how it was before the refactoring, and I didn't change the logic. It is better simplified; addressed in 5f66eb3.

hackermd · 2023-06-22T04:44:29Z

src/highdicom/seg/sop.py

@@ -1916,12 +1889,12 @@ def _check_and_cast_pixel_array(
    def _omit_empty_frames(


Wouldn't it be sufficient for this function to compute the source_image_indices (and potentially is_empty)? If I understand the logic correctly, the indices could be used elsewhere to get the corresponding plane positions.

CPBridge · 2023-07-06

Indeed. The current behaviour predates this PR, but your suggestion is definitely a nice simplification that makes things cleaner. I implemented it in 5cbd9bf and also renamed this method to _find_nonempty_frame_indices, which better describes what it now does.
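The idea of computing only the indices, without touching the input array, can be sketched as follows. This is an illustrative stand-alone function rather than the method from the PR. The fallback of keeping every frame when the whole array is empty is an assumption of this sketch.

```python
from typing import Tuple

import numpy as np


def find_nonempty_frame_indices(
    pixel_array: np.ndarray,
) -> Tuple[np.ndarray, bool]:
    """Return (indices of non-empty frames, whether all frames are empty).

    Hypothetical helper: the first axis of pixel_array indexes frames.
    """
    # Reduce over every axis except the first (frame) axis.
    reduce_axes = tuple(range(1, pixel_array.ndim))
    is_nonempty = np.any(pixel_array != 0, axis=reduce_axes)
    indices = np.flatnonzero(is_nonempty)

    if indices.size == 0:
        # Assumption: if everything is empty, keep all frames.
        return np.arange(pixel_array.shape[0]), True
    return indices, False


array = np.zeros((4, 2, 2), dtype=np.uint8)
array[1, 0, 0] = 1
array[3, 1, 1] = 2
indices, is_empty = find_nonempty_frame_indices(array)
```

The returned indices can then be used to look up the corresponding plane positions and source images, while `array` itself is left unmodified.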


hackermd · 2023-06-22T04:55:11Z

src/highdicom/seg/sop.py

+
+        Returns
+        -------
+        index_values: List[int]


Let's call it dimension_index_values to match the function name and DICOM attribute name.

Suggested change

-        index_values: List[int]
+        dimension_index_values: List[int]

CPBridge · 2023-07-05

Addressed in 10340a8 and 0881c32.


hackermd

Thanks @CPBridge. LGTM. I left a few minor suggestions for your consideration.

hackermd · 2023-07-26T06:20:31Z

src/highdicom/seg/sop.py


-            self.SegmentSequence.append(segment_descriptions[i])
+        self.PerFrameFunctionalGroupsSequence = pffg_sequence
+        self.NumberOfFrames = len(pffg_sequence)

        if is_encaps:


This part could potentially go into a separate _add_pixel_data helper method at some point:

def _add_pixel_data(self, frames: List[numpy.ndarray], is_encapsulated: bool) -> None

CPBridge · 2023-07-27

I actually rewrite this piece substantially in an upcoming pull request related to multiprocessing (sneak peek!), so I'm going to leave it as is for now. Agreed that it doesn't make a huge amount of sense at the moment.


hackermd · 2023-07-26T06:28:10Z

src/highdicom/seg/sop.py

+        return (source_image_indices, False)
+
+    @staticmethod
+    def _get_segment_array(


For consistency

Suggested change

-    def _get_segment_array(
+    def _get_segment_pixel_array(

CPBridge · 2023-07-27

Resolved in 9092a00.


hackermd · 2023-07-26T06:30:42Z

src/highdicom/seg/sop.py

@@ -1793,10 +1612,175 @@ def _check_segment_numbers(described_segment_numbers: np.ndarray):
                f'from 1. Found {described_segment_numbers[0]}. '
            )

+    @staticmethod
+    def _get_pixel_measures(


Using the full attribute name may be clearer

Suggested change

-    def _get_pixel_measures(
+    def _get_pixel_measures_sequence(

CPBridge · 2023-07-27

Resolved in 9092a00.


CPBridge added 4 commits May 14, 2023 18:52

Make checks more efficient, avoid appending to pydicom.Sequence

3b9204d

Add check for non-unique plane positions

20cdadb

Tidy up

fc74493

Move compression out of the loop, in preparation for multiprocessing

30f9dcc

CPBridge changed the base branch from master to v0.22.0dev May 16, 2023 10:37

CPBridge changed the base branch from v0.22.0dev to master May 16, 2023 10:39

CPBridge changed the base branch from master to v0.22.0dev May 16, 2023 10:39

CPBridge added 2 commits May 16, 2023 07:21

Simplify SegmentSequence logic

8b86a12

Factored out several parts of seg constructor into methods for readib…

1fa59f7

…ility

CPBridge requested a review from hackermd May 16, 2023 14:03

CPBridge added the enhancement New feature or request label May 16, 2023

CPBridge marked this pull request as ready for review May 16, 2023 14:04

CPBridge self-assigned this May 17, 2023

CPBridge mentioned this pull request May 28, 2023

Proposal: Defining and Prototyping "Labelmap" Segmentations in DICOM Format NA-MIC/ProjectWeek#643

Closed

CPBridge added 3 commits June 13, 2023 15:12

Fix for single channel floating point pixel array

3de6e18

Merge branch 'master' into seg_creation_efficiency

3e1a687

Codespell typo

552ae63

hackermd requested changes Jun 22, 2023


CPBridge and others added 12 commits June 22, 2023 14:28

Apply suggestions from code review

4440f64

Co-authored-by: Markus D. Herrmann <hackermd@users.noreply.github.com>

Simplification of _get_pixel_measures

5f66eb3

Add missing Returns section to docstring

ce85cbd

Remove has_ref_frame_uid from _get_dimension_index_values

53a2392

rename index_values -> dimension_index_values

10340a8

Fix missing variable name change

0881c32

Add further explanation to _get_dimension_index_values docstring

1774199

Simplify _omit_empty_frames to find indices only

5cbd9bf

Rewrite as list comprehension

37ad160

remove unnecessary flattens to save memory

27eac18

Merge branch 'seg_creation_efficiency' of github.com:ImagingDataCommo…

26cdf0c

…ns/highdicom into seg_creation_efficiency

Tidy up of dimension indexing code

287047e

Add further docstring comment for _get_segment_array

c0d9f72

CPBridge requested a review from hackermd July 8, 2023 15:28

hackermd approved these changes Jul 26, 2023


CPBridge and others added 3 commits July 26, 2023 20:48

Update src/highdicom/seg/sop.py

30bb5f7

Co-authored-by: Markus D. Herrmann <hackermd@users.noreply.github.com>

Apply suggestions from code review

f7878d5

Co-authored-by: Markus D. Herrmann <hackermd@users.noreply.github.com>

Rename methods, fix lints

9092a00

CPBridge merged commit 551c2cd into v0.22.0dev Jul 27, 2023

CPBridge deleted the seg_creation_efficiency branch June 27, 2024 22:47
