Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Segmentation Creation Efficiency #227

Merged
merged 25 commits into from
Jul 27, 2023
Merged

Conversation

CPBridge
Copy link
Collaborator

@CPBridge CPBridge commented May 15, 2023

Segmentation creation is slow when the number of frames is large. This PR contains several changes to improve this:

  • Do not append directly to the PerFramesFunctionalGroupsSequence during construction. Surprisingly, this append operation was taking up a significant proportion of the time in each loop iteration. I changed this to append the Dataset to a Python list (rather than a pydicom.Sequence) and then cast the list to a pydicom.Sequence after the loop. The effect was a very dramatic speed up.
  • There are several unnecessary dtype cast operations in the previous implementation. These are very expensive with large arrays. In this PR they are removed, and the original dtype is retained for longer, with the final cast to uint8 at the end.
  • There are several checks on the input array, such as that the segment numbers match and that the entire array is not empty, that become very slow when the input array is large. I was able to speed some of these up by using alternative numpy operations, and combining intermediate results for some of the checks.
  • The previous implementation of _omit_empty_frames actually altered the input array. While this makes things conceptually a bit simpler, it is an unnecessary operation on a potentially very large array. I change this to leave the input array unaltered and changed the later indexing logic to select the correct frame such that empty plane positions are ignored.

In the course of working on this, I discovered two bugs that I fixed:

  • The previous implementation would not correctly apply the Maximum Fractional Value scaling in some situations. It seems that the tests had been fudged a little to let this through but I'm sure it's wrong. The tests have now been changed to enforce the correct behaviour.
  • In the previous implementation, it was possible that a user would pass two or more distinct source images with the same plane position (according to the relevant dimension indices). In this situation, the segmentation frames would only be recorded for the first of the frames. In other words the implementation would silently do the wrong thing. While I suppose in principle it may be possible to allow multiple segmentation frames with the same plane position but different SourceImageSequences, I imagine this use case is not common and not a high priority to support correctly. Therefore, in this PR I simply added a check that will raise an exception when this situation is encountered such that the library no longer silently fails as before.

I also factored out several parts of the complicated constructor to make it more readable and modular.

A further change I made was to demote some of the logger messages about frame omission etc from INFO to DEBUG. For large segmentations these messages become overwhelming and in my opinion do not belong at the INFO level.

I tested this new implementation on the large mulit-segment CT from #202 (around 650 frames with 98 segments), and saw a speed up of around 10x in creation time(!). Furthermore, I ensured that each individual change I made improved the efficiency (and not simply that the net effect of all changes improved efficiency).

Note that when creating FRACTIONAL segmentations with a transfer syntax that requires frame compression, the efficiency gains found above are much less significant relative to the time required to compress the frames. I intend to add an option to parallelise frame compression in a future PR.

@CPBridge CPBridge changed the base branch from master to v0.22.0dev May 16, 2023 10:37
@CPBridge CPBridge changed the base branch from v0.22.0dev to master May 16, 2023 10:39
@CPBridge CPBridge changed the base branch from master to v0.22.0dev May 16, 2023 10:39
@CPBridge CPBridge requested a review from hackermd May 16, 2023 14:03
@CPBridge CPBridge added the enhancement New feature or request label May 16, 2023
@CPBridge CPBridge marked this pull request as ready for review May 16, 2023 14:04
@CPBridge CPBridge self-assigned this May 17, 2023
Copy link
Collaborator

@hackermd hackermd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the refactoring effort and the breaking down of the large constructor method into smaller focused helper methods. I have only a few minor comments and suggestions.

src/highdicom/seg/sop.py Outdated Show resolved Hide resolved
@staticmethod
def _get_pixel_measures(
source_image: Dataset,
has_ref_frame_uid: bool,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this parameter really necessary? Could this not readily be determined in the function body from the source_image?

frame_of_reference_uid = getattr(source_image, 'FrameOfReferenceUID', None)
if frame_of_reference_uid is None:
    ...
else:
    ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this was just how it was before the refactoring and I didn't change the logic. This is better simplified, addressed in 5f66eb3

@@ -1916,12 +1889,12 @@ def _check_and_cast_pixel_array(
def _omit_empty_frames(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be sufficient for this function to compute the source_image_indices (and potentially is_empty)? If I understand the logic correctly, the indices could be used elsewhere to get the corresponding plane positions.

Copy link
Collaborator Author

@CPBridge CPBridge Jul 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. The current behaviour predates this PR but your suggestion is definitely a nice simplification that makes things cleaner. I implemented it in 5cbd9bf and also renamed this method to _find_nonemtpy_frame_indices, which now better describes what it actually does.

src/highdicom/seg/sop.py Outdated Show resolved Hide resolved
src/highdicom/seg/sop.py Show resolved Hide resolved
src/highdicom/seg/sop.py Outdated Show resolved Hide resolved

Returns
-------
index_values: List[int]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call it dimension_index_values to match the function name and DICOM attribute name.

Suggested change
index_values: List[int]
dimension_index_values: List[int]

Copy link
Collaborator Author

@CPBridge CPBridge Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 10340a8 and 0881c32

src/highdicom/seg/sop.py Outdated Show resolved Hide resolved
src/highdicom/seg/sop.py Show resolved Hide resolved
src/highdicom/seg/sop.py Outdated Show resolved Hide resolved
@CPBridge CPBridge requested a review from hackermd July 8, 2023 15:28
Copy link
Collaborator

@hackermd hackermd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CPBridge. LGTM. I left a few minor suggestions for your consideration.


self.SegmentSequence.append(segment_descriptions[i])
self.PerFrameFunctionalGroupsSequence = pffg_sequence
self.NumberOfFrames = len(pffg_sequence)

if is_encaps:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part could potentially go into a separate _add_pixel_data helper method at some point:

def _add_pixel_data(self, frames: List[numpy.ndarray], is_encapsulated: bool) -> None

Copy link
Collaborator Author

@CPBridge CPBridge Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually rewrite this piece substantially in an upcoming pull request related to multiprocessing (sneak peak!) so I'm going to leave as is for now. Agreed that it doesn't make a huge amount of sense at the moment

src/highdicom/seg/sop.py Outdated Show resolved Hide resolved
src/highdicom/seg/sop.py Outdated Show resolved Hide resolved
return (source_image_indices, False)

@staticmethod
def _get_segment_array(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency

Suggested change
def _get_segment_array(
def _get_segment_pixel_array(

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved in 9092a00

src/highdicom/seg/sop.py Outdated Show resolved Hide resolved
src/highdicom/seg/sop.py Outdated Show resolved Hide resolved
@@ -1793,10 +1612,175 @@ def _check_segment_numbers(described_segment_numbers: np.ndarray):
f'from 1. Found {described_segment_numbers[0]}. '
)

@staticmethod
def _get_pixel_measures(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the full attribute name may be clearer

Suggested change
def _get_pixel_measures(
def _get_pixel_measures_sequence(

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved in 9092a00

CPBridge and others added 3 commits July 26, 2023 20:48
Co-authored-by: Markus D. Herrmann <hackermd@users.noreply.github.com>
Co-authored-by: Markus D. Herrmann <hackermd@users.noreply.github.com>
@CPBridge CPBridge merged commit 551c2cd into v0.22.0dev Jul 27, 2023
@CPBridge CPBridge deleted the seg_creation_efficiency branch June 27, 2024 22:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants