Improve Segmentation Creation Efficiency #227
I really like the refactoring effort and the breaking down of the large constructor method into smaller focused helper methods. I have only a few minor comments and suggestions.
src/highdicom/seg/sop.py
    @staticmethod
    def _get_pixel_measures(
        source_image: Dataset,
        has_ref_frame_uid: bool,
Is this parameter really necessary? Could this not readily be determined in the function body from the source_image?
    frame_of_reference_uid = getattr(source_image, 'FrameOfReferenceUID', None)
    if frame_of_reference_uid is None:
        ...
    else:
        ...
No, it is not necessary. This was just how it was before the refactoring, and I didn't change the logic. It is better simplified; addressed in 5f66eb3.
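For readers following the thread, here is a minimal sketch of the simplification discussed above. It is not the code from the commit: the function name `has_frame_of_reference` is hypothetical, and `types.SimpleNamespace` objects stand in for the source image dataset, on the assumption that a missing attribute can be handled with `getattr` and a default in the same way.

```python
from types import SimpleNamespace
from typing import Any


def has_frame_of_reference(source_image: Any) -> bool:
    """Derive the flag from the image itself instead of passing it in.

    Hypothetical helper for illustration only.
    """
    frame_of_reference_uid = getattr(source_image, 'FrameOfReferenceUID', None)
    return frame_of_reference_uid is not None


# Stand-ins for source images (assumption: the UID value is made up).
with_uid = SimpleNamespace(FrameOfReferenceUID='1.2.3.4')
without_uid = SimpleNamespace()

print(has_frame_of_reference(with_uid))
print(has_frame_of_reference(without_uid))
```

With this, the caller no longer needs to compute and pass a separate boolean alongside the image.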
src/highdicom/seg/sop.py
@@ -1916,12 +1889,12 @@ def _check_and_cast_pixel_array(
    def _omit_empty_frames(
Wouldn't it be sufficient for this function to compute the source_image_indices (and potentially is_empty)? If I understand the logic correctly, the indices could be used elsewhere to get the corresponding plane positions.
Indeed. The current behaviour predates this PR, but your suggestion is definitely a nice simplification that makes things cleaner. I implemented it in 5cbd9bf and also renamed this method to _find_nonempty_frame_indices, which now better describes what it actually does.
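To illustrate the idea discussed in this thread, here is a dependency-free sketch of finding the indices of non-empty frames without altering the input. It is not the implementation from the commit: the function name, the use of nested lists in place of a numpy array, and the meaning of the `is_empty` flag (true when no frame contains a non-zero pixel) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# One 2D frame, represented as rows of pixel values (stand-in for an array).
Frame = Sequence[Sequence[int]]


def find_nonempty_frame_indices(
    frames: Sequence[Frame],
) -> Tuple[List[int], bool]:
    """Return indices of frames with at least one non-zero pixel.

    The input is only read, never copied or modified. The second return
    value is True when every frame is empty (assumed semantics).
    """
    indices = [
        i for i, frame in enumerate(frames)
        if any(any(row) for row in frame)
    ]
    return indices, len(indices) == 0


frames = [
    [[0, 0], [0, 0]],  # empty
    [[0, 1], [0, 0]],  # non-empty
    [[0, 0], [0, 0]],  # empty
    [[1, 1], [1, 0]],  # non-empty
]
plane_positions = ['p0', 'p1', 'p2', 'p3']  # placeholder values

indices, is_empty = find_nonempty_frame_indices(frames)
# The same indices select the matching plane positions elsewhere.
kept_positions = [plane_positions[i] for i in indices]
print(indices, is_empty, kept_positions)
```

Because only indices are returned, the caller decides where to apply them, and the potentially large pixel array is left untouched.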
src/highdicom/seg/sop.py
        Returns
        -------
        index_values: List[int]
Let's call it dimension_index_values to match the function name and DICOM attribute name.
-        index_values: List[int]
+        dimension_index_values: List[int]
Co-authored-by: Markus D. Herrmann <hackermd@users.noreply.github.com>
…ns/highdicom into seg_creation_efficiency
Thanks @CPBridge. LGTM. I left a few minor suggestions for your consideration.
    self.SegmentSequence.append(segment_descriptions[i])
    self.PerFrameFunctionalGroupsSequence = pffg_sequence
    self.NumberOfFrames = len(pffg_sequence)

    if is_encaps:
This part could potentially go into a separate _add_pixel_data helper method at some point:
    def _add_pixel_data(self, frames: List[numpy.ndarray], is_encapsulated: bool) -> None
I actually rewrite this piece substantially in an upcoming pull request related to multiprocessing (sneak peek!), so I'm going to leave it as is for now. Agreed that it doesn't make a huge amount of sense at the moment.
src/highdicom/seg/sop.py
    return (source_image_indices, False)

    @staticmethod
    def _get_segment_array(
For consistency:
-    def _get_segment_array(
+    def _get_segment_pixel_array(
resolved in 9092a00
src/highdicom/seg/sop.py
@@ -1793,10 +1612,175 @@ def _check_segment_numbers(described_segment_numbers: np.ndarray):
    f'from 1. Found {described_segment_numbers[0]}. '
    )

    @staticmethod
    def _get_pixel_measures(
Using the full attribute name may be clearer:
-    def _get_pixel_measures(
+    def _get_pixel_measures_sequence(
resolved in 9092a00
Co-authored-by: Markus D. Herrmann <hackermd@users.noreply.github.com>
Segmentation creation is slow when the number of frames is large. This PR contains several changes to improve this:
- [...] Dataset to a Python list (rather than a pydicom.Sequence) and then cast the list to a pydicom.Sequence after the loop. The effect was a very dramatic speed-up.
- _omit_empty_frames actually altered the input array. While this makes things conceptually a bit simpler, it is an unnecessary operation on a potentially very large array. I changed this to leave the input array unaltered and changed the later indexing logic to select the correct frame such that empty plane positions are ignored.

In the course of working on this, I discovered two bugs that I fixed:
I also factored out several parts of the complicated constructor to make it more readable and modular.
A further change I made was to demote some of the logger messages about frame omission etc from INFO to DEBUG. For large segmentations these messages become overwhelming and in my opinion do not belong at the INFO level.
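As a rough illustration of the effect of this change, the sketch below uses only the standard `logging` module. The logger name, the `add_frames` function and the message texts are hypothetical and are not taken from the library; the point is only that per-frame messages logged at DEBUG disappear when the logger is configured at INFO, while a single summary message remains.

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Collects messages in memory so the effect is easy to inspect."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger('seg_creation_sketch')
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)


def add_frames(n_frames: int, omitted: List[int]) -> None:
    """Log one DEBUG message per omitted frame and one INFO summary."""
    for index in omitted:
        logger.debug('omitting empty frame %d', index)
    logger.info('added %d frames', n_frames - len(omitted))


# At INFO level only the summary is emitted.
logger.setLevel(logging.INFO)
add_frames(5, omitted=[1, 3])
messages_at_info = list(handler.messages)

# At DEBUG level the per-frame messages are emitted as well.
handler.messages.clear()
logger.setLevel(logging.DEBUG)
add_frames(5, omitted=[1, 3])
messages_at_debug = list(handler.messages)
```

Users who still want the per-frame detail can opt in by lowering the logger level to DEBUG.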
I tested this new implementation on the large multi-segment CT from #202 (around 650 frames with 98 segments), and saw a speed-up of around 10x in creation time(!). Furthermore, I ensured that each individual change I made improved the efficiency (and not simply that the net effect of all changes improved efficiency).
Note that when creating FRACTIONAL segmentations with a transfer syntax that requires frame compression, the efficiency gains found above are much less significant relative to the time required to compress the frames. I intend to add an option to parallelise frame compression in a future PR.