-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiprocessing for frame encoding in Segmentation constructor #245
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! I really like that parallel encoding of frames using multiple workers. It makes me wonder whether we should use multiple workers for decoding of frames as well.
src/highdicom/seg/sop.py
Outdated
segment_array[plane_index].flatten() | ||
) | ||
# Concatenate the 1D array for encoding at the end | ||
full_frames_list.append(segment_array.flatten()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether a separate variable is needed or whether frames (compressed or not) could just be stored in a list called frames
? How the list of frames then needs to be handled (encapsulation or native encoding) is determined by is_enaps
.
The variable for the futures could be called frames_futures
.
In my opinion, that would make it easier to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it this way primarily to appease mypy, but I guess it probably hurts readability. Implemented 5c5cbb6 as you suggest
Yes I have definitely thought about decoding with multiple workers too, one for the to-do list. |
Add the ability to use concurrency to encode frames when creating segmentations.
When using encapsulated transfer syntaxes, the process of encoding frames takes a significant amount of the total time for creation of a Segmentation object. For large multiframe segmentations such as those of WSIs, this can take several minutes. Parallelising the encoding of frames is an obvious place for significant performance gains.
In this PR, I add the ability to use multiprocessing for frame encoding using the Python standard library's
concurrent.futures
module. The user specifies aworkers
parameter governing the number of worker processes to spawn (with the default remaining no worker threads). Alternatively they may "bring their own" worker pool by providing an instance of anyconcurrent.futures.Executor
. This has two potential benefits. First it allows a worker pool to be re-used in programs where a large number of segmentations needs to be created to save to overhead of setting up the worker pools. Secondly, it gives the user more control over the multiprocessing context if they want it. For example, in principle, they could pass a ThreadPoolExecutor to use multi-threading rather that multiprocessing.I also made a few other tweaks to the segmentation array processing for further efficiency gains. Most notably, I adjusted the
_get_segment_pixel_array
method to return the array for a single frame and a single segment (as opposed to a all frames and a single segment). This makes the looping logic a bit simpler and it turns out that this is slightly faster, presumably because it avoid the need to allocate memory for large arrays.