
Include padding in max slices and number of blocks calculations #680

Merged
yousefmoazzam merged 7 commits into main from padding-aware-block-splitter on Feb 11, 2026

Conversation

@yousefmoazzam
Collaborator

@yousefmoazzam yousefmoazzam commented Feb 2, 2026

Fixes #453
Fixes IMGDA-726

As mentioned in 3936d41, part of this change is essentially reverting the preview extension implemented in #670.

The two main changes are:

  • the `BlockSplitter` is aware of the padding that the blocks it gets from the `DataSetSource` have
  • the max slices calculation is aware of the padding applied to the chunk that is associated with a section (a sketch of how these interact follows below)
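
As a loose illustration of how these two changes interact, the sketch below budgets the padding slices out of the max slices value before dividing the chunk into blocks. The function name and signature are hypothetical, not HTTomo's actual API:

def num_blocks(chunk_len: int, padding: tuple[int, int], max_slices: int) -> int:
    # Hypothetical sketch, not HTTomo's actual API. `chunk_len` is the
    # unpadded chunk length along the slicing dim; `max_slices` is the
    # padding-aware value from the max slices calculation.
    core_per_block = max_slices - padding[0] - padding[1]
    if core_per_block < 1:
        # Not even one core slice plus its padding fits on the GPU
        # (the edge case discussed later in this thread).
        raise ValueError("padding does not fit in GPU memory")
    # Ceiling division over the unpadded chunk length.
    return -(-chunk_len // core_per_block)

For example, num_blocks(160, (5, 5), 21) gives 15, since each block can carry 21 - 10 = 11 core slices.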

Checklist

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have made corresponding changes to the documentation

@yousefmoazzam yousefmoazzam requested a review from dkazanc February 2, 2026 15:23
@yousefmoazzam yousefmoazzam marked this pull request as draft February 2, 2026 15:33
@yousefmoazzam yousefmoazzam removed the request for review from dkazanc February 2, 2026 15:33
@yousefmoazzam yousefmoazzam marked this pull request as ready for review February 2, 2026 16:03
@yousefmoazzam yousefmoazzam requested a review from dkazanc February 2, 2026 16:03
@dkazanc
Collaborator

dkazanc commented Feb 4, 2026

thanks @yousefmoazzam. So, if I understand correctly, this change removes the need to expand the previewer, and instead deals with how the data should be loaded, assuming that padding is needed?

I'm just curious about those 2 cases, can you help please:

  1. We have 1 slice in the preview, padding = (5, 5) and 4 processes. determine_max_slices tells us that we can fit more than 11 slices on the GPU. What happens in this case?
  2. Same conditions: 1 slice in the preview, padding = (5, 5) and 4 processes, but determine_max_slices says that we can fit only 5 slices, for instance.

Cheers!

@yousefmoazzam
Collaborator Author

this change removes the need to expand the previewer

Yeah, this change has removed all code that expands the preview based on methods' padding requirements

but rather deal with how the data should be loaded assuming that padding is needed?

This change doesn't do anything to data loading; that's the same as before, which is that the data needed for the next section is padded as required by either the loader or the dataset store. This change mainly deals with changing how the number of blocks is calculated, and how the max slices value is calculated.

I'm just curious about those 2 cases, can you help please:

  1. We have 1 slice in the preview, padding = (5, 5) and 4 processes. determine_max_slices tells us that we can fit more than 11 slices on the GPU. What happens in this case?
  2. Same conditions: 1 slice in the preview, padding = (5, 5) and 4 processes, but determine_max_slices says that we can fit only 5 slices, for instance.

For case 1, I think this shouldn't happen. In determine_max_slices(), the max slices value starts off as the length of the padded chunk:

max_slices = (
    data_shape[slicing_dim] + self.source.padding[0] + self.source.padding[1]
)

and as memory estimation of each method is done, the minimum of the current max slices and the method's max slices is taken:

max_slices_methods[idx] = min(max_slices, slices_estimated)

If the section is 1 slice unpadded and 11 slices padded, then determine_max_slices() should never calculate a max slices value above 11 slices.
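
Plugging the case 1 numbers into those two snippets (purely illustrative values):

padding = (5, 5)
chunk_core_len = 1                                     # 1 slice in the preview
max_slices = chunk_core_len + padding[0] + padding[1]  # 1 + 5 + 5 = 11
slices_estimated = 50                                  # estimator says "plenty fits"
max_slices = min(max_slices, slices_estimated)         # stays clamped at 11

so the reported max slices can never exceed the padded chunk length.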

For case 2, as things currently stand, I think this would break somewhere in the block splitter! This is an edge case where the calculated max slices is lower than 1 slice + padding (1 + 10 = 11 in this case), and there'd be a negative value somewhere in the block splitter which would likely cause problems. Thanks for pointing this out; this needs to be handled in some manner, otherwise some confusing/hard-to-debug errors will probably occur in such a case.

If that case does happen, it'd be that not even 1 slice can be processed due to padding requirements: would it be reasonable to throw an error in such a case? Being able to handle low-ish GPU resources is fair, but if the machine's GPU can't even process 1 slice, I'm not sure what else can be done to salvage the pipeline run?
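
For reference, the arithmetic that goes negative in case 2 looks roughly like the following; the exact spot in the block splitter isn't pinned down in this discussion, so this is only a sketch:

padding = (5, 5)
max_slices = 5                                         # estimator's answer for case 2
core_per_block = max_slices - padding[0] - padding[1]  # 5 - 10 = -5
# A negative core-slice count fed into block-splitting arithmetic
# (e.g. a ceiling division or a range) is what would break downstream.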

@dkazanc
Collaborator

dkazanc commented Feb 4, 2026

thanks @yousefmoazzam, I see now that
section.max_slices -= padding[0] + padding[1]
has been replaced with

data_shape = self.source.chunk_shape
max_slices = (
    data_shape[slicing_dim] + self.source.padding[0] + self.source.padding[1]
)

which takes into account padding rather than subtracting from the estimated slices as before.

Yes, the second case should probably throw a meaningful error; I guess the padded method establishes some kind of limit on the size of the chunk/block that can be processed. This case can happen easily when, for instance, iterative methods are running on small-memory GPU systems. One potential solution would be to go 2D when 3D padding doesn't fit in memory, but, as you say, it is an edge case for "laptop"-type processing. We might handle it properly later, if needed at all.

@dkazanc
Collaborator

dkazanc commented Feb 4, 2026

Also, I think this ticket will be fixed with this change. I'll check if that is the case.

@yousefmoazzam
Collaborator Author

yousefmoazzam commented Feb 5, 2026

I had a think about what was discussed in sprint planning regarding case 2 above, and about potentially reducing the padding so that the GPU could fit the core slice(s) + padding slices.

The main thing I thought would be an issue is that the data source object for a section is created before the max slices for that section is determined, meaning that the padding has already been determined and thus can't be changed by the time execution gets to determine_max_slices().

I think if the dataset store is backed by an hdf5 file, this isn't a problem, and nothing special needs to be done other than change the padding of the data source object.

But, if the dataset store is backed by RAM, then it is a problem, and something additional needs to be done other than simply changing the padding of the data source object. The issue is that the edges of the padded chunk array would have more padding than needed (for example, the original padding was (5, 5) but was reduced to (3, 3) to fit the data into GPU memory). So a shift of some sort would need to be introduced when reading blocks at the boundaries of the padded chunk array, to make the extra/unnecessary padding slices be ignored (see the sketch below).
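
A minimal sketch of that shift idea, assuming a NumPy-backed chunk array allocated with the original padding but read with the reduced padding; all names here are hypothetical, not HTTomo's actual code:

import numpy as np

original_pad = (5, 5)  # padding the chunk array was allocated with
reduced_pad = (3, 3)   # padding after reduction to fit GPU memory

# 1 core slice plus the original padding on both ends
padded_chunk = np.zeros((1 + sum(original_pad), 64, 64))

def read_block(start: int, core_len: int) -> np.ndarray:
    # Core index 0 lives at array index original_pad[0], so a read using
    # the reduced padding must skip the surplus padding slices at the edge.
    shift = original_pad[0] - reduced_pad[0]
    lo = shift + start
    hi = lo + reduced_pad[0] + core_len + reduced_pad[1]
    return padded_chunk[lo:hi]

block = read_block(0, 1)  # shape (7, 64, 64): 3 + 1 + 3 slices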

One option could be to keep determine_max_slices() after the data source object is created, and introduce the shifting mentioned above. Another option could be to move determine_max_slices() before the data source is created, somehow determine the padding that'd allow at least one slice (if not more) to be processed, and then create the data source object.

I think both of these options are non-trivial to implement, and given that this is an edge case for when GPU memory can hold barely any slices, I'm tempted to leave this for later and simply raise an error for now. What do you think?

@dkazanc
Collaborator

dkazanc commented Feb 6, 2026

thanks for looking into it @yousefmoazzam. Fair enough; as we suspected, it is not as simple as reducing the padding after the estimation and proceeding. Happy with leaving this for later! I guess we can catch this case and insert a sensible error? Maybe we should say that the chosen method (the method that requires the largest padding in the pipeline) cannot process the data due to GPU memory limitations, and that the user should remove that method from the pipeline or run it on a more powerful GPU device?
Can we do something like that?
The 3rd option is to enable 2D processing for 3D methods that need padding; then we don't need to do much in the framework. It is a bit of work though, and hopefully people won't be using 2-4GB GPUs to process the data :)

@yousefmoazzam
Collaborator Author

I guess we can catch this case and insert a sensible error? Maybe we should say that the chosen method (the method that requires the largest padding in the pipeline) cannot process the data due to GPU memory limitations, and that the user should remove that method from the pipeline or run it on a more powerful GPU device?
Can we do something like that?

I think this should be possible, I'll give it a go!
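
A minimal sketch of what such a check could look like; the function name, message wording, and exception type are all illustrative, not HTTomo's actual code:

def check_padding_fits(max_slices: int, padding: tuple[int, int], method_name: str) -> None:
    # Hypothetical check: fail early if not even one core slice plus its
    # padding fits in the GPU memory budget.
    if max_slices < 1 + padding[0] + padding[1]:
        raise ValueError(
            f"Method '{method_name}' requires padding {padding}, but the GPU "
            f"can only fit {max_slices} slices; not even one core slice plus "
            "its padding can be processed. Remove the method from the "
            "pipeline or run on a GPU with more memory."
        )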

Apart from some minor irrelevant details, this essentially reverts #670.
The shape of the data being transferred is in principle dependent on the
max slices calculation, which in turn is dependent on the GPU model
being used. As the GPU model can vary widely, the max slices and
therefore the data shape can vary widely, making this assertion
difficult to pass if run on different GPUs.

As the purpose of the small pipeline tests is not to check specific
data shapes but the functional behaviour of pipeline execution,
asserting that there's a line in the logfile corresponding to a data
transfer occurring for GPU 0 is sufficient; there's no need to be
specific about the shape of the data that was transferred.
See the following PR comments for a discussion on what could be done in
the future to try and salvage the pipeline run instead of raising an
error: #680.
@yousefmoazzam yousefmoazzam force-pushed the padding-aware-block-splitter branch from f535c19 to 5c0e26c on February 6, 2026 13:42
@yousefmoazzam yousefmoazzam requested review from dkazanc and removed request for dkazanc February 6, 2026 13:50
@yousefmoazzam yousefmoazzam merged commit c94fccd into main Feb 11, 2026
6 checks passed
@yousefmoazzam yousefmoazzam deleted the padding-aware-block-splitter branch February 11, 2026 11:25


Development

Successfully merging this pull request may close these issues.

Entire chunk that can fit in GPU memory will be split into two blocks if padding method present
