Entire chunk that can fit in GPU memory will be split into two blocks if padding method present #453

@yousefmoazzam

Description

First, it's worth pointing out that being able to fit an entire chunk into GPU memory is not that common, so this issue is not very relevant to processing big data - it was discovered when running on the small test data, which easily fits into GPU memory.

On commit 930e0da in #446, running tomo_standard.nxs with the pipeline_gpu1.yaml pipeline, one can see that the small test data gets split into two blocks (whereas, if padding is switched off for remove_outlier in the associated methods database YAML file, there is only one block):

(base) root@492a2a3538d1:/httomo# python -m httomo run tests/test_data/tomo_standard.nxs tests/samples/pipeline_template_examples/pipeline_gpu1.yaml output_dir/
Pipeline has been separated into 2 sections
See the full log file at: output_dir/19-09-2024_09_44_59_output/user.log
Running loader (pattern=projection): standard_tomo...
    Finished loader: standard_tomo (httomo) Took 37.68ms
Section 0 (pattern=projection) with the following methods:
    data_reducer (httomolib)
    find_center_vo (httomolibgpu)
    remove_outlier (httomolibgpu)
    normalize (httomolibgpu)
     0%|          | 0/2 [00:00<?, ?block/s]
    50%|#####     | 1/2 [00:00<00:00,  1.29block/s]
    --->The center of rotation is 79.5
    Finished processing last block
Section 1 (pattern=sinogram) with the following methods:
    remove_stripe_based_sorting (httomolibgpu)
    FBP (httomolibgpu)
    save_intermediate_data (httomo)
    save_to_images (httomolib)
     0%|          | 0/1 [00:00<?, ?block/s]
    Finished processing last block
Pipeline finished. Took 1.937s

This is due to how the max slices value is calculated. The absolute maximum that max slices can be (at the start of determine_max_slices(), before being potentially whittled down by the different methods in a section) is based on the chunk_shape of the data source:

data_shape = self.source.chunk_shape
max_slices = data_shape[slicing_dim]

The chunk_shape property on any implementor of DataSetSource does not include padding, so the starting value of max slices is the unpadded length of the chunk shape's slicing dim. Therefore, even if the GPU could fit:

  • all slices
  • plus the necessary padding slices

the determine_max_slices() method can only ever report that max slices is "all slices without padding slices", as the sketch below illustrates.
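
A minimal sketch of that behaviour (simplified, hypothetical names, with a single gpu_capacity_slices number standing in for the per-method memory estimates; an illustrative model, not HTTomo's actual implementation):

def determine_max_slices_sketch(chunk_shape, slicing_dim, gpu_capacity_slices):
    # The starting cap is the unpadded chunk length along the slicing dim,
    # so the returned value can never make room for padding slices, even
    # when the GPU could hold them
    max_slices = chunk_shape[slicing_dim]
    # (each method in the section may then whittle max_slices down further;
    # modelled here as a single slice-capacity number)
    return min(max_slices, gpu_capacity_slices)

# 180-projection test data (other dims illustrative), GPU able to hold 200 slices:
determine_max_slices_sketch((180, 128, 160), 0, gpu_capacity_slices=200)  # -> 180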

In the context of the test data, the calculated max slices is 180 (all projections). However, execution of the first section needs 2 padding slices added, so there are 182 slices to process. Because max slices is only 180 and not 182 (even though the GPU can fit 182 slices in memory), the chunk is forced to be split into 2 blocks.
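
A back-of-the-envelope calculation shows the resulting split (the 2-slice padding total is taken from the test run above):

import math

chunk_len = 180         # unpadded slicing-dim length (all projections)
padding_slices = 2      # extra slices the first section's padding requires
max_slices = chunk_len  # the cap reported by determine_max_slices()

num_blocks = math.ceil((chunk_len + padding_slices) / max_slices)  # ceil(182 / 180) == 2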

To fix this, I think the determine_max_slices() logic needs to account for the required padding slices, to handle the case where all slices plus the padding slices could also fit into GPU memory.
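
One possible shape for the fix, assuming the section's padding requirement is available as a (before, after) tuple (the padding name is hypothetical; this is a sketch, not a tested patch):

data_shape = self.source.chunk_shape
# include the section's padding slices in the starting maximum, so a chunk
# that fits into GPU memory together with its padding is not split
max_slices = data_shape[slicing_dim] + padding[0] + padding[1]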


Labels: bug (Something isn't working), framework (Data-handling framework related)
