Entire chunk that can fit in GPU memory will be split into two blocks if padding method present #453

@yousefmoazzam

Description

First, it's worth pointing out that being able to fit an entire chunk into GPU memory is not that common, so this issue is not very relevant to processing big data - it was discovered when running on the small test data, which easily fits into GPU memory.

On commit 930e0da in #446, running tomo_standard.nxs with the pipeline_gpu1.yaml pipeline, one can see that the small test data gets split into two blocks (whereas, if padding is switched off for remove_outlier in the associated methods database YAML file, there is only one block):

(base) root@492a2a3538d1:/httomo# python -m httomo run tests/test_data/tomo_standard.nxs tests/samples/pipeline_template_examples/pipeline_gpu1.yaml output_dir/
Pipeline has been separated into 2 sections
See the full log file at: output_dir/19-09-2024_09_44_59_output/user.log
Running loader (pattern=projection): standard_tomo...
    Finished loader: standard_tomo (httomo) Took 37.68ms
Section 0 (pattern=projection) with the following methods:
    data_reducer (httomolib)
    find_center_vo (httomolibgpu)
    remove_outlier (httomolibgpu)
    normalize (httomolibgpu)
     0%|          | 0/2 [00:00<?, ?block/s]
    50%|#####     | 1/2 [00:00<00:00,  1.29block/s]
    --->The center of rotation is 79.5
    Finished processing last block
Section 1 (pattern=sinogram) with the following methods:
    remove_stripe_based_sorting (httomolibgpu)
    FBP (httomolibgpu)
    save_intermediate_data (httomo)
    save_to_images (httomolib)
     0%|          | 0/1 [00:00<?, ?block/s]
    Finished processing last block
Pipeline finished. Took 1.937s

This is due to how the max slices value is calculated. The absolute maximum that max slices can be (at the start of determine_max_slices(), before being potentially whittled down by the different methods in a section) is based on the chunk_shape of the data source:

data_shape = self.source.chunk_shape
max_slices = data_shape[slicing_dim]

The chunk_shape property on any implementor of DataSetSource does not include padding, so the starting value of max slices is the unpadded length of the chunk shape's slicing dim. Therefore, even if the GPU could fit:

  • all slices
  • plus the necessary padding slices

the determine_max_slices() method can only ever report that max slices is "all slices without padding slices", as the sketch below illustrates.
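
A minimal sketch of that behaviour (simplified, hypothetical names, with a single gpu_capacity_slices number standing in for the per-method memory estimates; an illustrative model, not HTTomo's actual implementation):

def determine_max_slices_sketch(chunk_shape, slicing_dim, gpu_capacity_slices):
    # The starting cap is the unpadded chunk length along the slicing dim,
    # so the returned value can never make room for padding slices, even
    # when the GPU could hold them
    max_slices = chunk_shape[slicing_dim]
    # (each method in the section may then whittle max_slices down further;
    # modelled here as a single slice-capacity number)
    return min(max_slices, gpu_capacity_slices)

# 180-projection test data (other dims illustrative), GPU able to hold 200 slices:
determine_max_slices_sketch((180, 128, 160), 0, gpu_capacity_slices=200)  # -> 180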

In the context of the test data, the calculated max slices is 180 (all projections). However, execution of the first section needs 2 padding slices added, so there are 182 slices to process. Because max slices is only 180 and not 182 (even though the GPU can fit 182 slices in memory), the chunk is forced to be split into 2 blocks.
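
A back-of-the-envelope calculation shows the resulting split (the 2-slice padding total is taken from the test run above):

import math

chunk_len = 180         # unpadded slicing-dim length (all projections)
padding_slices = 2      # extra slices the first section's padding requires
max_slices = chunk_len  # the cap reported by determine_max_slices()

num_blocks = math.ceil((chunk_len + padding_slices) / max_slices)  # ceil(182 / 180) == 2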

To fix this, I think the determine_max_slices() logic needs to account for the required padding slices, to handle the case where all slices plus the padding slices could also fit into GPU memory.
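
One possible shape for the fix, assuming the section's padding requirement is available as a (before, after) tuple (the padding name is hypothetical; this is a sketch, not a tested patch):

data_shape = self.source.chunk_shape
# include the section's padding slices in the starting maximum, so a chunk
# that fits into GPU memory together with its padding is not split
max_slices = data_shape[slicing_dim] + padding[0] + padding[1]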


Labels: bug (Something isn't working), framework (Data-handling framework related)
