
Add scan / prefix sum primitive #277

Open
dcherian opened this issue Jul 24, 2023 · 4 comments

Comments

@dcherian

If you're looking for something to do :), then scans would be a good thing to add.

Dask calls this "cumreduction" (terrible name!), and it's quite a useful primitive (Xarray uses it for ffill and bfill). It's also a fun algorithm to think about: https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda (see the Blelloch, 1990 section).
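For intuition, here's a minimal NumPy sketch of the work-efficient exclusive scan described in that chapter (Blelloch, 1990). The function name is my own, and the sketch assumes a power-of-two input length; a production version would pad or split uneven inputs.

```python
import numpy as np

def blelloch_exclusive_scan(a, op=np.add):
    """Sketch of Blelloch's two-phase work-efficient exclusive scan.

    An up-sweep builds a tree of partial sums in place, then a
    down-sweep distributes them back down. Assumes len(a) is a
    power of two. Hypothetical helper, not Cubed's implementation.
    """
    x = np.array(a, copy=True)
    n = len(x)
    assert n & (n - 1) == 0, "sketch assumes power-of-two length"

    # Up-sweep (reduce) phase: combine pairs at increasing strides.
    d = 1
    while d < n:
        x[2 * d - 1 :: 2 * d] = op(x[d - 1 :: 2 * d], x[2 * d - 1 :: 2 * d])
        d *= 2

    # Down-sweep phase: clear the root, then push partial sums down,
    # swapping and combining at decreasing strides.
    x[-1] = 0
    d = n // 2
    while d >= 1:
        left = x[d - 1 :: 2 * d].copy()
        x[d - 1 :: 2 * d] = x[2 * d - 1 :: 2 * d]
        x[2 * d - 1 :: 2 * d] = op(left, x[2 * d - 1 :: 2 * d])
        d //= 2
    return x
```

For example, `blelloch_exclusive_scan([1, 2, 3, 4, 5, 6, 7, 8])` gives the exclusive prefix sums `[0, 1, 3, 6, 10, 15, 21, 28]`.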

@TomNicholas
Collaborator

TomNicholas commented Jul 24, 2023

Oops, I think I missed this one when creating the ChunkManagerEntrypoint in pydata/xarray#7019. It looks like calling Xarray's ffill or bfill on an array with chunked core dimensions will attempt to apply dask.array.cumreduction along the core dims, even if the underlying array is a cubed.Array.

@tomwhite
Member

Thanks @dcherian - great suggestion. It would be interesting to see how we could implement this in Cubed.

The Python Array API spec has a proposal for cumulative_sum, which is a special case of this operation.

Also, I noticed that bfill in Xarray uses flip twice, so #114 may be relevant (although it might be more efficient to avoid flip in this case: because of Zarr's regular-chunks requirement, flip turns the final chunk into the first, so the whole array would need rewriting).

@tomwhite
Member

I think the relevant part of the Nvidia doc is "39.2.4 Arrays of Arbitrary Size", which explains how to apply the algorithm to chunked (or blocked) arrays.

We could implement this by applying the NumPy cumsum operation (or the equivalent for the generalised operation) to each block, and then creating the auxiliary arrays of block totals (SUMS, in the Nvidia terminology) and their cumulative sums (INCR).
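As a serial sketch of that plan (function and variable names are my own, and plain lists of NumPy blocks stand in for Cubed's lazy arrays):

```python
import numpy as np

def chunked_cumsum(blocks):
    """Sketch of the blocked scan from section 39.2.4 of the Nvidia doc.

    1. Scan each block independently.
    2. SUMS: the total of each block (last element of each scan).
    3. INCR: exclusive scan of SUMS -- the offset each block needs.
    4. Add each block's offset to its local scan.
    """
    scanned = [np.cumsum(b) for b in blocks]             # per-block scan
    sums = np.array([s[-1] for s in scanned])            # SUMS
    incr = np.concatenate([[0], np.cumsum(sums)[:-1]])   # INCR
    return [s + off for s, off in zip(scanned, incr)]
```

Concatenating the result of `chunked_cumsum([...])` matches `np.cumsum` over the concatenated input, while each block only needs its own data plus a single scalar from INCR.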

Naively, INCR could have chunks of size one so that it could be mapped over the scanned blocks with a blockwise operation, but this would not scale well: with, say, 1000 chunks, a single task would have to write 1000 tiny chunk files when storing INCR as a Zarr file.

Instead, we could write INCR as a Zarr file with a single chunk; a task processing scanned block i would then load the entire INCR file but use only the i-th value to add to its chunk. This should work as long as the object store can handle many concurrent reads. Cubed has a map_direct operation that allows access to side inputs (here, INCR) for cases like this.
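In outline, each per-block task would look something like this (hypothetical names; a NumPy array stands in for reading the single-chunk INCR Zarr file):

```python
import numpy as np

def add_incr(scanned_block, block_index, incr_store):
    # Load the whole INCR array -- cheap, since it is one small chunk --
    # then apply only the offset belonging to this block.
    incr = np.asarray(incr_store)
    return scanned_block + incr[block_index]
```

Every task reads the same small object, so the load on the store is many concurrent reads of one file rather than one task writing many files.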

@dcherian
Author

Ah yes that figure in "39.2.4" is what I remember. Such a cool algorithm!

Here's the dask PR where I learnt of this: dask/dask#6675
