Add scan / prefix sum primitive #277
Comments
Oops I think I missed this one when creating the
Thanks @dcherian - great suggestion. It would be interesting to see how we could implement this in Cubed. The Python Array API spec has a proposal for Also, I noticed that
I think the relevant part of the Nvidia doc is "39.2.4 Arrays of Arbitrary Size", which explains how to apply the algorithm to chunked (or blocked) arrays. We could implement this by using the NumPy Naively, Instead we could write
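For readers following along, here is a minimal sketch of the blocked approach that section describes, written against plain NumPy rather than Cubed. The function name `blocked_cumsum` and the `chunk_size` parameter are made up for illustration, and the list of slices is only a stand-in for real array chunks; this is not how Cubed or Dask implement it.

```python
import numpy as np


def blocked_cumsum(x, chunk_size):
    """Inclusive prefix sum of a 1-D array, computed chunk by chunk.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x)
    if x.size == 0:
        return x.copy()

    # Stand-in for the chunks of a chunked array.
    chunks = [x[i:i + chunk_size] for i in range(0, len(x), chunk_size)]

    # Step 1: scan each chunk independently (this part is parallelisable).
    local_scans = [np.cumsum(c) for c in chunks]

    # Step 2: take each chunk's total and do an exclusive scan of the totals
    # to get the offset that must be added to each chunk.
    totals = np.array([s[-1] for s in local_scans])
    offsets = np.concatenate(([0], np.cumsum(totals)[:-1]))

    # Step 3: add each chunk's offset to its local scan.
    return np.concatenate([s + off for s, off in zip(local_scans, offsets)])
```

The scan over the per-chunk totals in step 2 is itself a prefix sum, so for very many chunks the same three steps could be applied recursively, which is roughly what the figure in that section shows.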
Ah yes that figure in "39.2.4" is what I remember. Such a cool algorithm! Here's the dask PR where I learnt of this: dask/dask#6675
If you're looking for something to do :), then scans would be a good thing to add.
Dask calls this "cumreduction" (terrible name!), and it's quite a useful primitive (xarray uses it for ffill, bfill). It's also a fun algorithm to think about: https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda (see the Blelloch, 1990 section).
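To make the ffill connection concrete, here is a small NumPy sketch of forward-fill expressed as a scan. It is an illustration of the idea only, not xarray's or Dask's actual implementation, and the function name `ffill_via_scan` is invented for this example.

```python
import numpy as np


def ffill_via_scan(x):
    """Forward-fill NaNs in a 1-D array using a cumulative-maximum scan.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    # Each valid entry records its own position; NaN entries record 0.
    idx = np.where(np.isnan(x), 0, np.arange(len(x)))
    # A running maximum over those positions gives, for every element,
    # the position of the most recent valid entry seen so far.
    last_valid = np.maximum.accumulate(idx)
    # Leading NaNs map to position 0, which is itself NaN, so they stay NaN.
    return x[last_valid]
```

A backward fill follows by reversing the input and the result, e.g. `ffill_via_scan(x[::-1])[::-1]`. Because the running maximum is an associative scan, it could in principle be computed per chunk and combined in the same way as a blocked prefix sum.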