From https://docs.dask.org/en/latest/array-creation.html#create-dask-arrays: the `from_array` method has the `fancy` argument, suggesting that there's a way to implement a bare minimum of array slicing in the backend and let dask do the rest of the work.
What is that bare minimum?
`dask.array.slicing` seems to have utilities to fill out a lot of possibilities; my hope is that the backend for an N-D array would only have to support an N-length tuple of `slice` objects with `start` and `stop` >= 0 and `step=None`.
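As a sketch of that hypothesized minimum (the class name and contract here are assumptions, not an established dask API), a backend exposing only `shape`, `dtype`, `ndim`, and a `__getitem__` that accepts a full tuple of basic slices can already be wrapped by `da.from_array`:

```python
import numpy as np
import dask.array as da

class MinimalStore:
    """Hypothetical backend supporting only the conjectured bare minimum:
    an ndim-length tuple of slice objects with non-negative start/stop
    and a step of None (or 1), i.e. contiguous block reads."""

    def __init__(self, data):
        self._data = np.asarray(data)
        self.shape = self._data.shape
        self.dtype = self._data.dtype
        self.ndim = self._data.ndim

    def __getitem__(self, index):
        # Enforce the minimal contract rather than silently delegating
        # to NumPy's full indexing machinery.
        assert isinstance(index, tuple) and len(index) == self.ndim
        assert all(isinstance(s, slice) and s.step in (None, 1) for s in index)
        return self._data[index]

store = MinimalStore(np.arange(100).reshape(10, 10))
arr = da.from_array(store, chunks=(5, 5))
result = arr[2:7, 3:9].compute()  # the store only ever sees basic slices
```

With this wrapper, anything fancier than contiguous block reads is done by dask on the NumPy chunks it gets back, not by the backend.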
Context: h5py_like attempts to provide base classes and utilities for libraries implementing ndarray storage (like pyn5) to make them behave like h5py objects, including some array indexing cases and rudimentary concurrent IO. I'd love to offload both responsibilities to dask's utilities, as dask is much more mature, better tested, and has more eyes on it.
Same question for writing with `dask.array.store`.
I think this means point selection (like `a[0, 5]`), contiguous slicing (like `a[2:5]`), non-contiguous slicing (like `a[2:9:2]`), and reversed slicing (like `a[8:2:-1]`). If there is a place where documentation might help, feel free to submit a PR with some useful text where you were looking for this info. Would be happy to review 🙂
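Concretely, those four basic forms look like this on an ordinary dask array (illustrative only):

```python
import numpy as np
import dask.array as da

a = da.from_array(np.arange(20), chunks=5)

point      = a[3].compute()       # point selection -> 3
contiguous = a[2:5].compute()     # contiguous slice -> [2 3 4]
strided    = a[2:9:2].compute()   # non-contiguous slice -> [2 4 6 8]
rev        = a[8:2:-1].compute()  # reversed slice -> [8 7 6 5 4 3]
```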
Fancy indexing is more about selecting multiple points in different ways: inner indexing picks out just some individual points, while outer indexing selects a grid of points with self-chosen spacing. It also covers things like boolean arrays for selection. I don't know if you use these at all. If not, I wouldn't worry about it.
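For illustration, here is that distinction in plain NumPy terms (a small hypothetical array; dask mirrors most of these forms):

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

# Inner indexing: index arrays are paired elementwise, selecting the
# individual points (0, 1) and (2, 3).
inner = a[[0, 2], [1, 3]]          # -> [1, 11]

# Outer indexing: the cross product of the index vectors, i.e. the
# grid of rows {0, 2} by columns {1, 3}.
outer = a[np.ix_([0, 2], [1, 3])]  # -> [[1, 3], [9, 11]]

# Boolean-array selection.
selected = a[a > 12]               # -> [13, 14, 15]
```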
It's worth noting that libraries like h5py, Zarr, etc. will take the first selection and then return a NumPy array, at which point everything works as expected (including fancy indexing). So this really only matters if you are trying to do fancy indexing as part of reading in data.
My guess is this won't matter much. I've yet to be bitten by a case where Dask tries to use fancy indexing on an object that doesn't support it. So probably worth just giving things a try and seeing if it works out or not. Happy to follow-up if you run into issues.
Same thing with `da.store`. I wouldn't worry about more sophisticated forms of indexing when saving data unless you are performing some more complex transform when writing out the data. If you are just doing a 1-to-1 mapping of Dask chunks to some slice in the stored array, then only basic slicing will be relevant.
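A minimal sketch of that 1-to-1 case (the target class here is hypothetical; only `__setitem__` with basic slices is exercised):

```python
import numpy as np
import dask.array as da

class SliceOnlyTarget:
    """Hypothetical write target: da.store only needs a __setitem__
    accepting a tuple of basic slices, one write per chunk."""

    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = dtype
        self._data = np.empty(shape, dtype=dtype)

    def __setitem__(self, index, value):
        # Each dask chunk maps 1-to-1 onto a contiguous region here.
        assert all(isinstance(s, slice) and s.step in (None, 1) for s in index)
        self._data[index] = value

x = da.ones((10, 10), chunks=(5, 5))
target = SliceOnlyTarget(x.shape, x.dtype)
da.store(x, target)  # writes each chunk with a plain tuple of slices
```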
To an extent. I think it could be specified in the docs exactly what features are needed for each; I'll try to raise a PR to that effect when I have time. One of these days I'll write a test suite for array-likes to ascertain exactly which numpy features any given one supports...