Docs: Exactly which numpy slicing features does an array-like need to support to be used with Dask.from_array #5281

clbarnes · 2019-08-15T15:39:42Z

From https://docs.dask.org/en/latest/array-creation.html#create-dask-arrays

any format that supports NumPy-style slicing

but the from_array method has the fancy argument, suggesting that there's a way to implement a bare minimum of array slicing in the backend and let dask do the rest of the work.

What is that bare minimum?

dask.array.slicing seems to have utilities to fill out a lot of possibilities; my hope is that the backend for an N-D array would only have to support an N-length tuple of slice objects with >=0 start and stop, step=None.
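That hoped-for contract can be tried out empirically. The following is a minimal sketch, not dask's documented contract: `BasicSliceOnly` is a hypothetical wrapper (not part of dask or h5py_like) that rejects every key other than an N-length tuple of slice objects with non-negative bounds and step=None, and `fancy=False` tells from_array that the wrapped object does not support fancy indexing. It only exercises a plain chunk-by-chunk read; indexing the resulting dask array afterwards may cause other kinds of keys to reach the backend.

```python
import numpy as np
import dask.array as da


class BasicSliceOnly:
    """Hypothetical backend: accepts only an N-tuple of step-less slices."""

    def __init__(self, data):
        self._data = np.asarray(data)
        # from_array reads these attributes off the wrapped object.
        self.shape = self._data.shape
        self.dtype = self._data.dtype
        self.ndim = self._data.ndim

    def __getitem__(self, key):
        if not isinstance(key, tuple) or len(key) != self.ndim:
            raise TypeError(f"expected a {self.ndim}-tuple of slices, got {key!r}")
        for s in key:
            if not isinstance(s, slice) or s.step is not None:
                raise TypeError(f"only slices with step=None are supported, got {s!r}")
            if any(b is not None and b < 0 for b in (s.start, s.stop)):
                raise TypeError(f"negative bounds are not supported, got {s!r}")
        return self._data[key]


source = np.arange(24).reshape(4, 6)
backend = BasicSliceOnly(source)

# Split the 4x6 array into four 2x3 chunks and read them all back.
lazy = da.from_array(backend, chunks=(2, 3), fancy=False)
result = lazy.compute()
```

If the wrapper raises during `compute()`, the error message shows which kind of key was requested, which is one way to find out what a given dask version actually asks of a backend.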

Context: h5py_like attempts to provide base classes and utilities for libraries implementing ndarray storage (like pyn5) to make them behave like h5py objects, including some array indexing cases and rudimentary concurrent IO. I'd love to offload both responsibilities to dask's utilities, as it's much more mature, better tested, and has more eyes on it.

Same question for writing with dask.array.store.

jakirkham · 2019-08-18T17:24:26Z

I think this means point selection (like a[0, 5]), contiguous slicing (like a[2:5]), non-contiguous slicing (like a[2:5:2]), and reversed slicing (like a[5:2:-1]). If there is a place where documentation might help, feel free to submit a PR with some useful text where you were looking for this info. Would be happy to review 🙂

Fancy indexing is more about selecting multiple points in different ways (inner indexing: just some points; outer indexing: a grid of points with self-chosen spacing), as well as things like boolean arrays for selection. I don't know if you use these at all. If not, I wouldn't worry about it.
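For reference, here is how those indexing forms look on a plain NumPy array. This is only an illustrative sketch: the array `a` is made up, the labels "inner" and "outer" follow the wording of this comment, and `np.ix_` is one NumPy way to spell outer indexing.

```python
import numpy as np

a = np.arange(24).reshape(4, 6)

# Basic indexing: integers and slices only.
point = a[0, 5]            # point selection: the single element 5
contiguous = a[2:4]        # contiguous slicing: rows 2 and 3
strided = a[0:4:2]         # non-contiguous slicing: rows 0 and 2
reversed_rows = a[3:0:-1]  # reversed slicing: rows 3, 2, 1 (start > stop)

# Fancy indexing: integer lists/arrays and boolean masks.
inner = a[[0, 2], [1, 3]]          # just the points (0, 1) and (2, 3)
outer = a[np.ix_([0, 2], [1, 3])]  # the 2x2 grid of rows {0, 2} x columns {1, 3}
masked = a[a % 10 == 0]            # boolean mask: elements 0, 10, 20
```

With a negative step the start has to be larger than the stop; a key such as `a[2:5:-1]` selects nothing.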

It's worth noting that libraries like h5py, Zarr, etc. will take the first selection and then return a NumPy array. At which point everything works as expected (including fancy indexing). So this really only matters if you are trying to do fancy indexing as part of reading in data.

My guess is this won't matter much. I've yet to be bitten by a case where Dask tries to use fancy indexing on an object that doesn't support it. So it's probably worth just giving things a try and seeing if it works out or not. Happy to follow up if you run into issues.

jakirkham · 2019-08-18T17:27:42Z

Same thing with da.store. I wouldn't worry about more sophisticated forms of indexing when saving data unless you are performing some more complex transform when writing out the data. If you are just doing a 1-to-1 mapping of Dask chunks to some slice in the stored array, then only basic slicing will be relevant.
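The 1-to-1 case can be sketched the same way for writing. This is an assumption-laden example rather than a statement of what da.store guarantees: `BasicSliceTarget` is a hypothetical target that only accepts assignment through an N-length tuple of step-less slices, and the source array is stored without any transformation.

```python
import numpy as np
import dask.array as da


class BasicSliceTarget:
    """Hypothetical store target: writable only via N-tuples of step-less slices."""

    def __init__(self, shape, dtype):
        self.data = np.zeros(shape, dtype=dtype)
        self.shape = self.data.shape
        self.dtype = self.data.dtype
        self.ndim = self.data.ndim

    def __setitem__(self, key, value):
        if not isinstance(key, tuple) or len(key) != self.ndim:
            raise TypeError(f"expected a {self.ndim}-tuple of slices, got {key!r}")
        if any(not isinstance(s, slice) or s.step is not None for s in key):
            raise TypeError(f"only slices with step=None are supported, got {key!r}")
        self.data[key] = value


values = np.arange(24).reshape(4, 6)
source = da.from_array(values, chunks=(2, 3))
target = BasicSliceTarget(source.shape, source.dtype)

# Each dask chunk is written to the matching region of the target.
da.store(source, target)
```

After the call, `target.data` should hold the same values as `values`; a `TypeError` here would indicate that a key other than a plain slice tuple reached the target.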

jakirkham · 2019-08-20T13:55:10Z

Does that answer your questions @clbarnes?

clbarnes · 2019-08-20T14:19:02Z

To an extent. I think it could be specified in the docs exactly what features are needed for each; I'll try to raise a PR to that effect when I have time. One of these days I'll write a test suite for array-likes to ascertain exactly which numpy features any given one supports...
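Such a test suite could start from a small probe like the one below. It is only a sketch: `probe_indexing` and `SlicesOnly` are hypothetical names, the keys are an illustrative selection rather than a complete inventory of NumPy indexing, and "supported" here only means that the lookup did not raise, not that it returned the right data.

```python
import numpy as np


def probe_indexing(arr):
    """Return {feature name: bool} for a 2-D array-like `arr`."""
    first_row_mask = np.zeros(arr.shape[0], dtype=bool)
    first_row_mask[0] = True
    keys = {
        "tuple of plain slices": (slice(0, 2), slice(0, 2)),
        "point selection": (0, 1),
        "strided slice": (slice(0, 2, 2), slice(None)),
        "reversed slice": (slice(1, None, -1), slice(None)),
        "integer list": ([0, 1], slice(None)),
        "boolean mask": (first_row_mask, slice(None)),
    }
    report = {}
    for name, key in keys.items():
        try:
            arr[key]
            report[name] = True
        except Exception:
            report[name] = False
    return report


class SlicesOnly:
    """Hypothetical array-like that accepts only tuples of step-less slices."""

    def __init__(self, data):
        self._data = np.asarray(data)
        self.shape = self._data.shape

    def __getitem__(self, key):
        ok = isinstance(key, tuple) and all(
            isinstance(s, slice) and s.step is None for s in key
        )
        if not ok:
            raise TypeError("only tuples of step-less slices are supported")
        return self._data[key]


numpy_report = probe_indexing(np.arange(12).reshape(3, 4))
strict_report = probe_indexing(SlicesOnly(np.arange(12).reshape(3, 4)))
```

Here `numpy_report` marks every feature as supported, while `strict_report` marks only "tuple of plain slices".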

jakirkham added the array label Aug 18, 2019
