Description
Currently, functions like `ravel` and `flatten` follow the same traversal order that NumPy uses. This makes sense for users coming from NumPy, who would expect it. Some Dask Array functions also rely on this behavior to match their NumPy counterparts. However, it means that `ravel` and `flatten` must spread out the chunks (rechunk the array), which can carry a non-negligible performance cost.
That said, not every use case needs the result in the same order that NumPy would produce when flattening an array. For those cases, a nice alternative would be to flatten each chunk on its own and simply stitch the flattened chunks together into a new 1-D array. This strategy would require no rechunking and would be embarrassingly parallel, so it would avoid the performance penalty that `ravel` and `flatten` have today.
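To make the idea concrete, here is a rough sketch of what chunk-wise flattening could look like using existing Dask Array pieces (`Array.blocks`, `Array.numblocks`, `da.concatenate`). The helper name `ravel_chunks` is hypothetical and not part of Dask, and the sketch assumes all chunk sizes are known. It is only meant to illustrate the traversal, not to propose a final implementation.

```python
import numpy as np
import dask.array as da


def ravel_chunks(x):
    """Hypothetical helper: flatten each chunk independently, then
    concatenate the flattened chunks into a 1-D array.

    Element order follows the chunk layout, NOT NumPy's C order.
    """
    flat_blocks = [
        # x.blocks[idx] selects a single chunk as a one-chunk Dask array;
        # reshaping it to 1-D only touches that chunk.
        x.blocks[idx].reshape(-1)
        for idx in np.ndindex(*x.numblocks)  # visit chunks in C order
    ]
    return da.concatenate(flat_blocks)


x = da.from_array(np.arange(16).reshape(4, 4), chunks=(2, 2))
flat = ravel_chunks(x)

print(flat.chunks)     # ((4, 4, 4, 4),)
print(flat.compute())  # [ 0  1  4  5  2  3  6  7  8  9 12 13 10 11 14 15]
```

Each output chunk comes from exactly one input chunk, so the number of chunks is unchanged. The values are the same as `x.ravel()` would give, just in a different order.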
Assuming this strategy is reasonable, there are a few ways we could go about implementing it.
- Allow some additional options to the `order` parameter to handle this need.
- Add a new parameter to toggle NumPy or chunk-based traversal strategies.
- Include a config option to enable this behavior for Dask Arrays more generally.
- Add a new function entirely for this behavior.
- ?
Thoughts on this?