# dask / dask

# Add optimizations to dask.array to remap ufunc(data)[key] to ufunc(data[key]) #746

Open
opened this issue Sep 22, 2015 · 20 comments

## Comments

### shoyer commented Sep 22, 2015

 This is a reminder to finish up/reboot the aborted PR #410. These optimizations would be extremely useful for xray's integration with dask (both internally and for our users), since they would let us write transformations on entire dask arrays without paying for unnecessary IO. Once this works robustly, I'll be able to delete a pretty significant chunk of code from xray that deals with lazy computation.

### jcrist commented Sep 23, 2015

 I'm working on this. For single-argument ufuncs, this is fairly simple. I'm uncertain how to handle functions with multiple arguments (or operators), since the slicing can't be pushed to all arguments due to broadcasting:

```python
In [87]: a = np.arange(27).reshape((3, 3, 3))

In [88]: b = np.arange(9).reshape((3, 3))

In [89]: (a + b)[0]
Out[89]:
array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

In [90]: a[0] + b[0]
Out[90]:
array([[ 0,  2,  4],
       [ 3,  5,  7],
       [ 6,  8, 10]])
```

Suggestions on how to properly handle this would be appreciated.

### shoyer commented Sep 23, 2015

 I think we should insert an explicit broadcasting step into `elemwise` using the dask equivalent of `np.broadcast_arrays`. We can build this up out of `broadcast_to`, though we'll need our own code to calculate the target shape. Then, we can simply index into each non-scalar argument, because they will all be the same shape.
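The target-shape calculation could be sketched as follows — a minimal, hypothetical helper (not dask's actual implementation) applying NumPy's right-aligned broadcasting rules:

```python
def broadcast_shape(*shapes):
    # Hypothetical helper: compute the result shape of broadcasting the
    # given input shapes together, per NumPy's right-aligned rules.
    ndim = max(len(s) for s in shapes)
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    out = []
    for dims in zip(*padded):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError("shapes %r are not broadcastable" % (shapes,))
        out.append(sizes.pop() if sizes else 1)
    return tuple(out)
```

For the shapes in the example above, `broadcast_shape((3, 3, 3), (3, 3))` gives `(3, 3, 3)`.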

### shoyer commented Sep 23, 2015

 It's possibly worth copying NumPy's unit tests for `_broadcast_shape` in `numpy.lib.stride_tricks`: https://github.com/numpy/numpy/blob/941a4e037d394dada43e3c2beef1e650d5505742/numpy/lib/tests/test_stride_tricks.py#L267

However, the current implementation only works on NumPy arrays because it uses the NumPy array iterator: https://github.com/numpy/numpy/blob/941a4e037d394dada43e3c2beef1e650d5505742/numpy/lib/stride_tricks.py#L115

There's some code that does this calculation in an old version of `stride_tricks`, but I suspect you could simplify it considerably: https://github.com/numpy/numpy/blob/edb902cdc6573553afcf11047ecdfb447e444322/numpy/lib/stride_tricks.py#L81-L118

### jcrist commented Sep 23, 2015

 After experimenting with this for a bit, it seems that explicitly calling `broadcast_to` in the graph is 20% slower on simplistic cases (summing a 500x500x500 array with a 500x500 one, chunked by 250 in each dimension). I tried a few other cases and saw similar numbers. This may be insignificant in real cases; I'm not certain, but I suspect it will have an impact, as elementwise computations are pretty common. Another option would be to special-case the `getitem` instead, doing the broadcasting inside the selection after the rewrite. This wouldn't penalize normal operation, and would allow the rewriting you want. The downside is that it's yet-another-`get*`-function, and may not play nicely with the other optimizations. I like this option better, though. Thoughts?

### shoyer commented Sep 23, 2015

 20% slower at compute or compile time?

### jcrist commented Sep 23, 2015

```python
In [13]: a = np.arange(500**3).reshape((500, 500, 500))

In [14]: b = np.arange(500**2).reshape((500, 500))

In [15]: t = da.from_array(a, chunks=(250, 250, 250))

In [16]: s = da.from_array(b, chunks=(250, 250))

In [17]: s2 = da.broadcast_to(s, t.shape)

In [18]: o = t + s

In [19]: o2 = t + s2

In [20]: %timeit o.compute()
1 loops, best of 3: 807 ms per loop

In [21]: %timeit o2.compute()
1 loops, best of 3: 1.02 s per loop
```

Again, this is a toy computation, probably not representative of real-world workloads.

### shoyer commented Sep 23, 2015

 Hmm. Might be worth a shot trying `getitem_from_broadcasted` then.

### jcrist commented Sep 23, 2015

 So I'm running into a problem: the following won't work with non-array objects (hdf5 files, etc.).

```python
def getitem_from_broadcasted(arg, index, outshape):
    return np.broadcast_to(arg, outshape)[index]
```

For indexing/slicing operations, I can transform the index appropriately with a little extra logic to remove the need to call `broadcast_to`. I'm unsure if the same can be done for indexing with boolean or integer arrays.

### shoyer commented Sep 23, 2015

As long as you aren't doing vectorized fancy indexing, you can treat each axis independently. Then indexing into a synthetically enlarged axis is pretty simple: just grab the appropriate number of elements (preferably using `broadcast_to`), since they will all be duplicates.

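This suggestion can be illustrated concretely (a sketch, assuming basic slicing along the enlarged axis): indexing a broadcast array along the synthetically enlarged axis is equivalent to re-broadcasting the small array to the selected length, since every element along that axis is a duplicate.

```python
import numpy as np

a = np.arange(3).reshape((3, 1))      # axis 1 has size 1
big = np.broadcast_to(a, (3, 4))      # synthetically enlarge axis 1

idx = slice(1, 3)                     # a basic index along the enlarged axis
n = len(range(*idx.indices(4)))       # number of elements it selects

# Indexing the enlarged array == broadcasting the small one to that length.
assert (big[:, idx] == np.broadcast_to(a, (3, n))).all()
```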

### jcrist commented Sep 24, 2015

 So would it be fine if this optimization pass only worked on indexes composed of slices, ellipsis, or single integers (don't perform the rewrite if not composed of these)?

### shoyer commented Sep 24, 2015

 Indexing with 1D arrays is also pretty important. I don't think that's much harder than doing slices. Slices, ellipsis and single integers would be a good start, though, and I can see if there's an obvious way to extend that.
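The restriction discussed here could be expressed as a small predicate (hypothetical; the name is illustrative, not dask's API) that gates the rewrite:

```python
def is_basic_index(index):
    # Only allow the ufunc(x)[key] -> ufunc(x[key]) rewrite when every
    # component of the key is a slice, an Ellipsis, or a single integer;
    # boolean/integer-array (fancy) indexing is left alone for now.
    if not isinstance(index, tuple):
        index = (index,)
    return all(i is Ellipsis or isinstance(i, (int, slice)) for i in index)
```

For example, `is_basic_index((0, slice(None), Ellipsis))` holds, while a list or array component makes it return `False`.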

### jakirkham commented Dec 4, 2017

 Have there been any more recent PRs on this topic since the one mentioned in the OP? What issues still stand in the way?

### shoyer commented Dec 4, 2017

 Even the single argument version of this would be enough to solve xarray's lazy decoding issue, I think. (We would technically need binary arithmetic between an array and a scalar, but that can be mapped into the single argument version.)
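The mapping described here can be sketched like this (illustrative numbers; the decode function is a stand-in for xarray-style lazy scale/offset decoding): the scalar operands are closed over, turning binary arithmetic into a one-argument elementwise function that commutes with basic indexing.

```python
import numpy as np

scale, offset = 0.5, 10.0

def decode(x):
    # One-argument elementwise function capturing the scalars, so the
    # single-argument rewrite ufunc(x)[key] -> ufunc(x[key]) applies.
    return x * scale + offset

data = np.arange(8.0)
key = slice(2, 5)
assert (decode(data)[key] == decode(data[key])).all()
```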

### jakirkham commented Jul 20, 2018

 Do you have any advice, @jcrist, regarding the single argument version? Maybe a PR or write-up that might be a useful starting point?

### shoyer commented Jul 21, 2018

 Over in pydata/xarray#2298, @mrocklin mentioned #2538 as another way to solve this, which looks like a great potential solution. It seems like some of the geoscience/astronomy use cases could really use the ability to perform this optimization on arrays that are functions of multiple arguments.

### jakirkham commented Oct 30, 2018

 Given the recent optimizations that went in, is this one still outstanding or has it been addressed?

### mrocklin commented Oct 30, 2018

 It has not been addressed.

### jakirkham commented Nov 5, 2018

 As an interesting related case, I would submit `key = mask` (i.e. indexing with a boolean mask).

### jakirkham commented Nov 16, 2018

 FWIW, a colleague voiced interest in solving `map_blocks` cases with selection using a similar optimization. I expect the use case is not too different from the ones that motivated opening this issue in the first place.

### jakirkham commented Nov 19, 2018

 Should add that `map_overlap` would be another case where this would be nice to solve. However, given that the operation is overlapping, it's not really a `ufunc` and is probably different enough to deserve its own issue.