Swapping getitem and elementwise operations via graph optimization, redux? #2431

Open · shoyer opened this issue Jun 7, 2017 · 10 comments

shoyer commented Jun 7, 2017

@jcrist did some work back in #755 on swapping getitem and elementwise operations. We didn't end up merging this work because we were unable to come up with a solution that worked across the board without slowing down every dask array operation (by forcing broadcasting).

However, this is still a vital optimization for climate/weather data analysis use cases (where arrays are currently often loaded as one chunk per file, as described in pydata/xarray#1440), and would be extremely valuable for xarray (allowing us to remove our legacy system for deferred array computation). It would be great to pursue it in some more limited form.

Any of the following would solve xarray's use cases:

  1. Optimization that only works on elementwise operations with a single array argument, e.g., of the form x + 1 (see the sketch after this list).
  2. Optimization that only works on elementwise operations with pre-broadcast arguments. These would need some sort of special indication in the dask graph, likely via some custom API, e.g., da.swappable_elementwise().
  3. A new (optional) elementwise operation that always explicitly broadcasts, for cases when optimization is essential and some overhead is acceptable.

The optimization pass itself could also be optional, but we would always want to use it on dask.array objects wrapped with xarray.
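
To make option 1 concrete, here is a minimal sketch of such a rewrite pass over a raw dask graph. The name swap_getitem_elementwise is hypothetical, and for brevity it only handles operator.add with a scalar second argument:

import operator

import numpy as np
from dask.core import istask
from dask.threaded import get

def swap_getitem_elementwise(dsk):
    # Rewrite (getitem, (add, x, 1), index) -> (add, (getitem, x, index), 1)
    # whenever the inner elementwise task has a single array argument.
    out = {}
    for key, task in dsk.items():
        if (istask(task) and task[0] is operator.getitem
                and istask(task[1]) and task[1][0] is operator.add
                and not istask(task[1][2])             # scalar, not a task
                and not isinstance(task[1][2], str)):  # ...and not another key
            func, arg, scalar = task[1]
            index = task[2]
            task = (func, (operator.getitem, arg, index), scalar)
        out[key] = task
    return out

dsk = {'x': np.arange(4.0),
       'y': (operator.getitem, (operator.add, 'x', 1), (slice(0, 2),))}
assert list(get(swap_getitem_elementwise(dsk), 'y')) == [1.0, 2.0]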

mrocklin commented Jun 7, 2017

We're certainly in a better place now that we can swap in and out optimizations. We can build optimizations that are not always-on.
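
(For what it's worth, here is a sketch of how an opt-in pass could be registered, assuming the array_optimize hook described in dask's "customizing optimization" docs; my_optimize is just a pass-through placeholder:)

import dask
import dask.array as da

def my_optimize(dsk, keys, **kwargs):
    # A real pass would rewrite dsk here, e.g. swapping getitems inward.
    return dict(dsk)

x = da.ones((1000, 1000), chunks=(100, 100))

with dask.config.set(array_optimize=my_optimize):  # opt-in, not always-on
    x.sum().compute()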

mrocklin commented Jun 7, 2017

I understand the first case, (x + 1)[...] -> x[...] + 1, but I don't understand either of the latter two cases. Can you elaborate?

mrocklin commented Jun 7, 2017

My first inclination on swapping elementwise operations is to do this in two steps:

  1. Fuse ufuncs together (I think that this will be useful in its own right)
    (add, (mul, x, y), 1) -> (ufunc_apply, add_mul, x, y, 1)
  2. Swap getitems and these apply nodes.

Although I should look more closely at @jcrist's past work to see what troubles arose.
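
Roughly, in raw task-tuple form (add_mul here stands in for a generated fused function):

import operator
from dask.threaded import get

# Step 1, before fusion: two chained elementwise tasks.
dsk_before = {'x': 2, 'y': 3,
              'z': (operator.add, (operator.mul, 'x', 'y'), 1)}

# Step 1, after fusion: one task applying a single fused function.
def add_mul(x, y):
    return x * y + 1

dsk_after = {'x': 2, 'y': 3, 'z': (add_mul, 'x', 'y')}

assert get(dsk_before, 'z') == get(dsk_after, 'z') == 7

# Step 2 would then push a getitem through the fused node:
#   (getitem, (add_mul, 'x', 'y'), index)
#   -> (add_mul, (getitem, 'x', index), (getitem, 'y', index))
# which is only valid when 'x' and 'y' already have the result's shape.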

shoyer commented Jun 7, 2017

Case two would be the optimization (x + y)[...] -> x[...] + y[...], only if x.shape == y.shape.

Case three is related, basically x + y -> x2 + y2 with x2, y2 = np.broadcast_arrays(x, y) (and then doing the case two optimization).
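
Both rewrites are easy to sanity-check with plain numpy (a small worked example, not dask-specific):

import numpy as np

key = (slice(1, 3), slice(0, 2))

# Case two: same-shape arguments, so indexing commutes with the ufunc.
x = np.arange(12.0).reshape(3, 4)
y = np.ones((3, 4))
assert np.array_equal((x + y)[key], x[key] + y[key])

# Case three: broadcast first (stride-0 views, no copies), then case two applies.
u = np.ones((3, 1))
v = np.ones((1, 4))
u2, v2 = np.broadcast_arrays(u, v)
assert np.array_equal((u + v)[key], u2[key] + v2[key])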

mrocklin commented Jun 7, 2017

Given a ufunc and a subsequent getitem call, my hope would be that the rules for applying the index to the arguments would be straightforward.

For example given the following:

x = np.ones((10, 1))
y = np.ones((1, 10))

z = (x + y)[a, b]

We know that we can rewrite this as

z = x[a, :] + y[:, b]

Correct?

I imagine that we can probably do the slicing prior to broadcasting if we are clever about it.
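
With concrete slices this checks out, provided each index is applied along the axis its argument actually spans:

import numpy as np

x = np.ones((10, 1))
y = np.ones((1, 10))
a, b = slice(2, 5), slice(0, 3)

# x only has data along axis 0, y only along axis 1.
assert np.array_equal((x + y)[a, b], x[a, :] + y[:, b])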

shoyer commented Jun 7, 2017

My understanding of why @jcrist's past work broke down is that there wasn't a good way to do this optimization when the shapes of the arguments differed, and doing broadcasting in every case added too much overhead (~10% to every dask array operation, whether it needed this optimization or not). Hence my latter two suggestions, to make this optional.

shoyer commented Jun 7, 2017

We know that we can rewrite this as
z = x[a, :] + y[:, b]
Correct?

Yes, absolutely. But I'm not sure this is actually worth the trouble, as I haven't seen this specific sort of access pattern come up for geo-data. Usually the index doesn't need to be factorized across elementwise arguments like this: either there's only one array argument, or all the arrays have the same shape.

The "broadcast first" solution could solve this nearly as well, with only some constant run-time overhead:

x2, y2 = broadcast_arrays(x, y)
z = (x2[a, b] + y2[a, b])

...though it would certainly help to defer automatic rechunking until after the optimization pass (which of course isn't really possible with the current dask.array design).
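
For reference, the pre-broadcast arguments can already be spelled explicitly with da.broadcast_to (which, like its numpy counterpart, copies no data); the missing piece is an optimization pass that recognizes this pattern:

import numpy as np
import dask.array as da

x = da.from_array(np.ones((10, 1)), chunks=(10, 1))
y = da.from_array(np.ones((1, 10)), chunks=(1, 10))

x2 = da.broadcast_to(x, (10, 10))
y2 = da.broadcast_to(y, (10, 10))

a, b = slice(2, 5), slice(0, 3)
z = x2[a, b] + y2[a, b]  # every argument now takes the identical index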

mrocklin commented Jun 7, 2017

Yeah, sorry, to be clear the arrays I'm thinking about here are numpy arrays. I assume that this optimization would kick in only after generic task fusion and slicing-fusion had occurred so that we had large complex tasks that we then wanted to reorder.

Rechunking after expression creation would require taking on something like Blaze (or similar). Let's leave that as off-topic for now.

mrocklin commented Jun 7, 2017

What costs does broadcast_arrays add? Does it perform an allocation or is everything handled with numpy metadata?

shoyer commented Jun 7, 2017

Does it perform an allocation or is everything handled with numpy metadata?

It's all done with metadata, effectively just inserting new dimensions with a stride of 0.
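
This is easy to confirm: with float64 data, the original axis keeps its 8-byte stride while each broadcast axis gets stride 0:

import numpy as np

x2, y2 = np.broadcast_arrays(np.ones((10, 1)), np.ones((1, 10)))

print(x2.strides)  # (8, 0): axis 1 re-reads the same memory, nothing copied
print(y2.strides)  # (0, 8): likewise for axis 0
assert x2.base is not None  # a view on the original data, not a new buffer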
