Swapping getitem and elementwise operations via graph optimization, redux? #2431

Open · shoyer opened this issue Jun 7, 2017 · 10 comments

shoyer commented Jun 7, 2017

@jcrist did some work back in #755 on swapping getitem and elementwise operations. We didn't end up merging this work because we were unable to come up with a solution that worked across the board without slowing down every dask array operation (by forcing broadcasting).

However, this is still a vital optimization for climate/weather data analysis use cases (where arrays are currently often loaded as one chunk per file, as described in pydata/xarray#1440), and would be extremely valuable for xarray (allowing us to remove our legacy system for deferred array computation). It would be great to pursue it in some more limited form.

Any of the following would solve xarray's use cases:

  1. Optimization that only works on elementwise operations with a single array argument, e.g., of the form x + 1 (see the sketch after this list).
  2. Optimization that only works on elementwise operations with pre-broadcast arguments. These would need some sort of special indication in the dask graph, likely via some custom API, e.g., da.swappable_elementwise().
  3. A new (optional) elementwise operation that always explicitly broadcasts, for cases when optimization is essential and some overhead is acceptable.

The optimization pass itself could also be optional, but we would always want to use it on dask.array objects wrapped with xarray.
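
To make option 1 concrete, here is a minimal sketch of such a rewrite pass over a raw dask graph. The name swap_getitem_elementwise is hypothetical, and for brevity it only handles operator.add with a scalar second argument:

import operator

import numpy as np
from dask.core import istask
from dask.threaded import get

def swap_getitem_elementwise(dsk):
    # Rewrite (getitem, (add, x, 1), index) -> (add, (getitem, x, index), 1)
    # whenever the inner elementwise task has a single array argument.
    out = {}
    for key, task in dsk.items():
        if (istask(task) and task[0] is operator.getitem
                and istask(task[1]) and task[1][0] is operator.add
                and not istask(task[1][2])             # scalar, not a task
                and not isinstance(task[1][2], str)):  # ...and not another key
            func, arg, scalar = task[1]
            index = task[2]
            task = (func, (operator.getitem, arg, index), scalar)
        out[key] = task
    return out

dsk = {'x': np.arange(4.0),
       'y': (operator.getitem, (operator.add, 'x', 1), (slice(0, 2),))}
assert list(get(swap_getitem_elementwise(dsk), 'y')) == [1.0, 2.0]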

mrocklin commented Jun 7, 2017

We're certainly in a better place now that we can swap in and out optimizations. We can build optimizations that are not always-on.
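
(For what it's worth, here is a sketch of how an opt-in pass could be registered, assuming the array_optimize hook described in dask's "customizing optimization" docs; my_optimize is just a pass-through placeholder:)

import dask
import dask.array as da

def my_optimize(dsk, keys, **kwargs):
    # A real pass would rewrite dsk here, e.g. swapping getitems inward.
    return dict(dsk)

x = da.ones((1000, 1000), chunks=(100, 100))

with dask.config.set(array_optimize=my_optimize):  # opt-in, not always-on
    x.sum().compute()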

mrocklin commented Jun 7, 2017

I understand the first case, (x + 1)[...] -> x[...] + 1, but I don't understand either of the latter two cases. Can you elaborate?

mrocklin commented Jun 7, 2017

My first inclination on swapping elementwise operations is to do this in two steps:

  1. Fuse ufuncs together (I think that this will be useful in its own right)
    (add, (mul, x, y), 1) -> (ufunc_apply, add_mul, x, y, 1)
  2. Swap getitems and these apply nodes.

Although I should look more closely at @jcrist's past work to see what troubles arose.
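
Roughly, in raw task-tuple form (add_mul here stands in for a generated fused function):

import operator
from dask.threaded import get

# Step 1, before fusion: two chained elementwise tasks.
dsk_before = {'x': 2, 'y': 3,
              'z': (operator.add, (operator.mul, 'x', 'y'), 1)}

# Step 1, after fusion: one task applying a single fused function.
def add_mul(x, y):
    return x * y + 1

dsk_after = {'x': 2, 'y': 3, 'z': (add_mul, 'x', 'y')}

assert get(dsk_before, 'z') == get(dsk_after, 'z') == 7

# Step 2 would then push a getitem through the fused node:
#   (getitem, (add_mul, 'x', 'y'), index)
#   -> (add_mul, (getitem, 'x', index), (getitem, 'y', index))
# which is only valid when 'x' and 'y' already have the result's shape.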

shoyer commented Jun 7, 2017

Case two would be the optimization (x + y)[...] -> x[...] + y[...], only if x.shape == y.shape.

Case three is related, basically x + y -> x2 + y2 with x2, y2 = np.broadcast_arrays(x, y) (and then doing the case two optimization).
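
Both rewrites are easy to sanity-check with plain numpy (a small worked example, not dask-specific):

import numpy as np

key = (slice(1, 3), slice(0, 2))

# Case two: same-shape arguments, so indexing commutes with the ufunc.
x = np.arange(12.0).reshape(3, 4)
y = np.ones((3, 4))
assert np.array_equal((x + y)[key], x[key] + y[key])

# Case three: broadcast first (stride-0 views, no copies), then case two applies.
u = np.ones((3, 1))
v = np.ones((1, 4))
u2, v2 = np.broadcast_arrays(u, v)
assert np.array_equal((u + v)[key], u2[key] + v2[key])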

mrocklin commented Jun 7, 2017

Given a ufunc and a subsequent getitem call, my hope would be that the rules for applying the index to the arguments would be straightforward.

For example given the following:

x = np.ones((10, 1))
y = np.ones((1, 10))

z = (x + y)[a, b]

We know that we can rewrite this as

z = x[a, :] + y[:, b]

Correct?

I imagine that we can probably do the slicing prior to broadcasting if we are clever about it.
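
With concrete slices this checks out, provided each index is applied along the axis its argument actually spans:

import numpy as np

x = np.ones((10, 1))
y = np.ones((1, 10))
a, b = slice(2, 5), slice(0, 3)

# x only has data along axis 0, y only along axis 1.
assert np.array_equal((x + y)[a, b], x[a, :] + y[:, b])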

shoyer commented Jun 7, 2017

My understanding of why @jcrist's past work broke down is that there wasn't a good way to do this optimization when the shapes of the arguments differed, and doing broadcasting in every case added too much overhead (~10% to every dask array operation, whether it needed this optimization or not). Hence my latter two suggestions, to make this optional.

shoyer commented Jun 7, 2017

We know that we can rewrite this as
z = x[a, :] + y[:, b]
Correct?

Yes, absolutely. But I'm not sure this is actually worth the trouble, as I haven't seen this specific sort of access pattern come up for geo-data. Usually the index doesn't need to be factorized across elementwise arguments like this: either there's only one array argument, or all the arrays have the same shape.

The "broadcast first" solution could solve this nearly as well, with only some constant run-time overhead:

x2, y2 = broadcast_arrays(x, y)
z = (x2[a, b] + y2[a, b])

...though it would certainly help to defer automatic rechunking until after the optimization pass (which of course isn't really possible with the current dask.array design).
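
For reference, the pre-broadcast arguments can already be spelled explicitly with da.broadcast_to (which, like its numpy counterpart, copies no data); the missing piece is an optimization pass that recognizes this pattern:

import numpy as np
import dask.array as da

x = da.from_array(np.ones((10, 1)), chunks=(10, 1))
y = da.from_array(np.ones((1, 10)), chunks=(1, 10))

x2 = da.broadcast_to(x, (10, 10))
y2 = da.broadcast_to(y, (10, 10))

a, b = slice(2, 5), slice(0, 3)
z = x2[a, b] + y2[a, b]  # every argument now takes the identical index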

mrocklin commented Jun 7, 2017

Yeah, sorry, to be clear the arrays I'm thinking about here are numpy arrays. I assume that this optimization would kick in only after generic task fusion and slicing-fusion had occurred so that we had large complex tasks that we then wanted to reorder.

Rechunking after expression creation would require taking on something like Blaze (or similar). Let's leave that as off-topic for now.

mrocklin commented Jun 7, 2017

What costs does broadcast_arrays add? Does it perform an allocation or is everything handled with numpy metadata?

shoyer commented Jun 7, 2017

Does it perform an allocation or is everything handled with numpy metadata?

It's all done with metadata, effectively just inserting new dimensions with a stride of 0.
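
This is easy to confirm: with float64 data, the original axis keeps its 8-byte stride while each broadcast axis gets stride 0:

import numpy as np

x2, y2 = np.broadcast_arrays(np.ones((10, 1)), np.ones((1, 10)))

print(x2.strides)  # (8, 0): axis 1 re-reads the same memory, nothing copied
print(y2.strides)  # (0, 8): likewise for axis 0
assert x2.base is not None  # a view on the original data, not a new buffer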
