dask.array.jit or dask.array.vectorize #1946
Comments
I feel your pain. I'm not entirely sure what you're actually proposing, though. Would it not suffice to simply insert |
For what it's worth, I would expect |
These functions are fed dask.arrays, so something like:

from numba import jit
from dask.array import map_blocks

def linear_jit(func):
    # Compile the elementwise kernel once, then apply it blockwise.
    jit_func = jit(func)
    def _(*args, **kwargs):
        return map_blocks(jit_func, *args, **kwargs)
    return _
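To make that concrete, a hypothetical usage of such a decorator could look like the sketch below; sigmoid and the array sizes are made up for illustration, not taken from the thread.

import numpy as np
import dask.array as da

@linear_jit
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = da.random.random((10000000,), chunks=1000000)
y = sigmoid(x)              # each block runs through the numba-compiled kernel
result = y.sum().compute()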
I think that concretely I'm proposing two things:
|
Seems totally reasonable to me. Numba should remain an optional dependency, of course.
No real objection here, either, though I would consider skipping straight to a programmatic dict for |
First, a word of warning. Second: in the functions mentioned in the original post, numba/cython optimization may produce some improvement - but in my experience from very similar-looking functions, not enough to bother as long as your chunk size is large enough to prevent the GIL from kicking in. YMMV. The big improvement comes when you simply wrap those expressions in a single pure numpy function (lambda or not), which is then called atomically by dask. Hence the biggest problem as of today - IMHO, dask.array.map_blocks could really use some love, as it's unusable e.g. whenever the inputs have mismatched dimensions. |
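To make the "wrap it in a single pure numpy function" point concrete, here is a minimal sketch; the expression and chunk sizes are arbitrary, chosen only for illustration.

import numpy as np
import dask.array as da

x = da.random.random((40000000,), chunks=4000000)

# Chained dask operations: every intermediate result adds another layer of tasks.
many_tasks = (abs(x) ** 1.5 + 2 * x + 1).sum()

# The same expression wrapped in one pure numpy function and applied atomically
# per block via map_blocks: far fewer keys in the final graph.
def kernel(block):
    return np.abs(block) ** 1.5 + 2 * block + 1

few_tasks = x.map_blocks(kernel).sum()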
Thanks for the experience report @crusaderky . Some questions:
Can you expand on this?
Yeah, I actually have the reverse perspective, that map_blocks shouldn't attempt to be any more complex than it currently is. It has been accumulating features and corner cases for a while now. I would love a different operation, though, if people have recommendations and time to build something. I tend to recommend |
where all inputs are xarray variables
It's worth noting that in this formula I don't have any diamond dependencies; all dask-backed variables are the result of semi-independent branches of computation. This is unlike in your l1() function, where x appears 3 times. The improvement you achieve by wrapping the whole thing in numpy is that you're reducing by a large factor the number of keys in your final dask dict - hence, all the non-parallelisable work needed to resolve the problem. |
Instead of Numba we could also generate more complex functions that use out parameters in numpy functions. This is probably more work to generate but would be lower-tech (good). I'm still somewhat in favor of numba though. It seems easier to try out at least. This may be because I haven't been bitten by it in the same way that @crusaderky has. On the plus side, I suspect that numba devs would respond quickly to any issues we push upstream. |
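As a rough illustration of the out-parameter idea, a hand-written (rather than generated) function for a task shaped like (add, x, (mul, y, z)) might look something like this:

import numpy as np

def fused_add_mul(x, y, z):
    # x + y * z for array inputs, reusing a single temporary buffer in place
    # instead of allocating a fresh array for every intermediate result.
    tmp = np.multiply(y, z)
    np.add(x, tmp, out=tmp)
    return tmp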
Regardless of the numpy/numba choice, another benefit to rewriting fused tasks is serialization time. As we contract the number of tasks with optimizations like #1979, we reduce per-task overhead in all ways except serialization. The large compound tasks that we generate like
@jcrist, this is starting to look like
Do you have any interest? This would align nicely with @eriknw's current work on fusion. |
Hrm, I wonder if we can short-circuit some of the rewrite rule cost by using the fact that the key prefixes are likely to be the same. |
I'm skeptical of generating python code (and using numba) here, and would rather create an object to "interpret" these tasks, using the
In [1]: from operator import add, mul
In [2]: task = (add, (mul, 'x', (add, 1, 'y')), 2)
In [3]: import cloudpickle
In [4]: len(cloudpickle.dumps(task))
Out[4]: 68
In [5]: s = """
...: def func(x, y):
...: return ((1 + y) * x) + 2
...: """
In [6]: namespace = {'add': add, 'mul': mul}
In [7]: exec(s, namespace)
In [8]: func = namespace['func']
In [9]: len(cloudpickle.dumps(func))
Out[9]: 326
I'm not 100% against generating code here, but I'd prefer to try a simpler implementation first. Either way, this is something I'd be interested in working on, I just need to find the time :). |
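For a sense of what "interpreting" a task could mean, here is a minimal sketch of a recursive evaluator over dask-style task tuples; dask's own dask.core machinery already does something very similar.

from operator import add, mul

def interpret(task, env):
    # Evaluate a dask-style task: a tuple whose head is a callable and whose
    # remaining elements are sub-tasks, keys into env, or literal values.
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        return func(*(interpret(arg, env) for arg in args))
    if isinstance(task, str) and task in env:
        return env[task]
    return task

task = (add, (mul, 'x', (add, 1, 'y')), 2)
assert interpret(task, {'x': 3, 'y': 4}) == 17   # (1 + 4) * 3 + 2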
Yeah, as long as things are of the form
We also compress data between client and scheduler if it becomes large. I haven't seen moving large repetitive bytestrings become a bottleneck yet. |
For the issue of serialization cost, you could serialize the function in its string form and then numba-compile it on the worker. |
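A rough sketch of that approach; the compile_on_worker helper and the kernel name are hypothetical, chosen just for this example.

import numba

# Client side: the function travels as plain source text, which serializes
# cheaply and identically no matter how many tasks reference it.
source = """
def kernel(x, y):
    return ((1 + y) * x) + 2
"""

# Worker side: exec the source once, jit-compile it, and cache by the source
# string so later tasks reuse the compiled function.
_compiled = {}

def compile_on_worker(source):
    if source not in _compiled:
        namespace = {}
        exec(source, namespace)
        _compiled[source] = numba.jit(nopython=True)(namespace['kernel'])
    return _compiled[source]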
@pitrou if you have any interest, the application I'm trying to improve is here: dask/dask-glm#26 . The first comment points to this single-threaded benchmark that is, I think, representative of the performance issue. |
@jcrist I don't think my problem was compilation. I had cache=True and my signature never changes...
FWIU I really like this idea. Could you elaborate on it a bit more @mrocklin? |
I could, but what I would say has probably already been said in this issue. So that I don't waste a lot of time writing down all of my thoughts on this, perhaps you can focus things by asking what in particular you are curious about? |
Are you proposing caching graphs based on having the same inputs? Or are you proposing caching that is somehow able to stub in different inputs into the same graph structure? The former seems straightforward and potentially a nice add for a few cases using some preferred caching decorator. With the latter, I'm a little unclear on how this might work. |
I am curious about this case.
Yes. Me too. |
What are your thoughts on this topic @jakirkham ? |
Not sure. Haven't thought about it long enough to have any ideas. That said, this does sound kind of like a LISP-style problem. It might be worth taking a look at Hy and seeing if there are any libraries or tools that could be used or extended for this case. |
I don't think that solving this problem will require that level of technology. |
OK, so I think it would be useful to replace many of the keys in a graph with a new set of keys.

>>> dsk = {'x': 1, 'y': (inc, 'x'), 'z': (add, 'x', 'y')}
>>> swap = {'x': 'a', 'y': 'b', 'z': 'c'}
>>> replace(dsk, swap)
{'a': 1, 'b': (inc, 'a'), 'c': (add, 'a', 'b')}

I'm curious if/how we can do this quickly cc @eriknw @jcrist. A simple example motivating this operation follows:

Example

Let's say that I have a dask object with the following graph and keys:

>>> original.dask
{'orig-1': 1,
 'orig-2': (inc, 'orig-1')}
>>> orig._keys()
['orig-2']

I want to run a function f on it:

>>> out = f(orig)
>>> out.dask
{'orig-1': 1,
 'orig-2': (inc, 'orig-1'),
 'out-1': (inc, 2),
 'out-2': (add, 'out-1', 'orig-1')}

I also want to apply this same function onto a new object new with the following graph:

>>> new.dask
{'new-1': (g,)}
>>> new._keys()
['new-1']

However my function So I want to do three things:

Both parts two and three ask for a function that efficiently swaps out key names. I'm curious how fast this can be. The whole purpose of this exercise is to avoid graph construction and optimization costs. |
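A naive sketch of such a replace function (string keys only, ignoring dask's tuple keys and other corner cases):

from operator import add

def inc(x):
    return x + 1

def replace(dsk, swap):
    # Substitute old key names for new ones, both where they appear as graph
    # keys and where they appear inside task tuples.
    def sub(task):
        if isinstance(task, tuple):
            return tuple(sub(t) for t in task)
        if isinstance(task, list):
            return [sub(t) for t in task]
        if isinstance(task, str) and task in swap:
            return swap[task]
        return task
    return {swap.get(key, key): sub(task) for key, task in dsk.items()}

dsk = {'x': 1, 'y': (inc, 'x'), 'z': (add, 'x', 'y')}
swap = {'x': 'a', 'y': 'b', 'z': 'c'}
assert replace(dsk, swap) == {'a': 1, 'b': (inc, 'a'), 'c': (add, 'a', 'b')}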
I just saw this—and I love the concepts. Would be happy to help flesh them out. I happened upon this page by googling "dask monte carlo" by the way. |
It would be great to hear your thoughts on this, @mcg1969. :) |
This will likely be handled by the current effort on high level expression graphs. Closing. |
When working with dask-glm I find myself interacting with functions like the following (where x is a dask.array):

These are costly in a few ways:

So there are two partial solutions that we could combine here:

Using numba would actually be pretty valuable in some cases in dask-glm. This could be an optimization at the task graph level. I suspect that if we get good at recognizing recurring patterns and cache well that we could make this fast-ish.

(add, _, (mul, _, _)) -> numba.jit(lambda x, y, z: x + y * z)

We might also be able to back out patterns based on keys (not sure if this is safe); a rough sketch of this kind of pattern rewrite follows below.

cc @jcrist @eriknw @sklam @seibert @shoyer
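For illustration, a very rough sketch of this kind of pattern rewrite, using operator.add/mul as stand-ins for whatever callables a real graph would contain:

import numba
from operator import add, mul

# Pre-compiled kernel for the pattern (add, _, (mul, _, _)).
fused_add_mul = numba.jit(nopython=True)(lambda x, y, z: x + y * z)

def fuse_add_mul(dsk):
    # Rewrite tasks of the form (add, a, (mul, b, c)) into one call of the
    # fused kernel, removing the intermediate multiply from the graph.
    out = {}
    for key, task in dsk.items():
        if (isinstance(task, tuple) and len(task) == 3 and task[0] is add
                and isinstance(task[2], tuple) and len(task[2]) == 3
                and task[2][0] is mul):
            task = (fused_add_mul, task[1], task[2][1], task[2][2])
        out[key] = task
    return out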