Update inlining Futures in task graph in Client._graph_to_futures #3303
def fast_subs(o, d):
    typ = type(o)
    if typ is tuple and o and callable(o[0]):
        return (o[0],) + tuple(fast_subs(i, d) for i in o[1:])
    elif typ is list:
        return [fast_subs(i, d) for i in o]
    elif typ is dict:
        return {k: fast_subs(v, d) for (k, v) in o.items()}
    elif typ is str:
        # I *believe* when you get to the point where you call this
        # all keys will have already been converted to strings.
        return d.get(o, o)
    return o
o:
    Core data structures containing literals and keys
d: dict
    Mapping of keys to values
With the current implementation these have to be `str` keys; that may be worth noting.
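To make the limitation concrete, here is the `fast_subs` logic quoted above run on a string key and on a tuple key. The function body is copied from that snippet; `inc` and the `("chunk", 0)` key are made-up inputs for illustration only.

```python
def fast_subs(o, d):
    """Str-only key substitution, copied from the snippet in this review."""
    typ = type(o)
    if typ is tuple and o and callable(o[0]):
        # A task: keep the callable, recurse into its arguments
        return (o[0],) + tuple(fast_subs(i, d) for i in o[1:])
    elif typ is list:
        return [fast_subs(i, d) for i in o]
    elif typ is dict:
        return {k: fast_subs(v, d) for (k, v) in o.items()}
    elif typ is str:
        # Only strings are ever looked up in d
        return d.get(o, o)
    return o


def inc(x):
    return x + 1


# String key: "x" is replaced by 1, giving (inc, 1)
with_str_key = fast_subs((inc, "x"), {"x": 1})
# Tuple key: ("chunk", 0) falls through every branch, giving (inc, ("chunk", 0))
with_tuple_key = fast_subs((inc, ("chunk", 0)), {("chunk", 0): 1})
```

A tuple whose first element is not callable is not treated as a task, and it is not a `str`, so it is returned unchanged even when it appears in `d`.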
Hrmm, that's a really good point. I think we'll want to cover generic, non-string keys too (thinking of, for example, a persisted dask array, which has keys like `{("chunk", 0): <Future>}`).
Since `d` is a mapping that contains the task-graph keys to substitute, we could first check whether or not `o` is itself a key in `d`. That would tell us to make a substitution when we come across a key like `("chunk", 0)`. I pushed a commit with what I mean in code.
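A minimal sketch of that idea, assuming keys can be any hashable object (strings, or tuples such as `("chunk", 0)`). It illustrates the approach described in this comment and is not the pushed commit itself; `subs_any_key` is a hypothetical name. The membership check `o in d` runs first, and unhashable values (lists, dicts, sets), which can never be keys, fall through to the recursive cases.

```python
from operator import add


def subs_any_key(o, d):
    """Substitute any hashable key of ``d`` found in ``o`` (illustrative sketch)."""
    # First check whether o is itself a key in d: this covers str keys
    # and tuple keys such as ("chunk", 0) alike
    try:
        if o in d:
            return d[o]
    except TypeError:
        # Unhashable objects (lists, dicts, sets) can never be keys
        pass
    typ = type(o)
    if typ is tuple and o and callable(o[0]):
        # A task: keep the callable, recurse into its arguments
        return (o[0],) + tuple(subs_any_key(i, d) for i in o[1:])
    elif typ is list:
        return [subs_any_key(i, d) for i in o]
    elif typ is dict:
        return {k: subs_any_key(v, d) for (k, v) in o.items()}
    return o


graph = {"y": (add, ("chunk", 0), 1)}
inlined = subs_any_key(graph, {("chunk", 0): 10})
# inlined == {"y": (add, 10, 1)}
```

The trade-off of checking first is that every task tuple gets hashed on the way down. Doing the lookup only for leaves that are not tasks, lists, or dicts avoids that cost and still handles `("chunk", 0)`, since a tuple whose first element is not callable is never treated as a task.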
I think at this point all keys are strings already, but I may be wrong. I don't remember when keys are converted to strings (whether it's on the client or the scheduler side), but at some point everything is a string, so the previous code may be fine. I was mostly commenting to update the docstring.
The client converts keys to strings before sending to the scheduler. The scheduler only understands string-valued keys.
Thanks for pointing that out, I hadn't realized this string conversion took place. It looks like the conversion happens here:
distributed/distributed/client.py
Line 2446 in f7a0d7a
dsk2 = str_graph({k: v[0] for k, v in d.items()}, extra_keys)
a few lines after `Future`s have been inlined in the graph.
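For reference, a simplified sketch of what a `str_graph`-style conversion does: rename every key, and every reference to a key inside tasks, to a string. The names `stringify_key` and `stringify_graph` are hypothetical, and converting non-string keys with `str()` is an assumption about the conversion rather than a copy of distributed's code. Because this step runs after `Future`s are inlined, the substitution code still sees the original, possibly tuple-valued, keys.

```python
from operator import add


def stringify_key(key):
    # Hypothetical helper: tuple keys like ("chunk", 0) become "('chunk', 0)"
    return key if type(key) is str else str(key)


def stringify_graph(dsk):
    """Convert every key, and every reference to a key, to a string (sketch)."""

    def convert(task):
        if type(task) is list:
            return [convert(t) for t in task]
        if type(task) is dict:
            return {k: convert(v) for k, v in task.items()}
        if type(task) is tuple and task and callable(task[0]):
            # A task: keep the callable, convert its arguments
            return (task[0],) + tuple(convert(t) for t in task[1:])
        try:
            if task in dsk:
                return stringify_key(task)
        except TypeError:
            pass  # unhashable literals cannot be keys
        return task

    return {stringify_key(k): convert(v) for k, v in dsk.items()}


dsk = {("chunk", 0): 1, "total": (add, ("chunk", 0), 10)}
str_dsk = stringify_graph(dsk)
# str_dsk == {"('chunk', 0)": 1, "total": (add, "('chunk', 0)", 10)}
```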
Just updated `subs_multiple` with the improvements from @jcrist in #3303 (comment).
Alternatively, we could try moving the `str_graph` call before substitutions take place. However, one advantage of the current `subs_multiple` implementation is that we could use it elsewhere (on generic keys) should the need arise.
LGTM, merging. Thanks @jrbourbeau.
This PR updates `Client._graph_to_futures` to use `distributed.utils_comm.pack_data` to inline `Future`s instead of `dask.optimization.inline` here:
distributed/distributed/client.py
Lines 2436 to 2440 in 4a8a4f3
E.g. transforming a task graph like:
into:
This is motivated by `pack_data` performing better than `inline` for large graphs (see benchmark below), and it should, I think, perform the same substitutions.
Example benchmark:
outputs (on my laptop)
cc @jcrist if you get a moment to look at this
xref dask/dask#5299
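A rough sketch of the substitution this PR is about: find the keys whose values are futures and replace every reference to those keys with the future object. `FakeFuture` is a hypothetical stand-in for `distributed.Future`, and `substitute_keys` / `inline_futures` are illustrative helpers, not `pack_data` or distributed's `subs_multiple`. In this sketch the future-valued entries themselves are left in the graph.

```python
from operator import add


class FakeFuture:
    """Hypothetical stand-in for distributed.Future: it only carries a key."""

    def __init__(self, key):
        self.key = key

    def __repr__(self):
        return f"<FakeFuture {self.key!r}>"


def substitute_keys(o, d):
    """Recursively replace any key of ``d`` found in ``o`` with its value."""
    try:
        if o in d:
            return d[o]
    except TypeError:
        pass  # unhashable literals cannot be keys
    typ = type(o)
    if typ is tuple and o and callable(o[0]):
        return (o[0],) + tuple(substitute_keys(i, d) for i in o[1:])
    if typ is list:
        return [substitute_keys(i, d) for i in o]
    if typ is dict:
        return {k: substitute_keys(v, d) for k, v in o.items()}
    return o


def inline_futures(dsk):
    """Replace references to future-valued keys with the futures themselves."""
    futures = {k: v for k, v in dsk.items() if isinstance(v, FakeFuture)}
    if not futures:
        return dsk
    # The future-valued entries stay in the graph: a FakeFuture is not a key,
    # so substitute_keys returns it unchanged
    return {k: substitute_keys(v, futures) for k, v in dsk.items()}


future = FakeFuture("x")
graph = {"x": future, "y": (add, "x", 1), "z": [("chunk", 0), "x"]}
inlined = inline_futures(graph)
# inlined["y"] == (add, future, 1) and inlined["z"] == [("chunk", 0), future]
```

A single recursive pass over the graph with one dictionary of futures does a constant-time lookup per node, which is one plausible reason a direct substitution can scale better than a general-purpose inliner on large graphs; that is an expectation, not a measured result.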