
Alternative scheduling for new tasks #2940

Closed · wants to merge 1 commit

Conversation

@TomAugspurger (Member) commented Aug 7, 2019

Rather than placing new tasks with no dependencies on the first
idle worker, we try placing them on a worker executing tasks they're
a co-dependency with. This helps to reduce memory usage of graphs like

        a-1   a-2  a-3   a-4  a-5  a-6
          \    |    /     \    |    /
              b-1             b-2

This is meant to address #2602. Will require some testing.

I'm writing up a bunch of benchmarks on synthetic workloads now. Will try out on some real workloads as well.
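
The gist of the heuristic, as a rough sketch (the TaskState stand-in and pick_worker_for_new_task are illustrative names, not the actual scheduler code):

    from dataclasses import dataclass, field

    # Sketch only: these stand-ins mirror the shape of the scheduler's task
    # state (dependents / dependencies / processing_on); they are not the
    # real distributed internals.
    @dataclass
    class TaskState:
        name: str
        dependents: list = field(default_factory=list)
        dependencies: list = field(default_factory=list)
        processing_on: object = None  # worker currently running this task, if any

    def pick_worker_for_new_task(ts, idle_workers):
        # Prefer a worker that is already running one of ts's siblings,
        # i.e. another dependency of one of ts's dependents.
        for dts in ts.dependents:
            for sts in dts.dependencies:
                if sts is not ts and sts.processing_on is not None:
                    return sts.processing_on  # co-locate with the sibling
        # Otherwise fall back to the existing behaviour: first idle worker.
        return idle_workers[0] if idle_workers else None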

Rather than placing new tasks with no dependencies on the first
idle worker, we try placing them on a worker executing tasks they're
a co-dependency with.

The review comments below are on this excerpt from the diff:

    # If time weren't an issue, we might find the worker with the
    # most siblings. But that's expensive.
    #
    for sts in dts.dependencies:

Member:
There are several situations where a single task has very many dependents. In these cases I think that we'll hit N^2 scaling and bring things down.

Member:

What about cases where we don't have siblings, but cousins n-times removed?

a1
|
a2    b1
 \   /
   c

Member Author:

Yes, my initial attempt took a full count of where each of our co-dependencies was running. That blew up very quickly. The early break once we find a co-dependency was a first attempt to avoid that.

Member Author:

This approach won't help in that case (I think a-1 and b-1 are niblings 😄).


a-1   a-2   a-3   a-4
   \  /        \  /
   b-1         b-2

Member:

Note that all ascii art diagrams in the codebase so far have computation going from bottom to top. This is also the way that visualize works.

@mrocklin (Member) commented Aug 7, 2019

A long while ago we used to schedule things differently if they all came in in the same batch. We wouldn't do things one by one; we would take all of the initially free tasks, sort them by their dask.order value, and then partition them among the workers in that order. This worked well because nodes that have similar ordering values are likely to be closely related.
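
A rough sketch of that batch strategy (assign_initial_batch and the even chunking are illustrative, not the code that used to live in the scheduler):

    from dask.order import order

    def assign_initial_batch(dsk, free_keys, workers):
        # Sort the initially-free tasks by their dask.order value, then hand
        # out contiguous runs to the workers so that closely related tasks
        # (similar ordering values) land on the same worker.
        priorities = order(dsk)                          # key -> ordering value
        ordered = sorted(free_keys, key=priorities.get)  # the O(n log n) sort
        n = -(-len(ordered) // len(workers))             # ceil(len / n_workers)
        return {w: ordered[i * n:(i + 1) * n] for i, w in enumerate(workers)}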

However this works poorly if...

  • There are many leaves and our sorting algorithm is O(n log n)
  • New tasks come in that aren't part of this
  • These tasks actually all have a single dependency
  • The workers have some pre-existing load that we want to take into account
  • Our graph isn't just a big collections computation; the ordering between tasks is really important, and so we want to strongly prioritize the tasks that ordering suggests, spreading them across all of the workers rather than placing them all on one.

@mrocklin (Member) commented Aug 7, 2019

As an aside, a common cause of the graphs you're dealing with is, I think, not doing high-level-graph fusion aggressively enough. If we fused data ingestion operations the way we currently fuse blockwise, this situation would occur much less frequently. This is a less general solution, but handling it well would be an unambiguous benefit, while core scheduling always has tradeoffs.

I don't know the exact operation that you're trying to deal with, but it might be better handled by bringing operations like read_parquet, from_array, and others, under the Blockwise banner.

@TomAugspurger (Member Author):

> A long while ago we used to schedule things differently if they all came in in the same batch. We wouldn't do things one by one; we would take all of the initially free tasks, sort them by their dask.order value, and then partition them among the workers in that order.

That's interesting to hear. I briefly looked into trying to fix things earlier on since it's so hard to satisfy the "schedule co-dependencies together" goal this late in the scheduling process (at the single-task level). I didn't explore it much, since it seems to go against how things are done currently.

> If we fused data ingestion operations the way we currently fuse blockwise, this situation would occur much less frequently.

Does overly aggressive fusion have a negative impact when you have multiple threads per worker? For example, with

             b-1                b-2
          /  / \  \          /  / \  \
         /  |   |  \        /  |   |  \
       a-1 a-2 a-3 a-4    a-5 a-6 a-7 a-8

we might want to ensure that a-1 through a-4 end up on the same machine, but we might not want to fuse them.


I'll look into blockwise a bit. Perhaps updating Xarray's open_mfdataset to use it would yield some improvements.

@mrocklin (Member) commented Aug 7, 2019

> Does overly aggressive fusion have a negative impact when you have multiple threads per worker?

Maybe, but it's not common with collections (which is where we'll get high level blockwise fusion), where we commonly have far more partitions than we have threads.

> I'll look into blockwise a bit. Perhaps updating Xarray's open_mfdataset to use it would yield some improvements.

The challenge is that blockwise currently expects to operate on Dask collections. There isn't a clean way of using it to start up a new graph.

@TomAugspurger (Member Author):

I'm not actively working on this at the moment. Closing to clear the backlog.
