
Avoid multiple data connections to the same worker #790

Merged
mrocklin merged 2 commits into dask:master from communication-overlap on Jan 5, 2017


mrocklin (Member) commented Jan 5, 2017

Workers manage multiple concurrent peer-to-peer connections, and each connection can gather multiple pieces of data from its peer. Workers cap the number of simultaneously open connections to avoid spreading network bandwidth too thinly. They also avoid gathering more than 200MB of extra data from a worker at once, so that high-priority tasks do not wait on data for low-priority tasks.
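The 200MB budget can be pictured as a greedy pass over the candidate keys on one peer, taken in priority order. The sketch below is a minimal illustration, not the code from this PR: the function name `select_keys_for_gather`, its arguments, and the rule of always taking the first key are all assumptions made for the example.

```python
from typing import Dict, List

# Assumed per-gather budget, mirroring the 200MB figure described above.
GATHER_BYTES_LIMIT = 200_000_000


def select_keys_for_gather(
    candidates: List[str],
    nbytes: Dict[str, int],
    limit: int = GATHER_BYTES_LIMIT,
) -> List[str]:
    """Hypothetical helper: pick keys to fetch from one peer in one gather.

    `candidates` is assumed to be sorted from highest to lowest priority.
    Keys are taken in order until the next one would push the running total
    past `limit`. The first key is always taken, so a single piece of data
    larger than the budget can still be fetched on its own.
    """
    selected: List[str] = []
    total = 0
    for key in candidates:
        size = nbytes.get(key, 0)  # unknown sizes are treated as 0 bytes
        if selected and total + size > limit:
            # Stop rather than skip, so lower-priority keys never jump ahead.
            break
        selected.append(key)
        total += size
    return selected
```

For example, with sizes of 150MB, 40MB and 30MB for keys `"a"`, `"b"` and `"c"`, the sketch returns `["a", "b"]`: adding `"c"` would bring the total to 220MB. Stopping at the first key that does not fit, instead of skipping it, keeps a high-priority key from being starved by smaller, lower-priority ones.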

However, a worker could previously open several connections to the same peer, each collecting up to 200MB. This could overwhelm the link to that peer and slow down overall performance.

Now a worker has at most one connection to any particular peer at a time. It can still have many connections open, but each must be to a different peer.
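The new rule combines a per-peer limit of one with the existing global cap. A minimal sketch of that bookkeeping follows, assuming a hypothetical `PeerConnectionTracker` class and an illustrative default of 10 total connections; neither the class nor the default comes from dask.distributed.

```python
from typing import Set


class PeerConnectionTracker:
    """Hypothetical bookkeeping for outgoing data connections.

    Enforces two rules: at most `max_connections` gathers in flight overall,
    and at most one gather in flight to any single peer address.
    """

    def __init__(self, max_connections: int = 10) -> None:
        # The default of 10 is an illustrative assumption.
        self.max_connections = max_connections
        # One entry per busy peer; since each peer has at most one
        # connection, len(self.in_flight) is also the total connection count.
        self.in_flight: Set[str] = set()

    def try_acquire(self, peer: str) -> bool:
        """Reserve a connection to `peer`; return False if not allowed now."""
        if peer in self.in_flight:
            return False  # already gathering from this peer
        if len(self.in_flight) >= self.max_connections:
            return False  # global connection limit reached
        self.in_flight.add(peer)
        return True

    def release(self, peer: str) -> None:
        """Mark the connection to `peer` as finished."""
        self.in_flight.discard(peer)
```

With `max_connections=2`, a second `try_acquire` for the same peer address fails while the first gather is in flight, a different peer still succeeds, and a third peer is refused until one of the two is released. Because a set holds each busy peer once, the per-peer rule and the global count share a single structure.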

We also document worker state in the docstring, which helped identify state that was no longer in use.
mrocklin merged commit e83fbf5 into dask:master on Jan 5, 2017
mrocklin deleted the communication-overlap branch on January 5, 2017 at 20:58