
Increase latency overhead in stealing cost calculation #5390

merged 1 commit into from Oct 19, 2021



fjetter commented Oct 5, 2021

In my latest dive into the stealing code, I went through some of the logs and saw a lot of ridiculous steal requests: task durations of ~10 ms with occupancy differences between thief and victim of ~100 ms.

Not only do we not care about such a small difference, but the act of stealing is guaranteed to be more expensive than letting things be.

Stealing requires at least three network bounces (steal-request, steal-confirm, compute-task), including serialization of the task's code if the steal succeeds. It is almost impossible to do all of that within the currently hard-coded 1 ms. The 100 ms I propose is likely too conservative, but I don't think that is necessarily a bad thing for stealing. I don't have time for large-scale tests, but I am very confident that this value should be much higher than it is right now. Thoughts, concerns?
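To make the argument concrete, here is a toy cost model. It is a sketch under stated assumptions, not the actual dask.distributed implementation: the function `should_steal`, its parameters and its exact inequality are hypothetical. The thief is charged the task's duration, the transfer time of its dependencies, and a fixed latency overhead that stands in for the steal-request / steal-confirm / compute-task round trips.

```python
# Hypothetical toy model of the steal decision discussed in this PR.
# Names and the exact inequality are illustrative assumptions, not the
# dask.distributed implementation.


def should_steal(
    duration: float,    # estimated compute time of the task (s)
    occ_victim: float,  # occupancy of the saturated worker (s)
    occ_thief: float,   # occupancy of the idle worker (s)
    nbytes_deps: int,   # bytes of dependencies the thief must fetch
    bandwidth: float,   # estimated network bandwidth (bytes/s)
    latency: float,     # fixed overhead charged per steal (s)
) -> bool:
    # What the thief pays: run the task, fetch its inputs, plus a fixed
    # overhead for the round trips needed to move the task at all.
    cost = duration + nbytes_deps / bandwidth + latency
    # Steal only if the thief, after absorbing that cost, is still less
    # loaded than the victim is after giving the task away.
    return occ_thief + cost < occ_victim - duration


if __name__ == "__main__":
    # The case from the logs: a ~10 ms task and a ~100 ms occupancy gap.
    for latency in (0.001, 0.1):
        decision = should_steal(0.01, 0.1, 0.0, 0, 100e6, latency)
        print(f"latency={latency}s -> steal={decision}")
```

With a 1 ms overhead the 10 ms task is moved across the 100 ms gap; with 100 ms it stays where it is. In this toy model the break-even overhead is the occupancy gap minus twice the task duration minus the transfer time, i.e. 80 ms for the numbers above.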

cc @gjoseph92 @crusaderky

fjetter (member, author) commented Oct 5, 2021

fwiw, I don't think it's even worth measuring this properly. We are working with so many estimates in the stealing code that an accurate measurement of this offset isn't worth the effort, imho.


Frankly, 0.1 s doesn't even seem that conservative to me.

@fjetter fjetter changed the title Increase latency for stealing Increase latency overhead in stealing cost calculation Oct 6, 2021
@crusaderky crusaderky merged commit a8151a6 into dask:main Oct 19, 2021
17 of 22 checks passed
madkinsz pushed a commit to madkinsz/distributed that referenced this pull request Oct 28, 2021