Increase latency overhead in stealing cost calculation #5390
In my latest dive into the stealing code I investigated some of the logs and saw a lot of ridiculous steal requests: tasks with durations of ~10ms and occupancy differences between thief and victim of ~100ms.
Not only do we not care about such a small difference, but the act of stealing is also guaranteed to be more expensive than letting things be.
Stealing requires at least three network bounces (steal-request, steal-confirm, compute-task), which include code serialization if the steal is successful. It is almost impossible to do all of this in the currently hard-coded 1ms. The 100ms I propose is likely too conservative, but I don't think that is necessarily a bad thing for stealing. I don't have time for large-scale tests but am very confident that this should be much higher than it is right now. Thoughts, concerns?
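To illustrate why the fixed latency term matters for small tasks, here is a minimal, hypothetical model of a stealing cost check. It is not the actual `distributed` implementation: the names `move_cost` and `worth_stealing`, the decision rule (steal only if the occupancy gap exceeds the cost of moving the task), and the example bandwidth and dependency size are all assumptions chosen for illustration.

```python
"""Hypothetical, simplified model of a work-stealing cost check.

NOT dask.distributed's implementation; names and the decision rule are
illustrative assumptions showing how a fixed latency term dominates for
small tasks.
"""

OLD_LATENCY = 1e-3    # seconds; the currently hard-coded 1ms overhead
NEW_LATENCY = 100e-3  # seconds; the proposed 100ms overhead


def move_cost(nbytes: int, bandwidth: float, latency: float) -> float:
    """Estimated seconds to move a task: dependency transfer plus fixed overhead.

    The fixed overhead stands in for the steal-request, steal-confirm and
    compute-task round trips, including serialization.
    """
    return nbytes / bandwidth + latency


def worth_stealing(occupancy_gap: float, nbytes: int, bandwidth: float,
                   latency: float) -> bool:
    """Assumed rule: steal only if the time saved exceeds the cost of moving."""
    return occupancy_gap > move_cost(nbytes, bandwidth, latency)


if __name__ == "__main__":
    gap = 0.1          # ~100ms occupancy difference between victim and thief
    nbytes = 1_000     # assumed small dependency size in bytes
    bandwidth = 100e6  # assumed bandwidth of 100 MB/s
    for latency in (OLD_LATENCY, NEW_LATENCY):
        decision = worth_stealing(gap, nbytes, bandwidth, latency)
        print(f"latency={latency * 1e3:.0f}ms -> steal={decision}")
```

In this model, a 1ms overhead makes moving the task look nearly free, so the ~100ms gap triggers a steal; with a 100ms overhead the fixed cost alone consumes the gap and the task stays put.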
cc @gjoseph92 @crusaderky