STORM-2733: Better load aware shuffle implementation #2321
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
I have run several tests that show this works much better at big imbalances in processing latency than did the previous shuffle implementations.
I ran some simple performance tests and because
chooseTasks
didn't change the performance was more or less identical to what was here before.The plan on how this would work with STORM-2686 (adding distance to shuffle) is that we would have 4 different weights (worker local, node local, rack local, and everywhere). Each time we update the load we update all of the weights, for the min load currently in the locality group. This is because executors may move from one group to another as things are rescheduled, so we need a way to keep it consistent.
We can then test a few different ways of selecting the group we want to target. Currently we are thinking we will select the most local group that has a maximum load < .5 falling back to everything if we cannot find one.