
STORM-2733: Better load aware shuffle implementation #2321

Merged: 1 commit into apache:master on Sep 14, 2017

Conversation

@revans2 (Contributor) commented Sep 11, 2017

I have run several tests that show this handles large imbalances in processing latency much better than the previous shuffle implementations did.

I also ran some simple performance tests; because chooseTasks didn't change, the performance was more or less identical to what was here before.

The plan for how this would work with STORM-2686 (adding distance to shuffle) is that we would have 4 different weights (worker local, node local, rack local, and everywhere). Each time we update the load, we update all of the weights using the minimum load currently in each locality group. This is because executors may move from one group to another as things are rescheduled, so we need a way to keep the weights consistent.

We can then test a few different ways of selecting the group we want to target. Currently we are thinking we will select the most local group that has a maximum load < 0.5, falling back to everything if we cannot find one.
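A minimal sketch of that group-selection idea, for discussion only (this is not the actual LoadAwareShuffleGrouping code): the locality groups, the 0.5 threshold, and the fallback to everything come from the comment above, while the class and method names (`LocalityGroupSelector`, `refreshLoad`, `chooseScope`) are hypothetical.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class LocalityGroupSelector {
    // Locality groups, ordered from most to least local.
    enum Scope { WORKER_LOCAL, NODE_LOCAL, RACK_LOCAL, EVERYTHING }

    // Assumed threshold from the discussion above: prefer the most local group
    // whose maximum load is below 0.5, otherwise fall back to EVERYTHING.
    private static final double MAX_LOAD_THRESHOLD = 0.5;

    // Target tasks per locality group and a per-group load summary. In the real
    // grouping these would be refreshed whenever load metrics arrive, since
    // executors can move between groups when the topology is rescheduled.
    private final Map<Scope, List<Integer>> tasksByScope = new EnumMap<>(Scope.class);
    private final Map<Scope, Double> maxLoadByScope = new EnumMap<>(Scope.class);

    /** Recompute each group's maximum load from per-task loads (assumed input). */
    public void refreshLoad(Map<Integer, Double> loadByTask) {
        for (Map.Entry<Scope, List<Integer>> e : tasksByScope.entrySet()) {
            double max = 0.0;
            for (Integer task : e.getValue()) {
                max = Math.max(max, loadByTask.getOrDefault(task, 0.0));
            }
            maxLoadByScope.put(e.getKey(), max);
        }
    }

    /** Pick the most local non-empty group whose max load is under the threshold. */
    public Scope chooseScope() {
        for (Scope scope : Scope.values()) {
            List<Integer> tasks = tasksByScope.get(scope);
            Double maxLoad = maxLoadByScope.get(scope);
            if (tasks != null && !tasks.isEmpty()
                    && maxLoad != null && maxLoad < MAX_LOAD_THRESHOLD) {
                return scope;
            }
        }
        return Scope.EVERYTHING; // fall back to shuffling across all tasks
    }
}
```

This only covers the group-selection step; the weighted choice of a task within the chosen group (driven by the per-group weights mentioned above) would sit on top of it.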

@kishorvpatil (Contributor) left a comment


LGTM. +1

@asfgit merged commit b9f1d7e into apache:master on Sep 14, 2017
asfgit pushed a commit that referenced this pull request on Sep 14, 2017:

STORM-2733: Better load aware shuffle implementation

This closes #2321