
Shuffle tasks on hosts with overutilized memory resources #1937

Merged: 21 commits merged into master from shuffle-tasks-for-memory on May 22, 2019

Conversation

@baconmania (Contributor)

No description provided.

@baconmania changed the title from "(WIP) Shuffle tasks on hosts with overutilized memory resources" to "Shuffle tasks on hosts with overutilized memory resources" on May 17, 2019
@ssalinas (Member) left a comment

Overall strategy of 'fix the worst one first' seems fine to me for memory vs cpu. Left one comment about breaking out of the loop early if we think the current removed set of tasks should satisfy things. Other than that, my only other thought is whether 'most overused' is still the best metric for sorting tasks to shuffle. We've seen some cases where a task using 10% of its cpu still gets shuffled, which seems odd. Though I'm not sure of a good way around it either.
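For illustration, a minimal, self-contained sketch of the 'fix the worst one first' idea discussed above: compare how far each resource is over its shuffle threshold (proportionally, so CPU and memory end up on the same scale) and shuffle against whichever is worse. All names below are hypothetical and are not the PR's actual implementation.

// Hypothetical sketch, not the PR's code: choose the resource that is
// proportionally further over its shuffle threshold.
class MostOverusedResourceSketch {
  enum ResourceType { CPU, MEMORY }

  static ResourceType mostOverused(double cpuUtilization, double cpuThreshold,
                                   double memUtilization, double memThreshold) {
    // Relative overage keeps CPU and memory comparable regardless of units.
    double cpuOverage = (cpuUtilization - cpuThreshold) / cpuThreshold;
    double memOverage = (memUtilization - memThreshold) / memThreshold;
    return cpuOverage >= memOverage ? ResourceType.CPU : ResourceType.MEMORY;
  }

  public static void main(String[] args) {
    // Host at 105% of allocatable CPU (threshold 100%) and 90% memory (threshold 82%):
    System.out.println(mostOverused(1.05, 1.00, 0.90, 0.82)); // prints MEMORY
  }
}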

-  private boolean shuffleTasksForOverloadedSlaves = false; // recommended 'true' when oversubscribing cpu for larger clusters
+  private boolean shuffleTasksForOverloadedSlaves = false; // recommended 'true' when oversubscribing resources for larger clusters

private double shuffleTasksWhenSlaveMemoryUtilizationPercentageExceeds = 0.82;
@ssalinas (Member)

was there any math behind this value?

@baconmania (Contributor, Author)

Sort of an arbitrary rule of thumb. Assuming we'd like to set a default "target" memory utilization of 85%, setting this config at 82% should give us enough time to shuffle tasks before actually hitting our target.
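A rough worked example of that headroom, using the numbers from this thread (the host size is an arbitrary example, not from the PR):

// Hypothetical arithmetic only: how much memory can still be consumed between
// the shuffle trigger (0.82) and the utilization target (0.85) while shuffled
// tasks drain.
class ShuffleHeadroomSketch {
  public static void main(String[] args) {
    double shuffleThreshold = 0.82;  // shuffleTasksWhenSlaveMemoryUtilizationPercentageExceeds
    double targetUtilization = 0.85;
    double hostMemoryGb = 256;       // example host size

    double headroomGb = (targetUtilization - shuffleThreshold) * hostMemoryGb;
    System.out.printf("~%.1f GB of headroom before hitting the target%n", headroomGb);
  }
}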


for (TaskIdWithUsage taskIdWithUsage : possibleTasksToShuffle) {
if (requestsWithShuffledTasks.contains(taskIdWithUsage.getTaskId().getRequestId())) {
LOG.debug("Request {} already has a shuffling task, skipping", taskIdWithUsage.getTaskId().getRequestId());
continue;
}
if (cpuOverage <= 0 || shuffledTasksOnSlave > configuration.getMaxTasksToShufflePerHost() || currentShuffleCleanupsTotal >= configuration.getMaxTasksToShuffleTotal()) {
LOG.debug("Not shuffling any more tasks (overage: {}, shuffledOnHost: {}, totalShuffleCleanups: {})", cpuOverage, shuffledTasksOnSlave, currentShuffleCleanupsTotal);
if ((mostOverusedResource.overusage <= 0) || shuffledTasksOnSlave > configuration.getMaxTasksToShufflePerHost() || currentShuffleCleanupsTotal >= configuration.getMaxTasksToShuffleTotal()) {
@ssalinas (Member)

cpuOverage <= 0 worked as a condition for 'should this put us back in green?' because we updated it at the end of the loop. I don't currently see mostOverusedResource.overusage getting updated anywhere
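To make the concern concrete, here is a self-contained sketch (not the PR's code) of the kind of end-of-loop update being asked about: subtract each shuffled task's usage of the overused resource from the remaining overage, so the overage <= 0 check can actually stop the loop once the host should be back under its threshold.

import java.util.List;

// Hypothetical sketch of decrementing the remaining overage as tasks are chosen
// for shuffling, so an `overage <= 0` check can exit the loop early.
class ShuffleOverageSketch {
  /** Returns how many tasks would need to be shuffled to clear the given overage. */
  static int tasksToShuffle(double overage, List<Double> perTaskUsage, int maxTasksToShufflePerHost) {
    int shuffled = 0;
    for (double taskUsage : perTaskUsage) {
      if (overage <= 0 || shuffled >= maxTasksToShufflePerHost) {
        break; // back in the green, or hit the per-host shuffle limit
      }
      shuffled++;
      overage -= taskUsage; // the update that makes the early-exit condition meaningful
    }
    return shuffled;
  }

  public static void main(String[] args) {
    // Host is 3 GB over its memory threshold; candidate tasks use 2, 1.5, and 1 GB.
    System.out.println(tasksToShuffle(3.0, List.of(2.0, 1.5, 1.0), 5)); // prints 2
  }
}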

@ssalinas (Member)

🚢


@ssalinas merged commit c832f48 into master on May 22, 2019
@ssalinas deleted the shuffle-tasks-for-memory branch on May 22, 2019 at 18:52
@ssalinas added this to the 0.23.0 milestone on Jun 7, 2019