
Combine offers to schedule tasks more efficiently #1561

Merged · ssalinas merged 18 commits into master from offer_combination on Jun 23, 2017

Conversation

@ssalinas (Member) commented Jun 7, 2017

This updates the scheduler to group offers by host and use the total resources available on all offers when scheduling tasks. Especially when using the offer cache, it has been common for us to have 10+ offers in the cache at a time from a single host. For tasks that require larger amounts of resources, this should get them scheduled much more quickly.

As far as I can tell, the Mesos master will add up all resources on its side when a task or tasks are scheduled using multiple offers. So this code is not currently making an effort to match each task with a portion of the resources available; it launches all tasks with all offers for the host. It could be a future improvement to save chunks of unused offers so the offer cache could hold on to them for the next run.
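
For context, the grouping described above looks roughly like the sketch below. This is not the actual diff; MesosUtils.combineResources and getResourcesList appear in the code under review, while the offersByHost map and the loop are illustrative assumptions.

// Sketch: group incoming offers by hostname so each scheduling pass sees one
// combined pool of resources per host. combineResources is the helper used in
// the diff below; the surrounding wiring here is assumed.
Map<String, List<Protos.Offer>> offersByHost = offers.stream()
    .collect(Collectors.groupingBy(Protos.Offer::getHostname));

for (List<Protos.Offer> hostOffers : offersByHost.values()) {
  List<Protos.Resource> pooledResources = hostOffers.size() > 1
      ? MesosUtils.combineResources(hostOffers.stream().map(Protos.Offer::getResourcesList).collect(Collectors.toList()))
      : hostOffers.get(0).getResourcesList();
  // tasks are then matched against pooledResources rather than against each
  // offer individually
}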

@ssalinas ssalinas modified the milestone: 0.16.0 Jun 8, 2017
@ssalinas ssalinas added the hs_qa label Jun 9, 2017
this.offers = offers;
this.roles = MesosUtils.getRoles(offers.get(0));
this.acceptedTasks = Lists.newArrayListWithExpectedSize(taskSizeHint);
this.currentResources = offers.size() > 1
    ? MesosUtils.combineResources(offers.stream().map(Protos.Offer::getResourcesList).collect(Collectors.toList()))
    : offers.get(0).getResourcesList();
Contributor
Wonder if it's possible to have two currentResources here: one for the individual offers (as originally given by Mesos) and one for the combined same-host offers? The check could then try each single offer first and only fall back to the combined offers if no single offer was big enough.

It could help launch more tasks if the smaller offers were used instead of declined. The tough part is that the combinedCurrentResources would need to be recalculated each time before it's checked, since some single offers could be used for a task in a later iteration.
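
A rough sketch of that two-level check, purely illustrative: hasEnoughResourcesFor and taskRequest are hypothetical placeholders, not existing Singularity methods, while MesosUtils.combineResources comes from the diff above.

// Hypothetical two-level check: try each single offer first, then fall back
// to the combined same-host pool. hasEnoughResourcesFor(...) is a placeholder.
boolean canFit = offers.stream()
    .anyMatch(o -> hasEnoughResourcesFor(o.getResourcesList(), taskRequest));

if (!canFit) {
  // Recomputed on every check, since earlier iterations may have consumed
  // some of the single offers.
  List<Protos.Resource> combinedCurrentResources = MesosUtils.combineResources(
      offers.stream().map(Protos.Offer::getResourcesList).collect(Collectors.toList()));
  canFit = hasEnoughResourcesFor(combinedCurrentResources, taskRequest);
}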

Member Author

I think a more general solution might be to sort out which offers to accept/decline/cache when launching the tasks at the end. i.e. if we have offers of 2, 3, and 4 CPUs, and matched tasks that use 3 and 3, we can hold on to the 3-CPU offer and use the other two. No reason to calculate that as we go along, since the total resources are the same either way, but we can certainly do it at the end with the goal of maximizing usage of the current offers.

#lets-do-the-math

List<Offer> neededOffers = offers.stream().filter(o -> {
List<Resource> remainingAfterSavingOffer = MesosUtils.subtractResources(currentResources, o.getResourcesList());
if (MesosUtils.allResourceCountsNonNegative(remainingAfterSavingOffer)) {
cache.cacheOffer(driver, System.currentTimeMillis(), o); // TODO: do we need this timestamp to be something specific, e.g. the time at which we got the offer originally?
Member Author
Keep in mind here that the offer may already be from the cache to start with; we might need to be returning the offer rather than caching it here.
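
To make the intent concrete, a hedged sketch of how that bookkeeping might look. The MesosUtils calls and cache.cacheOffer are taken from the diff above; offerCameFromCache and returnOfferToCache are hypothetical placeholders for however cache origin would be tracked, and currentResources is assumed to hold the total per-host resources minus what the matched tasks need.

// Sketch only: keep the offers the matched tasks actually need, and hand back
// or re-cache the rest. offerCameFromCache/returnOfferToCache are placeholders.
List<Protos.Offer> neededOffers = new ArrayList<>();
List<Protos.Resource> remaining = currentResources;

for (Protos.Offer offer : offers) {
  List<Protos.Resource> remainingAfterSavingOffer = MesosUtils.subtractResources(remaining, offer.getResourcesList());
  if (MesosUtils.allResourceCountsNonNegative(remainingAfterSavingOffer)) {
    // The matched tasks still fit without this offer, so don't use it.
    if (offerCameFromCache(offer)) {
      returnOfferToCache(offer);          // borrowed from the cache, just give it back
    } else {
      cache.cacheOffer(driver, System.currentTimeMillis(), offer);
    }
    remaining = remainingAfterSavingOffer;
  } else {
    neededOffers.add(offer);              // required to launch the matched tasks
  }
}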

@ssalinas ssalinas merged commit 0d09f7e into master Jun 23, 2017
@ssalinas ssalinas deleted the offer_combination branch June 23, 2017 18:34