This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

Scheduling Improvements #747

Open
bcwaldon opened this issue Aug 5, 2014 · 9 comments

Comments

@bcwaldon
Contributor

bcwaldon commented Aug 5, 2014

UPDATE 9/24:

  • remove note about unfair scheduling as offering/bidding mechanism is gone
  • remove note about supporting memory-based scheduling

There are two major aspects of scheduling for fleet to focus on: resource scheduling and dependency scheduling.

As far as resource scheduling goes, fleet is not going to have a full-featured scheduler. We have no plans to support any resource-related parameters past the leveling of the number of units scheduled to a particular machine.

Dependency scheduling, however, is incredibly important to get right. The following are the currently-supported parameters:

  • MachineID bypasses the scheduler altogether and places a unit directly on a machine
  • MachineMetadata filters the list of possible machines to which a unit can be scheduled using key-value metadata
  • MachineOf provides affinity, scheduling a unit to the same machine as another unit
  • Conflicts provides anti-affinity, scheduling a unit to a different machine than any units that match a glob pattern
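
To make the interplay of these four parameters concrete, here is a minimal sketch of how they could narrow down a list of candidate machines. It is not fleet's actual implementation: the function name `eligible_machines`, the dict-based representation of units, machines, and placements, and the choice to ignore the other parameters when MachineID is set are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def eligible_machines(unit, machines, placements):
    """Return the IDs of machines that satisfy a unit's scheduling parameters.

    unit:       dict with optional keys MachineID, MachineMetadata,
                MachineOf, Conflicts (hypothetical in-memory form)
    machines:   dict of machine ID -> metadata dict
    placements: dict of already-scheduled unit name -> machine ID
    """
    # MachineID bypasses all other checks and names one machine directly.
    if "MachineID" in unit:
        target = unit["MachineID"]
        return [target] if target in machines else []

    candidates = []
    for machine_id, metadata in machines.items():
        # MachineMetadata: every requested key-value pair must match.
        wanted = unit.get("MachineMetadata", {})
        if any(metadata.get(key) != value for key, value in wanted.items()):
            continue

        # MachineOf (affinity): the named unit must already run here.
        peer = unit.get("MachineOf")
        if peer is not None and placements.get(peer) != machine_id:
            continue

        # Conflicts (anti-affinity): no local unit may match any glob pattern.
        local_units = [u for u, m in placements.items() if m == machine_id]
        patterns = unit.get("Conflicts", [])
        if any(fnmatchcase(u, p) for u in local_units for p in patterns):
            continue

        candidates.append(machine_id)
    return candidates
```

For example, with `web@1.service` already placed on one machine, a unit declaring `Conflicts` with the pattern `web@*.service` would only be eligible for the remaining machines.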

There are several ideas for new dependency-scheduling behaviors, which are enumerated below:

@stuart-warren

As far as resource scheduling goes, fleet is not going to have a full-featured scheduler.

Does CoreOS intend to support other schedulers?

Past the leveling of the number of units across the cluster, fleet will only take into account memory limits.

So it will put the same number of units onto each server? What if I have a few different specs of servers, some massively more powerful than others? Can I set some bias in the fleet config perhaps?

@bcwaldon
Contributor Author

bcwaldon commented Aug 6, 2014

@stuart-warren We definitely intend to provide a full solution here; we're just not going to make fleet solve everyone's scheduling problems.

The fleet scheduler supports metadata-based filtering, and the memory scheduling will be relative to the available memory of each machine independently.

@dbason

dbason commented Sep 22, 2014

@bcwaldon so what would be considered a full solution? Will we have the ability to weight units (if they require relatively more cpu than other units), or will this be something we need to implement outside of Fleet?

@bcwaldon
Contributor Author

At this time, we have no plans to support any resource-related parameters. If this is something you care about, you should explore something like kubernetes or mesos.

@gust1n

gust1n commented Oct 3, 2014

I totally get your point about keeping the scheduler simple and instead letting others build higher-level tools to solve that. But what about some simple spreading of resources? We're using templates to support simple heroku-like scaling of processes, and we would rather not use the Conflicts fleet param to spread the jobs, since we then have to set limits. But very often, if we scale a job to, say, 3, they all end up on the same host. What is worse, often all jobs of all services end up on the same host. This gives us a scenario where 1 host is under heavy load and the 2 others are not used.

Are there any plans for simple spreading of jobs across a cluster?

@bcwaldon
Contributor Author

bcwaldon commented Oct 3, 2014

@gust1n The current scheduler distributes units based on the current number of units scheduled to each host. Are you not experiencing this?
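
As a rough illustration of that behavior, the sketch below places each new unit on whichever host currently has the fewest units. It is not taken from fleet's source: the function name `schedule_least_loaded`, the data shapes, and the tie-break by machine ID are assumptions chosen to keep the example deterministic.

```python
def schedule_least_loaded(unit_names, machine_ids, placements=None):
    """Assign each unit to the machine that currently holds the fewest units.

    unit_names:  names of units to schedule, in order
    machine_ids: IDs of machines available for scheduling
    placements:  optional dict of existing unit name -> machine ID
    Returns a new dict containing both old and new placements.
    """
    result = dict(placements or {})

    # Count the units already scheduled to each known machine.
    counts = {machine_id: 0 for machine_id in machine_ids}
    for machine_id in result.values():
        if machine_id in counts:
            counts[machine_id] += 1

    for name in unit_names:
        # Fewest units wins; ties are broken by machine ID (an assumption).
        target = min(counts, key=lambda m: (counts[m], m))
        result[name] = target
        counts[target] += 1
    return result
```

Under this sketch, scaling a template to three instances on an otherwise empty three-host cluster puts one instance on each host, rather than all three on the same one.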

We've also started a discussion around how fleet can support external schedulers over here: #922. If you have any input, I'd appreciate it greatly.

@gust1n

gust1n commented Oct 9, 2014

@bcwaldon Unfortunately (on stable), most of the time almost all jobs (except those with X-Fleet logic) end up on the same machine. On a machine restart they all migrate to the next one. Maybe what you're describing is not in the stable channel yet? What you described is what my request was all about: something simple that spreads the jobs. I solved it for now by using some X-Fleet conditions anyway.

@bcwaldon
Contributor Author

bcwaldon commented Oct 9, 2014

@gust1n yes, by "current scheduler" I mean fleet v0.7.0+. The stable channel will be updated soon.

@jonboulle
Contributor

Cross-post: see #922 (comment)
