RFC: Requirements for an extensible scheduling system #1055
Comments
Hey guys, great to see that draft of requirements. I recently thought about different scheduling strategies and how to enable fleet to support them. Since we are running various CoreOS clusters, we are really interested in being able to schedule units based on different policies. Reading the requirements above leads to the following brain dump of how such a plugin could be designed and work.

1. scheduler strategies

For different needs there should be different strategies to schedule units. These strategies could easily be defined in the unit file. Here are possible strategies crossing my mind:
This is how it could look in a unit file:
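A minimal sketch of the idea, with a hypothetical `X-Fleet` option (neither `SchedulingStrategy` nor the strategy names are existing fleet options; they are only illustrative):

```
[Unit]
Description=my-service

[Service]
ExecStart=/usr/bin/my-service

[X-Fleet]
# Hypothetical option: strategies would be applied in order, left to right.
SchedulingStrategy=least-memory,least-units
```

Multiple values would let a unit request a chain of policies, which is the composability idea picked up in the comments below.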
2. configuring
@zyndiecate I don't think there is much benefit in doing metrics based on cores - taken in isolation, the core count is not particularly useful. If your cpushares are set correctly, then it's the number of cores divided by the number of units on the machine that influences cpu time. I think being able to chain a series of simple, composable schedulers (as described in #1049) is really going to drive the most value - think a Moog synthesizer for scheduling ;). This keeps the core scheduler stable and allows for community extensions.
Thank you @jonboulle for the update. I would like one clarification on the requirements. Under Behavior Preserving you call out "No changes to the fleet API". I believe this should be clarified to "no changes to the user-facing APIs", i.e. the API used to schedule, retrieve, and view units should stay as it is today to limit the impact on existing users. I think it makes sense to expand the API to allow better interaction between fleet and the external schedulers. This would be a separate API (on a separate port, like etcd's client vs. server ports?) used only by the schedulers to register themselves and publish/retrieve scheduler data.

I like the idea of calling out which schedulers to run, and in which order, via a new parameter in X-Fleet, similar to @zyndiecate's example above with multiple values. This removes the need to specify any sort of priority when registering, and allows different types of units to request specific types of scheduling.

I'd like to propose the following additions to the requirements:
I would also like to (re-)propose that the default, batteries-included scheduler receive a few upgrades. A scheme similar to #945 would allow for more out-of-the-box uses without modifying the default behavior as it stands today.
This is accurate, but it's your follow-on point (a new and expanded scheduler-specific API) that concerns me a little. To quote myself from the end of the OP:
If I understand @zyndiecate's proposal correctly, it wouldn't involve such an increase in complexity, as the interface would be limited to a single call.
Could you explain a little more about why you feel #945 should be baked into the core rather than implemented as a chainable scheduler?
I don't see the API becoming arbitrarily complex; however, I still maintain that fleet itself should provide some rudimentary persistence. Persistence would consist of single-level key/value pairs per machine, collected and sent to fleet from each machine. Having this persistence would simplify the setup and use of the schedulers and increase performance (a scheduler could request data at registration time, and the appropriate data could be sent along with a scheduler request). I think this is worth the slight complexity that it would add to fleet itself. The following two endpoints would be added to fleet:
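A rough sketch of the shape those two endpoints might take: a PUT for each machine to publish its key/value pairs, and a GET for schedulers to read them back. The paths, port, and payload shapes here are assumptions for illustration, not fleet's actual API:

```go
package main

import (
	"encoding/json"
	"net/http"
	"strings"
	"sync"
)

// In-memory stand-in for the rudimentary persistence described above:
// single-level key/value pairs per machine.
var (
	mu          sync.RWMutex
	machineData = map[string]map[string]string{}
)

func main() {
	// Hypothetical path; not part of fleet's real API.
	http.HandleFunc("/v1/scheduler-data/", func(w http.ResponseWriter, r *http.Request) {
		machine := strings.TrimPrefix(r.URL.Path, "/v1/scheduler-data/")
		switch r.Method {
		case http.MethodPut: // each machine publishes its key/value pairs
			var kv map[string]string
			if err := json.NewDecoder(r.Body).Decode(&kv); err != nil {
				http.Error(w, err.Error(), http.StatusBadRequest)
				return
			}
			mu.Lock()
			machineData[machine] = kv
			mu.Unlock()
		case http.MethodGet: // schedulers retrieve a machine's pairs
			mu.RLock()
			defer mu.RUnlock()
			json.NewEncoder(w).Encode(machineData[machine])
		default:
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		}
	})
	http.ListenAndServe(":49153", nil)
}
```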
Each scheduler would implement just one endpoint:
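A sketch of that single scheduler-side endpoint, assuming a JSON request carrying the unit plus the candidate machines (with their persisted key/value data) and a JSON response listing the machines that qualify. All names here are illustrative, not a defined fleet contract:

```go
package main

import (
	"encoding/json"
	"net/http"
)

// ScheduleRequest carries the unit to place and the candidate machines,
// including the per-machine key/value data fleet would persist.
type ScheduleRequest struct {
	Unit     string                       `json:"unit"`
	Machines map[string]map[string]string `json:"machines"`
}

// ScheduleResponse lists the machines that pass this scheduler's policy.
type ScheduleResponse struct {
	Machines []string `json:"machines"`
}

func main() {
	http.HandleFunc("/schedule", func(w http.ResponseWriter, r *http.Request) {
		var req ScheduleRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// Example policy: keep only machines tagged with an SSD.
		resp := ScheduleResponse{}
		for id, kv := range req.Machines {
			if kv["disk"] == "ssd" {
				resp.Machines = append(resp.Machines, id)
			}
		}
		json.NewEncoder(w).Encode(resp)
	})
	http.ListenAndServe(":49154", nil)
}
```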
The fleet unit itself specifies the order of the schedulers to run. The list of machines output by one scheduler is fed into the next in the chain, even if there is only one machine, to ensure that machine meets all qualifications. If at any point a scheduler returns zero machines, the unit cannot be scheduled.
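That chaining rule reduces to a simple loop. A minimal sketch, assuming a `Scheduler` interface over the endpoint above (names are illustrative):

```go
package scheduler

import "fmt"

// Scheduler abstracts the single endpoint each external scheduler exposes.
type Scheduler interface {
	Name() string
	// Schedule filters/orders the candidate machines for a unit.
	Schedule(unit string, machines []string) ([]string, error)
}

// scheduleChain feeds the candidate machines through each scheduler in the
// order the unit specifies. Even a single surviving machine is still passed
// on, so every scheduler in the chain gets to veto it. An empty result at
// any step means the unit cannot be scheduled.
func scheduleChain(chain []Scheduler, unit string, machines []string) ([]string, error) {
	for _, s := range chain {
		next, err := s.Schedule(unit, machines)
		if err != nil {
			return nil, err
		}
		if len(next) == 0 {
			return nil, fmt.Errorf("unit %q cannot be scheduled: %s returned no machines", unit, s.Name())
		}
		machines = next
	}
	return machines, nil
}
```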
@jonboulle As for #945, I think that the default scheduler as it exists today just isn't enough. While I agree that the reservation system could (and probably should) be moved to a separate chained scheduler, the job multiplier feels like an easy, low-complexity addition with a lot of benefit. Allowing fleet to load-balance asymmetric workloads would open it up to a lot more applications out of the box.
While I think that this RFC is pretty cool, it does not really look like it's getting traction. Is there any way that we could have something simpler to deal with just over-provisioning of memory? Right now I have a problem where we have 10 machines running CoreOS and about 12 services running multiple versions. Four services can fit on one machine; they are all about the same size. This means that we have the capacity to run 40 service copies, far more than the 24 or so that we need. Even so, if a machine or two fails or restarts, a cascading failure occurs as fleet piles too many services onto one machine.
I was hoping to work on this next, but am currently stalled out on #1077. Now that etcd 2.0 has shipped in alpha, I am hoping @jonboulle and @bcwaldon can give me a bit of direction on both issues.
I see that there was an attempt to bring machine resources into play, but it was deleted in ab275c1 (although |
@jonboulle, it seems you forgot to list the #943 requirement.
/cc @htr |
fleet’s current scheduling engine is rudimentary, and there have been various proposals (1 2 3 4) to either enhance the complexity of the scheduling within fleet, or provide a means for users to extend it without needing to run a custom branch of fleet.
This issue aims to capture the design requirements and restrictions for a solution to these requests, such that it can be implemented in a way that a) keeps with fleet's architecture and design goals, and b) does not impact existing users of fleet.
(Bear in mind that this is a work-in-progress proposal, not a final set of hard requirements; please provide feedback below.)
The solution should be:

- behavior-preserving: no changes to the fleet API, and no new binaries (other than `fleetd`) for fleet itself
- expressible through unit files: the existing `X-Fleet` options should be sufficient

Since comparisons will inevitably arise, we will take this opportunity to draw a distinction between what we're aiming for and the Mesos framework model. We do not anticipate the API between fleet and the pluggable schedulers becoming arbitrarily complex (ideally it should be limited to the single request-response described above), and we would still consider fleet to be the "entrypoint" for users deploying applications (c.f. Mesos, where the entrypoint is typically the scheduler). To put it another way, schedulers should plug in behind fleet rather than on top of fleet.