Scheduler extension #13580
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,117 @@ | ||
<!-- BEGIN MUNGE: UNVERSIONED_WARNING --> | ||
|
||
<!-- BEGIN STRIP_FOR_RELEASE --> | ||
|
||
<img src="http://kubernetes.io/img/warning.png" alt="WARNING" | ||
width="25" height="25"> | ||
<img src="http://kubernetes.io/img/warning.png" alt="WARNING" | ||
width="25" height="25"> | ||
<img src="http://kubernetes.io/img/warning.png" alt="WARNING" | ||
width="25" height="25"> | ||
<img src="http://kubernetes.io/img/warning.png" alt="WARNING" | ||
width="25" height="25"> | ||
<img src="http://kubernetes.io/img/warning.png" alt="WARNING" | ||
width="25" height="25"> | ||
|
||
<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2> | ||
|
||
If you are using a released version of Kubernetes, you should | ||
refer to the docs that go with that version. | ||
|
||
<strong> | ||
The latest release of this document can be found | ||
[here](http://releases.k8s.io/release-1.1/docs/design/scheduler_extender.md). | ||
|
||
Documentation for other releases can be found at | ||
[releases.k8s.io](http://releases.k8s.io). | ||
</strong> | ||
-- | ||
|
||
<!-- END STRIP_FOR_RELEASE --> | ||
|
||
<!-- END MUNGE: UNVERSIONED_WARNING --> | ||
|
||
# Scheduler extender | ||
|
||
There are three ways to add new scheduling rules (predicates and priority functions) to Kubernetes: (1) adding these rules to the scheduler and recompiling (described here: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/scheduler.md), (2) implementing your own scheduler process that runs instead of, or alongside, the standard Kubernetes scheduler, or (3) implementing a "scheduler extender" process that the standard Kubernetes scheduler calls out to as a final pass when making scheduling decisions. | ||
AFAICT, we can improve the docs by separating it into two sub-sections
Is it common to lay out goals and non-goals in k8s docs? Another nice thing to talk about is that lay out goals such as:
Thanks for the feedback, will address this separately in another PR. |
||
|
||
This document describes the third approach. This approach is needed for use cases where scheduling decisions need to be made on resources not directly managed by the standard Kubernetes scheduler. The extender helps make scheduling decisions based on such resources. (Note that the three approaches are not mutually exclusive.) | ||
|
||
When scheduling a pod, the extender allows an external process to filter and prioritize nodes. Two separate HTTP/HTTPS calls are issued to the extender, one for the "filter" action and one for the "prioritize" action. To use the extender, you must create a scheduler policy configuration file. The configuration specifies how to reach the extender, whether to use HTTP or HTTPS, and the timeout. | ||
|
||
```go | ||
// Holds the parameters used to communicate with the extender. If a verb is unspecified/empty, | ||
// it is assumed that the extender chose not to provide that extension. | ||
type ExtenderConfig struct { | ||
// URLPrefix at which the extender is available | ||
URLPrefix string `json:"urlPrefix"` | ||
// Verb for the filter call, empty if not supported. This verb is appended to the URLPrefix when issuing the filter call to extender. | ||
FilterVerb string `json:"filterVerb,omitempty"` | ||
// Verb for the prioritize call, empty if not supported. This verb is appended to the URLPrefix when issuing the prioritize call to extender. | ||
PrioritizeVerb string `json:"prioritizeVerb,omitempty"` | ||
// The numeric multiplier for the node scores that the prioritize call generates. | ||
// The weight should be a positive integer | ||
Weight int `json:"weight,omitempty"` | ||
There seems to be only one value of weight. But shouldn't one extender have multiple prioritize functions? Additionally, why not let the remote call also return the weight?
Each extender has one priority function and one predicate function.
What prevents each extender from having multiple filter and priority functions? It's one HTTP roundtrip :)
So a way to model this is that one extender is an isolated identity that provides some extension functions to do the work.
I think it would just make things more complicated. I think it's simpler if there's one prioritize endpoint that runs one prioritize function, and one filter endpoint that runs one predicate function. Since they are presumably in the same process, it should not be hard to combine the logic into one predicate function and one priority function, so I don't think this overly restricts the generality of the extender. |
||
// EnableHttps specifies whether https should be used to communicate with the extender | ||
EnableHttps bool `json:"enableHttps,omitempty"` | ||
// TLSConfig specifies the transport layer security config | ||
TLSConfig *client.TLSClientConfig `json:"tlsConfig,omitempty"` | ||
// HTTPTimeout specifies the timeout duration for a call to the extender. A filter timeout fails the scheduling of the pod. A prioritize | ||
// timeout is ignored; the priorities from k8s and other extenders are used to select the node. | ||
HTTPTimeout time.Duration `json:"httpTimeout,omitempty"` | ||
} | ||
``` | ||
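
The comments on `FilterVerb` and `PrioritizeVerb` say the verb is appended to `URLPrefix`, and that an empty verb means the extension is not provided. A minimal sketch of that rule follows. The local `extenderConfig` type, the `endpoint` and `newClient` helpers, and the choice to join with a single `/` are assumptions made for illustration, not the scheduler's actual code.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// extenderConfig is a hypothetical local copy of a few ExtenderConfig fields.
type extenderConfig struct {
	URLPrefix      string
	FilterVerb     string
	PrioritizeVerb string
	HTTPTimeout    time.Duration
}

// endpoint appends verb to the URL prefix. It reports false when the verb is
// empty, i.e. the extender chose not to provide that extension.
// Assumption: prefix and verb are joined with exactly one slash.
func endpoint(cfg extenderConfig, verb string) (string, bool) {
	if verb == "" {
		return "", false
	}
	return strings.TrimRight(cfg.URLPrefix, "/") + "/" + verb, true
}

// newClient builds an HTTP client whose per-call timeout is HTTPTimeout.
func newClient(cfg extenderConfig) *http.Client {
	return &http.Client{Timeout: cfg.HTTPTimeout}
}

func main() {
	cfg := extenderConfig{
		URLPrefix:   "http://127.0.0.1:12345/api/scheduler",
		FilterVerb:  "filter",
		HTTPTimeout: 5 * time.Second, // illustrative value
	}
	if u, ok := endpoint(cfg, cfg.FilterVerb); ok {
		fmt.Println("filter endpoint:", u)
	}
	if _, ok := endpoint(cfg, cfg.PrioritizeVerb); !ok {
		fmt.Println("prioritize: not provided by this extender")
	}
	fmt.Println("client timeout:", newClient(cfg).Timeout)
}
```

With this configuration only the filter call would be issued; the prioritize step would fall back to the built-in priority functions and any other extenders.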
|
||
A sample scheduler policy file with extender configuration: | ||
|
||
```json | ||
{ | ||
"predicates": [ | ||
{ | ||
"name": "HostName" | ||
}, | ||
{ | ||
"name": "MatchNodeSelector" | ||
}, | ||
{ | ||
"name": "PodFitsResources" | ||
} | ||
], | ||
"priorities": [ | ||
{ | ||
"name": "LeastRequestedPriority", | ||
"weight": 1 | ||
} | ||
], | ||
"extenders": [ | ||
{ | ||
"urlPrefix": "http://127.0.0.1:12345/api/scheduler", | ||
"filterVerb": "filter", | ||
"enableHttps": false | ||
} | ||
] | ||
} | ||
``` | ||
|
||
The arguments passed to the FilterVerb endpoint on the extender are the pod and the set of nodes that passed the k8s predicates. The arguments passed to the PrioritizeVerb endpoint are the pod and the set of nodes that passed both the k8s predicates and the extender predicates. | ||
We can improve the docs by adding two more sections here -- one for predicate and one for prioritization.
Will take up in another PR |
||
|
||
```go | ||
// ExtenderArgs represents the arguments needed by the extender to filter/prioritize | ||
// nodes for a pod. | ||
type ExtenderArgs struct { | ||
// Pod being scheduled | ||
Pod api.Pod `json:"pod"` | ||
// List of candidate nodes where the pod can be scheduled | ||
Nodes api.NodeList `json:"nodes"` | ||
} | ||
``` | ||
|
||
The "filter" call returns a list of nodes (api.NodeList). The "prioritize" call returns priorities for each node (schedulerapi.HostPriorityList). | ||
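
To make the request and response shapes concrete, here is a rough sketch of an extender's "filter" handler. It is not the implementation from this PR: the lowercase types are simplified stand-ins for `api.Pod`, `api.Node` and `api.NodeList` that keep only the fields the sketch reads, and the label `example.com/gpu` and the `/filter` path are hypothetical. The handler is exercised through an in-process test server so the program terminates on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Simplified stand-ins for the Kubernetes API types (assumption: only
// metadata.name and metadata.labels are needed here).
type objectMeta struct {
	Name   string            `json:"name"`
	Labels map[string]string `json:"labels,omitempty"`
}
type pod struct {
	Metadata objectMeta `json:"metadata"`
}
type node struct {
	Metadata objectMeta `json:"metadata"`
}
type nodeList struct {
	Items []node `json:"items"`
}

// extenderArgs mirrors the JSON shape of ExtenderArgs.
type extenderArgs struct {
	Pod   pod      `json:"pod"`
	Nodes nodeList `json:"nodes"`
}

// filterNodes keeps only nodes carrying the hypothetical label
// example.com/gpu=true; this stands in for the extender's predicate.
func filterNodes(args extenderArgs) nodeList {
	kept := nodeList{Items: []node{}}
	for _, n := range args.Nodes.Items {
		if n.Metadata.Labels["example.com/gpu"] == "true" {
			kept.Items = append(kept.Items, n)
		}
	}
	return kept
}

// filterHandler decodes the arguments and replies with the pruned node list.
func filterHandler(w http.ResponseWriter, r *http.Request) {
	var args extenderArgs
	if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(filterNodes(args)); err != nil {
		log.Printf("encode response: %v", err)
	}
}

func main() {
	srv := httptest.NewServer(http.HandlerFunc(filterHandler))
	defer srv.Close()

	body := `{"pod":{"metadata":{"name":"p"}},"nodes":{"items":[` +
		`{"metadata":{"name":"node-a","labels":{"example.com/gpu":"true"}}},` +
		`{"metadata":{"name":"node-b"}}]}}`
	resp, err := http.Post(srv.URL+"/filter", "application/json", strings.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var out nodeList
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}
	for _, n := range out.Items {
		fmt.Println("kept:", n.Metadata.Name)
	}
}
```

A real extender would listen on the address named by `urlPrefix` instead of a test server, and would decode the full Kubernetes types rather than these trimmed ones.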
|
||
The "filter" call may prune the set of nodes based on its predicates. Scores returned by the "prioritize" call are added to the k8s scores (computed through its priority functions) and used for final host selection. | ||
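
The way extender scores feed into host selection might look roughly like the sketch below. It assumes, per the `Weight` comment in `ExtenderConfig`, that each extender score is multiplied by the configured weight before being added to the built-in score. The `hostPriority` type is a local stand-in for an entry of `schedulerapi.HostPriorityList`, and `combine` is a hypothetical helper; ignoring scores for hosts the scheduler did not already score is also an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// hostPriority is a local stand-in for one entry of the extender's
// prioritize response (host name plus score).
type hostPriority struct {
	Host  string `json:"host"`
	Score int    `json:"score"`
}

// combine adds weight*score from the extender to each built-in score.
// Assumption: hosts unknown to the built-in scores are skipped.
func combine(builtin map[string]int, extender []hostPriority, weight int) map[string]int {
	total := make(map[string]int, len(builtin))
	for host, score := range builtin {
		total[host] = score
	}
	for _, hp := range extender {
		if _, ok := total[hp.Host]; ok {
			total[hp.Host] += hp.Score * weight
		}
	}
	return total
}

func main() {
	builtin := map[string]int{"node-a": 7, "node-b": 9} // illustrative k8s scores
	extender := []hostPriority{
		{Host: "node-a", Score: 2},
		{Host: "node-b", Score: 1},
	}
	total := combine(builtin, extender, 5)

	// Print hosts in a stable order, highest combined score first.
	hosts := make([]string, 0, len(total))
	for h := range total {
		hosts = append(hosts, h)
	}
	sort.Slice(hosts, func(i, j int) bool {
		if total[hosts[i]] != total[hosts[j]] {
			return total[hosts[i]] > total[hosts[j]]
		}
		return hosts[i] < hosts[j]
	})
	for _, h := range hosts {
		fmt.Println(h, total[h])
	}
}
```

In this example the extender's weight of 5 is enough to move node-a ahead of node-b even though node-b has the higher built-in score.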
|
||
Multiple extenders can be configured in the scheduler policy. | ||
|
||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS --> | ||
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/scheduler_extender.md?pixel)]() | ||
<!-- END MUNGE: GENERATED_ANALYTICS --> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
{ | ||
"kind" : "Policy", | ||
"apiVersion" : "v1", | ||
"predicates" : [ | ||
{"name" : "PodFitsPorts"}, | ||
{"name" : "PodFitsResources"}, | ||
{"name" : "NoDiskConflict"}, | ||
{"name" : "MatchNodeSelector"}, | ||
{"name" : "HostName"} | ||
], | ||
"priorities" : [ | ||
{"name" : "LeastRequestedPriority", "weight" : 1}, | ||
{"name" : "BalancedResourceAllocation", "weight" : 1}, | ||
{"name" : "ServiceSpreadingPriority", "weight" : 1}, | ||
{"name" : "EqualPriority", "weight" : 1} | ||
], | ||
"extender": { | ||
"url": "http://127.0.0.1:12346/scheduler", | ||
"apiVersion": "v1beta1", | ||
"filterVerb": "filter", | ||
"prioritizeVerb": "prioritize", | ||
"weight": 5, | ||
"enableHttps": false | ||
} | ||
} |
Is there a reason not to call this an 'extension' ? We use nouns for things like this elsewhere in the system (viz: 'plugin').
Although I'm sure it wasn't the original intention, one reason to call it an "extender" might be so that we can reserve the term "extension" for the unified extension architecture we will hopefully develop eventually (to cover all call-out extensions like this one, admission controller extension, etc.).