feat: include tainted nodes in utilisation denominator #285
```diff
@@ -237,6 +237,13 @@ func (c *Controller) scaleNodeGroup(nodegroup string, nodeGroup *NodeGroupState)
 	// Filter into untainted and tainted nodes
 	untaintedNodes, taintedNodes, forceTaintedNodes, cordonedNodes := c.filterNodes(nodeGroup, allNodes)
 
+	// Determine which nodes count toward capacity for utilisation calculation
+	capacityNodes := untaintedNodes
+	if nodeGroup.Opts.IncludeTaintedInCapacity {
+		capacityNodes = append(append([]*v1.Node{}, untaintedNodes...), taintedNodes...)
+		log.WithField("nodegroup", nodegroup).Infof("Including %v tainted nodes in capacity calculation (total capacity nodes: %v)", len(taintedNodes), len(capacityNodes))
+	}
+
 	// Metrics and Logs
 	log.WithField("nodegroup", nodegroup).Infof("pods total: %v", len(pods))
 	log.WithField("nodegroup", nodegroup).Infof("nodes remaining total: %v", len(allNodes))
```

Collaborator: Should this be at info level? This log will run on every tick and could be very noisy. Can it be changed to debug level, or only logged when len(taintedNodes) > 0?
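A minimal sketch of the quieter variant suggested above, assuming the same variables and logrus-style logger already used in the hunk (the placement and wording are illustrative, not the PR's actual change):

```go
if nodeGroup.Opts.IncludeTaintedInCapacity {
	capacityNodes = append(append([]*v1.Node{}, untaintedNodes...), taintedNodes...)
	// Log at debug level, and only when there is something to report, so the
	// every-tick scale loop does not flood the info-level logs.
	if len(taintedNodes) > 0 {
		log.WithField("nodegroup", nodegroup).Debugf(
			"Including %v tainted nodes in capacity calculation (total capacity nodes: %v)",
			len(taintedNodes), len(capacityNodes))
	}
}
```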
|
|
```diff
@@ -274,14 +281,14 @@ func (c *Controller) scaleNodeGroup(nodegroup string, nodeGroup *NodeGroupState)
 	// for working out which pods are on which nodes
 	nodeGroup.NodeInfoMap = k8s.CreateNodeNameToInfoMap(pods, allNodes)
 
-	// Calc capacity for untainted nodes
+	// Calc capacity for nodes included in utilisation calculation
 	podRequests, err := k8s.CalculatePodsRequestedUsage(pods)
 	if err != nil {
 		log.Errorf("Failed to calculate requests: %v", err)
 		return 0, err
 	}
 
-	nodeCapacity, err := k8s.CalculateNodesCapacity(untaintedNodes, pods)
+	nodeCapacity, err := k8s.CalculateNodesCapacity(capacityNodes, pods)
 	if err != nil {
 		log.Errorf("Failed to calculate capacity: %v", err)
 		return 0, err
```
|
|
```diff
@@ -319,14 +326,14 @@ func (c *Controller) scaleNodeGroup(nodegroup string, nodeGroup *NodeGroupState)
 	}
 
 	// Calc %
-	// both cpu and memory capacity are based on number of untainted nodes
-	// pass number of untainted nodes in to help make decision if it's a scaling-up-from-0
+	// both cpu and memory capacity are based on number of capacity nodes
+	// pass number of capacity nodes in to help make decision if it's a scaling-up-from-0
 	cpuPercent, memPercent, err := calcPercentUsage(
 		*podRequests.Total.GetCPUQuantity(),
 		*podRequests.Total.GetMemoryQuantity(),
 		*nodeCapacity.Total.GetCPUQuantity(),
 		*nodeCapacity.Total.GetMemoryQuantity(),
-		int64(len(untaintedNodes)))
+		int64(len(capacityNodes)))
 	if err != nil {
 		log.Errorf("Failed to calculate percentages: %v", err)
 		return 0, err
```
|
|
```diff
@@ -374,7 +381,7 @@ func (c *Controller) scaleNodeGroup(nodegroup string, nodeGroup *NodeGroupState)
 	// we want to add enough nodes such that the maxPercentage cluster util
 	// drops back below ScaleUpThresholdPercent
 	nodesDelta, err = calcScaleUpDelta(
-		untaintedNodes,
+		capacityNodes,
 		cpuPercent,
 		memPercent,
 		*podRequests.Total.GetCPUQuantity(),
```
```diff
@@ -63,6 +63,10 @@ type NodeGroupOptions struct {
 	// allowed in the nodegroup at any given time.
 	MaxUnhealthyNodesPercent int `json:"max_unhealthy_nodes_percent,omitempty" yaml:"max_unhealthy_nodes_percent,omitempty"`
 
+	// IncludeTaintedInCapacity includes tainted nodes in the capacity denominator
+	// for utilisation calculations, preventing artificial spikes when nodes are tainted.
+	IncludeTaintedInCapacity bool `json:"include_tainted_in_capacity,omitempty" yaml:"include_tainted_in_capacity,omitempty"`
+
 	// Private variables for storing the parsed duration from the string
 	softDeleteGracePeriodDuration time.Duration
 	hardDeleteGracePeriodDuration time.Duration
```

Collaborator: This adds a new user-facing config option, so please document it in https://github.com/atlassian/escalator/blob/master/docs/configuration/nodegroup.md or https://github.com/atlassian/escalator/blob/master/docs/configuration/advanced-configuration.md.
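For reference when updating those docs, a hypothetical nodegroup entry might look like the following; include_tainted_in_capacity matches the yaml tag added above, while the surrounding keys and values are illustrative and may not match the real escalator config schema exactly:

```yaml
node_groups:
  - name: "example"                    # illustrative values only
    label_key: "customer"
    label_value: "example"
    include_tainted_in_capacity: true  # new option introduced by this PR
```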
Just thinking through the premise of the fix: does it make more sense to instead change the calculation on the numerator side?

That is, rather than including tainted nodes in the available capacity (which, really, they aren't, as the scheduler can't place on them), we could exclude pods on tainted nodes from the load (which makes some sense, since they are draining pods).

The crux of the utilisation calc is to determine what % pressure is on the pool of resources that are available to the scheduler.

The concern could be that computing that exclusion could be expensive - we essentially need to do another filter pass on pods and check whether each pod's assigned node is part of our tainted list.

@dtnyn thoughts?
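To make the trade-off concrete (the numbers here are made up): suppose a nodegroup has 10 nodes of 4 CPU each and pods requesting 24 CPU in total, 8 CPU of which sit on 4 nodes that get tainted for removal. Counting only untainted nodes, the denominator shrinks to 24 CPU and reported utilisation jumps from 24/40 = 60% to 24/24 = 100%. Including tainted nodes in the denominator (this PR) keeps it at 60%, while excluding the draining pods from the numerator would instead report 16/24 ≈ 67%, i.e. the pressure on the capacity the scheduler can actually use.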
I think either way will address the specific problem of the inaccurate utilisation calculation, but exclusion does seem more semantically accurate since, as you mentioned, tainted nodes aren't actually "available capacity", so giving the same treatment to their workloads makes sense.

I don't think the extra check would change the time-scaling cost. At a cursory glance it should be achievable by checking membership in a set of tainted nodes, which would be a subset of the loop we're already doing with mapPodsToNode(), which goes through all current pods.
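A rough sketch of that membership check (filterPodsOnUntaintedNodes and its wiring are hypothetical, not existing escalator code; the numerator would then be computed from the filtered list):

```go
import v1 "k8s.io/api/core/v1"

// filterPodsOnUntaintedNodes drops pods whose assigned node is in the tainted
// list, so only load on schedulable capacity is counted in the numerator.
func filterPodsOnUntaintedNodes(pods []*v1.Pod, taintedNodes []*v1.Node) []*v1.Pod {
	// Build a set of tainted node names once per tick: O(len(taintedNodes)).
	taintedSet := make(map[string]struct{}, len(taintedNodes))
	for _, n := range taintedNodes {
		taintedSet[n.Name] = struct{}{}
	}

	// Single pass over pods with a constant-time lookup per pod: O(len(pods)).
	filtered := make([]*v1.Pod, 0, len(pods))
	for _, p := range pods {
		if _, onTainted := taintedSet[p.Spec.NodeName]; !onTainted {
			filtered = append(filtered, p)
		}
	}
	return filtered
}
```

The requested-usage calculation (e.g. k8s.CalculatePodsRequestedUsage from the diff above) would then take the filtered slice, leaving the overall per-tick complexity unchanged.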