docs: updated deprovisioning docs (#2943)
njtran committed Nov 30, 2022
1 parent 874c5f8 commit bb1a437
Showing 32 changed files with 149 additions and 219 deletions.
16 changes: 2 additions & 14 deletions website/content/en/preview/provisioner.md
@@ -116,19 +116,7 @@ spec:

## Node deprovisioning

If neither of these values is set, Karpenter will *not* delete instances. It is recommended to set the `ttlSecondsAfterEmpty` value to enable scale-down of the cluster.

### spec.ttlSecondsAfterEmpty

Setting a value here enables Karpenter to delete empty or unnecessary instances. DaemonSet pods do not count when determining whether a node is "empty". This value is in seconds.

### spec.ttlSecondsUntilExpired

Setting a value here enables node expiry. After nodes reach the defined age in seconds, they will be deleted, even if they are in use. This effectively lets nodes be periodically "upgraded" by replacing them with newly provisioned instances.

Note that Karpenter does not automatically add jitter to this value. If multiple instances are created in a small amount of time, they will expire at very similar times. Consider defining a [pod disruption budget](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) to prevent excessive workload disruption.


You can configure Karpenter to deprovision instances through your Provisioner in multiple ways: `spec.ttlSecondsAfterEmpty`, `spec.ttlSecondsUntilExpired`, or `spec.consolidation.enabled`. Read [Deprovisioning](../tasks/deprovisioning/) for more.
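For reference, a minimal Provisioner sketch wiring up the TTL-based fields might look like the following — values are illustrative, the `karpenter.sh/v1alpha5` API is assumed for this docs version, and `ttlSecondsAfterEmpty` cannot be combined with consolidation:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # Delete nodes that have been empty (ignoring DaemonSet pods) for 30 seconds.
  # Mutually exclusive with spec.consolidation.enabled.
  ttlSecondsAfterEmpty: 30
  # Expire and replace nodes after 30 days, even if they are still in use.
  ttlSecondsUntilExpired: 2592000
```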

## spec.requirements

@@ -271,7 +259,7 @@ For more information on the default `--system-reserved` and `--kube-reserved` co

### Eviction Thresholds

The kubelet supports eviction thresholds by default. When enough memory or file system pressure is exerted on the node, the kubelet will begin to evict pods to ensure that system daemons and other system processes can continue to run in a healthy manner.

Kubelet has the notion of [hard evictions](https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#hard-eviction-thresholds) and [soft evictions](https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#soft-eviction-thresholds). In hard evictions, pods are evicted as soon as a threshold is met, with no grace period to terminate. Soft evictions, on the other hand, provide an opportunity for pods to be terminated gracefully. They do so by sending a termination signal to pods that are planning to be evicted and allowing those pods to terminate up to their grace period.
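As a point of reference, eviction thresholds live in the kubelet's own configuration. A minimal standalone `KubeletConfiguration` sketch — values are illustrative and not Karpenter-specific — could look like:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Hard thresholds: pods are evicted immediately once crossed, with no grace period.
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
# Soft thresholds: pods receive a termination signal and may shut down gracefully...
evictionSoft:
  memory.available: "500Mi"
# ...for up to the matching grace period.
evictionSoftGracePeriod:
  memory.available: "1m30s"
```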

8 changes: 5 additions & 3 deletions website/content/en/preview/tasks/deprovisioning.md
@@ -35,6 +35,8 @@ example, if a cluster brings up all nodes at once, all the pods on those nodes would fall into
the same batching window on expiration.

- Pods without an ownerRef (also called "controllerless" or "naked" pods) will be evicted during voluntary node disruption, such as expiration or consolidation. A pod with the annotation `karpenter.sh/do-not-evict: true` will cause its node to be opted out of voluntary node disruption workflows.

- Using preferred anti-affinity and topology spreads can reduce the effectiveness of consolidation. At node launch, Karpenter attempts to satisfy affinity and topology spread preferences. To reduce node churn, consolidation must also attempt to satisfy these constraints, so that nodes are not consolidated immediately after they launch. This means that consolidation may decline to deprovision a node in order to avoid violating preferences, even if kube-scheduler could fit that node's pods elsewhere (a sketch of such a preferred constraint follows this note).
{{% /alert %}}
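To illustrate, a "preferred" spread is one with `whenUnsatisfiable: ScheduleAnyway`, which the scheduler and consolidation treat as a preference rather than a hard rule. The Deployment below is a hypothetical sketch; the name and image are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                  # hypothetical workload name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        # ScheduleAnyway makes this a preference, not a hard rule; Karpenter
        # tries to satisfy it at node launch and again during consolidation.
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx:1.23  # illustrative image
```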

* **Node deleted**: You could use `kubectl` to manually remove a single Karpenter node:
@@ -64,8 +66,8 @@ All the pod objects get deleted by a garbage collection process later, because t
## Consolidation

Karpenter has two mechanisms for cluster consolidation:
- Deletion - A node is eligible for deletion if all of its pods can run on free capacity of other nodes in the cluster.
- Replace - A node can be replaced if all of its pods can run on a combination of free capacity of other nodes in the cluster and a single cheaper replacement node.

Consolidation has three mechanisms that are performed in order to attempt to identify a consolidation action:
1) Empty Node Consolidation - Delete any entirely empty nodes in parallel
@@ -81,7 +83,7 @@ When there are multiple nodes that could be potentially deleted or replaced, Karpenter
* nodes with lower priority pods

{{% alert title="Note" color="primary" %}}
For spot nodes, Karpenter only uses the deletion consolidation mechanism. It will not replace a spot node with a cheaper spot node. Spot instance types are selected with the `price-capacity-optimized` strategy and often the cheapest spot instance type is not launched due to the likelihood of interruption. Consolidation would then replace the spot instance with a cheaper instance, negating the `price-capacity-optimized` strategy entirely and increasing the interruption rate.
{{% /alert %}}

## Interruption
20 changes: 4 additions & 16 deletions website/content/en/v0.16.0/provisioner.md
@@ -19,14 +19,14 @@ spec:
  # that can't be removed. Mutually exclusive with the ttlSecondsAfterEmpty parameter.
  consolidation:
    enabled: true

  # If omitted, the feature is disabled and nodes will never expire. If set to less time than it requires for a node
  # to become ready, the node may expire before any pods successfully start.
  ttlSecondsUntilExpired: 2592000 # 30 Days = 60 * 60 * 24 * 30 Seconds;

  # If omitted, the feature is disabled, and nodes will never scale down due to low utilization
  ttlSecondsAfterEmpty: 30

  # Priority given to the provisioner when the scheduler considers which provisioner
  # to select. Higher weights indicate higher priority when comparing provisioners.
  # Specifying no weight is equivalent to specifying a weight of 0.
@@ -91,19 +91,7 @@ spec:

## Node deprovisioning

If neither of these values is set, Karpenter will *not* delete instances. It is recommended to set the `ttlSecondsAfterEmpty` value to enable scale-down of the cluster.

### spec.ttlSecondsAfterEmpty

Setting a value here enables Karpenter to delete empty or unnecessary instances. DaemonSet pods do not count when determining whether a node is "empty". This value is in seconds.

### spec.ttlSecondsUntilExpired

Setting a value here enables node expiry. After nodes reach the defined age in seconds, they will be deleted, even if they are in use. This effectively lets nodes be periodically "upgraded" by replacing them with newly provisioned instances.

Note that Karpenter does not automatically add jitter to this value. If multiple instances are created in a small amount of time, they will expire at very similar times. Consider defining a [pod disruption budget](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) to prevent excessive workload disruption.


You can configure Karpenter to deprovision instances through your Provisioner in multiple ways: `spec.ttlSecondsAfterEmpty`, `spec.ttlSecondsUntilExpired`, or `spec.consolidation.enabled`. Read [Deprovisioning](./tasks/deprovisioning.md) for more.

## spec.requirements

@@ -191,7 +179,7 @@ Karpenter also allows `karpenter.sh/capacity-type` to be used as a topology key

## spec.weight

Karpenter allows you to describe provisioner preferences through a `weight` mechanism similar to how weight is described with [pod and node affinities](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity).

For more information on weighting provisioners, see the [Weighting Provisioners section](../tasks/scheduling#weighting-provisioners) in the scheduling details.
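A hedged sketch of a weighted provisioner follows; the name, weight, and instance-type constraint are illustrative, and the `karpenter.sh/v1alpha5` API is assumed for this docs version:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: preferred            # hypothetical name
spec:
  # Provisioners with higher weight are considered before lower-weighted ones.
  weight: 50
  requirements:
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["c5.large"]   # illustrative constraint
```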

12 changes: 8 additions & 4 deletions website/content/en/v0.16.0/tasks/deprovisioning.md
@@ -32,6 +32,10 @@ default values for them and will not terminate nodes for that purpose.
- Keep in mind that a small node expiry value results in higher churn in cluster activity. So, for
example, if a cluster brings up all nodes at once, all the pods on those nodes would fall into
the same batching window on expiration.

- Note that Karpenter does not automatically add jitter to this value. If multiple instances are created in a small amount of time, they will expire at very similar times. Consider defining a [pod disruption budget](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) to prevent excessive workload disruption; a minimal sketch follows this note.

- Using preferred anti-affinity and topology spreads can reduce the effectiveness of consolidation. At node launch, Karpenter attempts to satisfy affinity and topology spread preferences. To reduce node churn, consolidation must also attempt to satisfy these constraints, so that nodes are not consolidated immediately after they launch. This means that consolidation may decline to deprovision a node in order to avoid violating preferences, even if kube-scheduler could fit that node's pods elsewhere.
{{% /alert %}}
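A minimal PodDisruptionBudget sketch for the jitter note above — the name and selector are hypothetical:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb              # hypothetical name
spec:
  # Keep at least 2 pods available during voluntary disruptions such as expiry.
  minAvailable: 2
  selector:
    matchLabels:
      app: web               # placeholder selector
```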

* **Node deleted**: You could use `kubectl` to manually remove a single Karpenter node:
@@ -62,8 +66,8 @@ All the pod objects get deleted by a garbage collection process later, because t


Karpenter has two mechanisms for cluster consolidation:
- Deletion - A node is eligible for deletion if all of its pods can run on free capacity of other nodes in the cluster.
- Replace - A node can be replaced if all of its pods can run on a combination of free capacity of other nodes in the cluster and a single cheaper replacement node.

When there are multiple nodes that could be potentially deleted or replaced, Karpenter chooses to consolidate the node that overall disrupts your workloads the least by preferring to terminate:

@@ -72,7 +76,7 @@ When there are multiple nodes that could be potentially deleted or replaced, Karpenter
* nodes with lower priority pods

{{% alert title="Note" color="primary" %}}
For spot nodes, Karpenter only uses the deletion mechanism for consolidation. It will not replace a spot node with a cheaper spot node. Spot instance types are selected with the `capacity-optimized-prioritized` strategy and often the cheapest spot instance type is not launched due to the likelihood of interruption. Consolidation would then replace the spot instance with a cheaper instance, negating the `capacity-optimized-prioritized` strategy entirely and increasing the interruption rate.
{{% /alert %}}

## What can cause deprovisioning to fail?
@@ -104,7 +108,7 @@ Review what [disruptions are](https://kubernetes.io/docs/concepts/workloads/pods

### Pod set to do-not-evict

If a pod exists with the annotation `karpenter.sh/do-not-evict: true` on a node, and a request is made to delete the node, Karpenter will not drain any pods from that node or otherwise try to delete the node. Nodes that have pods with a `do-not-evict` annotation are not considered for consolidation, though their unused capacity is considered for the purposes of running pods from other nodes which can be consolidated. This annotation will have no effect for static pods, pods that tolerate `NoSchedule`, or pods terminating past their graceful termination period.

This is useful for pods that you want to run from start to finish without interruption.
Examples might include a real-time, interactive game that you don't want to interrupt or a long batch job (such as you might have with machine learning) that would need to start over if it were interrupted.
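A hypothetical pod carrying the annotation might look like this — the name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-training-job                 # hypothetical name
  annotations:
    # Opts this pod's node out of voluntary disruption (expiry, consolidation).
    karpenter.sh/do-not-evict: "true"
spec:
  containers:
    - name: train
      image: registry.example.com/trainer:latest   # placeholder image
```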
6 changes: 4 additions & 2 deletions website/content/en/v0.16.0/tasks/provisioning.md
@@ -101,7 +101,8 @@ kind: Provisioner
metadata:
  name: gpu
spec:
  ttlSecondsAfterEmpty: 60
  consolidation:
    enabled: true
  requirements:
    - key: node.kubernetes.io/instance-type
      operator: In
@@ -123,7 +124,8 @@ kind: Provisioner
metadata:
  name: cilium-startup
spec:
  ttlSecondsAfterEmpty: 60
  consolidation:
    enabled: true
  startupTaints:
    - key: node.cilium.io/agent-not-ready
      value: "true"
14 changes: 1 addition & 13 deletions website/content/en/v0.16.1/provisioner.md
@@ -91,19 +91,7 @@ spec:

## Node deprovisioning

If neither of these values is set, Karpenter will *not* delete instances. It is recommended to set the `ttlSecondsAfterEmpty` value to enable scale-down of the cluster.

### spec.ttlSecondsAfterEmpty

Setting a value here enables Karpenter to delete empty or unnecessary instances. DaemonSet pods do not count when determining whether a node is "empty". This value is in seconds.

### spec.ttlSecondsUntilExpired

Setting a value here enables node expiry. After nodes reach the defined age in seconds, they will be deleted, even if they are in use. This effectively lets nodes be periodically "upgraded" by replacing them with newly provisioned instances.

Note that Karpenter does not automatically add jitter to this value. If multiple instances are created in a small amount of time, they will expire at very similar times. Consider defining a [pod disruption budget](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) to prevent excessive workload disruption.


You can configure Karpenter to deprovision instances through your Provisioner in multiple ways: `spec.ttlSecondsAfterEmpty`, `spec.ttlSecondsUntilExpired`, or `spec.consolidation.enabled`. Read [Deprovisioning](../tasks/deprovisioning/) for more.

## spec.requirements

