Add Running Pods task to docs (#845)
* Add Running Pods task to docs

* Responded to PR comments

* Added taints and tolerations

* Small typo

* Responded to a few more review comments

* A few more fixes from comments

* Fixed some broken yaml
chrisnegus committed Nov 27, 2021
1 parent a3a0ff7 commit 6126538
Showing 2 changed files with 249 additions and 0 deletions.
7 changes: 7 additions & 0 deletions website/content/en/pre-docs/tasks/_index.md
@@ -0,0 +1,7 @@
---
title: "Tasks"
linkTitle: "Tasks"
weight: 45
---

Karpenter tasks can be divided into those for cluster operators, who manage the cluster itself, and those for application developers, who deploy pod workloads on a cluster.
242 changes: 242 additions & 0 deletions website/content/en/pre-docs/tasks/running-pods.md
@@ -0,0 +1,242 @@
---
title: "Running pods"
linkTitle: "Running pods"
weight: 10
---

If your pods have no requirements for how or where to run, you can let Karpenter choose nodes from the full range of available cloud provider resources.
However, by taking advantage of Karpenter's model of layered constraints, you can be sure that the precise type and amount of resources needed are available to your pods.
Reasons for constraining where your pods run could include:

* Needing to run in zones where dependent applications or storage are available
* Requiring certain kinds of processors or other hardware
* Wanting to use techniques like topology spread to help ensure high availability

Your cloud provider defines the first layer of constraints: all the instance types, architectures, zones, and purchase types available in its cloud.
The cluster operator adds the next layer of constraints by creating one or more provisioners.
The final layer comes from you adding specifications to your Kubernetes pod deployments.
Pod scheduling constraints must fall within a provisioner's constraints or the pods will not deploy.
For example, if the provisioner sets limits that allow only a particular zone to be used, and a pod asks for a different zone, it will not be scheduled.
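As a minimal sketch of this layering (the provisioner name, pod name, and image below are placeholders), a provisioner might limit nodes to two zones, and a pod could then select either of those zones but nothing else:

```
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
  - key: "topology.kubernetes.io/zone"
    operator: In
    values: ["us-west-2a", "us-west-2b"]
---
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: app
    image: myimage
  # Falls within the provisioner's zones; asking for us-west-2c would never schedule.
  nodeSelector:
    topology.kubernetes.io/zone: us-west-2a
```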

Constraints you can request include:

* **Resource requests**: Request that a certain amount of memory or CPU be available.
* **Node selection**: Choose to run on a node that has a particular label (`nodeSelector`).
* **Node affinity**: Draw a pod to run on nodes with particular attributes (`affinity`).
* **Topology spread**: Use topology spread to help ensure availability of the application.

Karpenter supports standard Kubernetes scheduling constraints.
This allows you to define a single set of rules that apply to both existing and provisioned capacity.
Pod affinity is a key exception to this rule.

{{% alert title="Note" color="primary" %}}
Karpenter supports specific [Well-Known Labels, Annotations and Taints](https://kubernetes.io/docs/reference/labels-annotations-taints/) that are useful for scheduling.
{{% /alert %}}

## Resource requests (`resources`)

Within a Pod spec, you can both make requests and set limits on resources a pod needs, such as CPU and memory.
For example:

```
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: app
    image: myimage
    resources:
      requests:
        memory: "128Mi"
        cpu: "500m"
      limits:
        memory: "256Mi"
        cpu: "1000m"
```
In this example, the container requests 128MiB of memory and 0.5 CPU.
Its limits are set to 256MiB of memory and 1 CPU.
Instance type selection math only uses `requests`, but `limits` may be configured to enable resource oversubscription.


See [Managing Resources for Containers](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) for details on resource types supported by Kubernetes, [Specify a memory request and a memory limit](https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/#specify-a-memory-request-and-a-memory-limit) for examples of memory requests, and [Specifying Values to Control AWS Provisioning](/docs/cloud-providers/aws/aws-spec-fields) for a list of supported resources.

## Selecting nodes (`nodeSelector` and `nodeAffinity`)

With `nodeSelector` you can ask for a node that matches selected key-value pairs.
This can include well-known labels or custom labels you create yourself.

While `nodeSelector` is similar to node affinity, it doesn't have the same "and/or" `matchExpressions` that affinity has.
So all key-value pairs must match if you use `nodeSelector`.
Also, `nodeSelector` can only do inclusions, while `affinity` can do inclusions and exclusions (`In` and `NotIn`).

### Node selector (`nodeSelector`)

Here is an example of a `nodeSelector` for selecting nodes:

```
nodeSelector:
  topology.kubernetes.io/zone: us-west-2a
  karpenter.sh/capacity-type: spot
```
This example features a well-known Kubernetes label (`topology.kubernetes.io/zone`) and a Karpenter-specific label (`karpenter.sh/capacity-type`).

If you want to create a custom label, you should do that at the provisioner level.
Then the pod can declare that custom label.


See [nodeSelector](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector) in the Kubernetes documentation for details.

### Node affinity (`nodeAffinity`)

Examples below illustrate how to use node affinity to include (`In`) and exclude (`NotIn`) nodes.
See [Node affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity) for details.
When setting rules, the following Node affinity types define how hard or soft each rule is:

* **requiredDuringSchedulingIgnoredDuringExecution**: This is a hard rule that must be met.
* **preferredDuringSchedulingIgnoredDuringExecution**: This is a preference, but the pod can run on a node where it is not guaranteed.

The `IgnoredDuringExecution` part of each tells the pod to keep running even if conditions change on the node so that the rules no longer match.
You can think of these concepts as `required` and `preferred`, since Kubernetes never implemented other variants of these rules.
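The examples in this section use the required form. As a minimal sketch of the preferred form (the weight and the preference for spot capacity are only illustrative), a soft rule looks like this:

```
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
        - key: "karpenter.sh/capacity-type"
          operator: "In"
          values: ["spot"]
```

The scheduler favors nodes that satisfy the preference but can still place the pod elsewhere.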

All examples below assume that the provisioner doesn't have constraints that prevent those zones from being used.
In the following example, the first constraint says you could use `us-west-2a` or `us-west-2b`; the second constraint narrows that to `us-west-2b` only.

```
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: "topology.kubernetes.io/zone"
          operator: "In"
          values: ["us-west-2a", "us-west-2b"]
        - key: "topology.kubernetes.io/zone"
          operator: "In"
          values: ["us-west-2b"]
```

Changing the second operator to `NotIn` would allow the pod to run in `us-west-2a` only:

```
- key: "topology.kubernetes.io/zone"
  operator: "In"
  values: ["us-west-2a", "us-west-2b"]
- key: "topology.kubernetes.io/zone"
  operator: "NotIn"
  values: ["us-west-2b"]
```

Continuing the example, `nodeAffinity` lets you define multiple terms, so if one term can't be satisfied the scheduler moves on to the next one.
Here, if `us-west-2a` is not available, the second term will cause the pod to run on a spot instance in `us-west-2d`.


```
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions: # OR
        - key: "topology.kubernetes.io/zone" # AND
          operator: "In"
          values: ["us-west-2a", "us-west-2b"]
        - key: "topology.kubernetes.io/zone" # AND
          operator: "NotIn"
          values: ["us-west-2b"]
      - matchExpressions: # OR
        - key: "karpenter.sh/capacity-type" # AND
          operator: "In"
          values: ["spot"]
        - key: "topology.kubernetes.io/zone" # AND
          operator: "In"
          values: ["us-west-2d"]
```
In general, Karpenter goes through the `nodeSelectorTerms` in order and takes the first one that works.
If Karpenter fails to provision capacity with the first term, it tries again using the second one.
If all of the terms fail, Karpenter fails to provision the pod.
Karpenter will back off and retry over time.
So if capacity becomes available later, it will schedule the pod without user intervention.

## Taints and tolerations

Taints are the opposite of affinity.
Setting a taint on a node tells the scheduler to not run a pod on it unless the pod has explicitly said it can tolerate that taint.
The following example shows a Provisioner that was set up with a taint so that it only runs pods that require a GPU:


```
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: gpu
spec:
  requirements:
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - p3.2xlarge
    - p3.8xlarge
    - p3.16xlarge
  taints:
  - key: nvidia.com/gpu
    value: "true"
    effect: "NoSchedule"
```

For a pod to be scheduled on a node from that provisioner, it needs to tolerate the taint, as follows:

```
apiVersion: v1
kind: Pod
metadata:
  name: mygpupod
spec:
  containers:
  - name: gpuapp
    resources:
      requests:
        nvidia.com/gpu: 1
      limits:
        nvidia.com/gpu: 1
    image: mygpucontainer
  tolerations:
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"
```
See [Taints and Tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) in the Kubernetes documentation for details.

## Topology spread (`topologySpreadConstraints`)

By using the Kubernetes `topologySpreadConstraints` you can ask the provisioner to spread pods away from each other to limit the blast radius of an outage.
Think of it as the Kubernetes evolution for pod affinity: it lets you relate pods with respect to nodes while still allowing spread.
For example:

```
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: "topology.kubernetes.io/zone"
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        dev: jjones
  - maxSkew: 1
    topologyKey: "kubernetes.io/hostname"
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        dev: jjones
```
Adding this to your podspec would result in:

* Pods being spread across both zones and hosts (`topologyKey`).
* The `dev` `labelSelector` will include all pods with the label `dev=jjones` in topology calculations. It is recommended to use a selector that matches all pods in a deployment (a sketch follows this list).
* No more than a one-pod difference in the number of pods on each host (`maxSkew`).
For example, if there were three nodes and five pods, the pods could be spread 1, 2, 2 or 2, 1, 2 and so on.
If instead `maxSkew` were 5, the pods could be spread 5, 0, 0 or 3, 2, 0 or 2, 1, 2 and so on.
* Karpenter is always able to improve skew by launching new nodes in the right zones. Therefore, `whenUnsatisfiable` does not change provisioning behavior.
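As a sketch of the recommendation above (the deployment name, image, and `dev: jjones` label are illustrative), the spread constraints work because the same label appears on the deployment's pod template:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 5
  selector:
    matchLabels:
      dev: jjones
  template:
    metadata:
      labels:
        dev: jjones   # matched by the labelSelector in topologySpreadConstraints
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: "topology.kubernetes.io/zone"
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            dev: jjones
      containers:
      - name: app
        image: myimage
```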

See [Pod Topology Spread Constraints](https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/) for details.
