From 663b366836b212b7ca86370f7e169aafe04bf048 Mon Sep 17 00:00:00 2001
From: Jonathan Innis
Date: Mon, 16 Oct 2023 16:18:44 -0700
Subject: [PATCH] docs: Update the FAQ doc to v1beta1 apis and concepts (#4835)

---
 .../content/en/preview/concepts/nodepools.md  |   5 +-
 .../content/en/preview/concepts/scheduling.md |   3 +-
 website/content/en/preview/faq.md             | 144 +++++++-----------
 3 files changed, 62 insertions(+), 90 deletions(-)

diff --git a/website/content/en/preview/concepts/nodepools.md b/website/content/en/preview/concepts/nodepools.md
index 68c2eb5726cf..612cf5030c66 100644
--- a/website/content/en/preview/concepts/nodepools.md
+++ b/website/content/en/preview/concepts/nodepools.md
@@ -144,9 +144,8 @@ spec:
   # Resource limits constrain the total size of the cluster.
   # Limits prevent Karpenter from creating new instances once the limit is exceeded.
   limits:
-    resources:
-      cpu: "1000"
-      memory: 1000Gi
+    cpu: "1000"
+    memory: 1000Gi
 
   # Priority given to the NodePool when the scheduler considers which NodePool
   # to select. Higher weights indicate higher priority when comparing NodePools.
diff --git a/website/content/en/preview/concepts/scheduling.md b/website/content/en/preview/concepts/scheduling.md
index 9e7e2a4402b4..dc7af6a933e1 100755
--- a/website/content/en/preview/concepts/scheduling.md
+++ b/website/content/en/preview/concepts/scheduling.md
@@ -453,8 +453,7 @@ metadata:
 spec:
   weight: 50
   limits:
-    resources:
-      cpu: 100
+    cpu: 100
   template:
     spec:
       requirements:
diff --git a/website/content/en/preview/faq.md b/website/content/en/preview/faq.md
index d29e497896aa..09d7d051ee8a 100644
--- a/website/content/en/preview/faq.md
+++ b/website/content/en/preview/faq.md
@@ -7,28 +7,26 @@ description: >
 ---
 ## General
 
-### How does a provisioner decide to manage a particular node?
-See [Configuring provisioners]({{< ref "./concepts/#configuring-provisioners" >}}) for information on how Karpenter provisions and manages nodes.
+### How does a NodePool decide to manage a particular node?
+See [Configuring NodePools]({{< ref "./concepts/#configuring-nodepools" >}}) for information on how Karpenter configures and manages nodes.
 
 ### What cloud providers are supported?
 AWS is the first cloud provider supported by Karpenter, although it is designed to be used with other cloud providers as well.
 
 ### Can I write my own cloud provider for Karpenter?
-Yes, but there is no documentation yet for it.
-Start with Karpenter's GitHub [cloudprovider](https://github.com/aws/karpenter-core/tree{{< githubRelRef >}}pkg/cloudprovider) documentation to see how the AWS provider is built, but there are other sections of the code that will require changes too.
+Yes, but there is no documentation yet for it. Start with Karpenter's GitHub [cloudprovider](https://github.com/aws/karpenter-core/tree{{< githubRelRef >}}pkg/cloudprovider) documentation to see how the AWS provider is built, but there are other sections of the code that will require changes too.
 
 ### What operating system nodes does Karpenter deploy?
 By default, Karpenter uses Amazon Linux 2 images.
 
 ### Can I provide my own custom operating system images?
-Karpenter has multiple mechanisms for configuring the [operating system]({{< ref "./concepts/nodeclasses/#spec-amiselector" >}}) for your nodes.
+Karpenter has multiple mechanisms for configuring the [operating system]({{< ref "./concepts/nodeclasses/#specamiselectorterms" >}}) for your nodes.
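+
+For example, a v1beta1 `EC2NodeClass` can select your own AMIs with `amiSelectorTerms`. The snippet below is an illustrative sketch only — it shows just the AMI-related fields, and the tag key/value used to match your AMIs is a placeholder:
+
+```
+apiVersion: karpenter.k8s.aws/v1beta1
+kind: EC2NodeClass
+metadata:
+  name: custom-ami   # illustrative name
+spec:
+  amiFamily: AL2                # bootstrap logic to use with the selected AMIs
+  amiSelectorTerms:
+    - tags:
+        my-custom-ami: "true"   # placeholder tag on the AMIs you publish
+```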
 
 ### Can Karpenter deal with workloads for mixed architecture cluster (arm vs. amd)?
 
-Karpenter is flexible to multi architecture configurations using [well known labels]({{< ref "./concepts/scheduling/#supported-labels">}}).
+Karpenter supports multi-architecture configurations using [well-known labels]({{< ref "./concepts/scheduling/#supported-labels">}}).
 
 ### What RBAC access is required?
-All of the required RBAC rules can be found in the helm chart template.
-See [clusterrolebinding.yaml](https://github.com/aws/karpenter/blob{{< githubRelRef >}}charts/karpenter/templates/clusterrolebinding.yaml), [clusterrole.yaml](https://github.com/aws/karpenter/blob{{< githubRelRef >}}charts/karpenter/templates/clusterrole.yaml), [rolebinding.yaml](https://github.com/aws/karpenter/blob{{< githubRelRef >}}charts/karpenter/templates/rolebinding.yaml), and [role.yaml](https://github.com/aws/karpenter/blob{{< githubRelRef >}}charts/karpenter/templates/role.yaml) files for details.
+All the required RBAC rules can be found in the Helm chart templates. See [clusterrole-core.yaml](https://github.com/aws/karpenter/blob{{< githubRelRef >}}charts/karpenter/templates/clusterrole-core.yaml), [clusterrole.yaml](https://github.com/aws/karpenter/blob{{< githubRelRef >}}charts/karpenter/templates/clusterrole.yaml), [rolebinding.yaml](https://github.com/aws/karpenter/blob{{< githubRelRef >}}charts/karpenter/templates/rolebinding.yaml), and [role.yaml](https://github.com/aws/karpenter/blob{{< githubRelRef >}}charts/karpenter/templates/role.yaml) files for details.
 
 ### Can I run Karpenter outside of a Kubernetes cluster?
 Yes, as long as the controller has network and IAM/RBAC access to the Kubernetes API and your provider API.
@@ -39,63 +37,46 @@ Yes, as long as the controller has network and IAM/RBAC access to the Kubernetes
 See the [Compatibility Matrix in the Upgrade Guide]({{< ref "./upgrade-guide#compatibility-matrix" >}}) to view the supported Kubernetes versions per Karpenter released version.
 
 ### What Kubernetes distributions are supported?
-Karpenter documents integration with a fresh or existing install of the latest AWS Elastic Kubernetes Service (EKS).
-Other Kubernetes distributions (KOPs, etc.) can be used, but setting up cloud provider permissions for those distributions has not been documented.
+Karpenter documents integration with a fresh or existing installation of the latest Amazon Elastic Kubernetes Service (EKS). Other Kubernetes distributions (kOps, etc.) can be used, but setting up cloud provider permissions for those distributions has not been documented.
 
 ### How does Karpenter interact with AWS node group features?
-Provisioners are designed to work alongside static capacity management solutions like EKS Managed Node Groups and EC2 Auto Scaling Groups.
-You can manage all capacity using provisioners, use a mixed model with dynamic and statically managed capacity, or use a fully static approach.
-We expect most users will use a mixed approach in the near term and provisioner-managed in the long term.
+NodePools are designed to work alongside static capacity management solutions like EKS Managed Node Groups and EC2 Auto Scaling Groups. You can manage all capacity using NodePools, use a mixed model with dynamic and statically managed capacity, or use a fully static approach. We expect most users will use a mixed approach in the near term and NodePool-managed in the long term.
 
 ### How does Karpenter interact with Kubernetes features?
-* Kubernetes Cluster Autoscaler: Karpenter can work alongside cluster autoscaler.
-See [Kubernetes cluster autoscaler]({{< ref "./concepts/#kubernetes-cluster-autoscaler" >}}) for details.
-* Kubernetes Scheduler: Karpenter focuses on scheduling pods that the Kubernetes scheduler has marked as unschedulable.
-See [Scheduling]({{< ref "./concepts/scheduling" >}}) for details on how Karpenter interacts with the Kubernetes scheduler.
+* Kubernetes Cluster Autoscaler: Karpenter can work alongside Cluster Autoscaler. See [Kubernetes Cluster Autoscaler]({{< ref "./concepts/#kubernetes-cluster-autoscaler" >}}) for details.
+* Kubernetes Scheduler: Karpenter focuses on scheduling pods that the Kubernetes scheduler has marked as unschedulable. See [Scheduling]({{< ref "./concepts/scheduling" >}}) for details on how Karpenter interacts with the Kubernetes scheduler.
 
 ## Provisioning
 
-### What features does the Karpenter provisioner support?
-See [Provisioner API]({{< ref "./concepts/nodepools" >}}) for provisioner examples and descriptions of features.
+### What features does the Karpenter NodePool support?
+See the [NodePool API docs]({{< ref "./concepts/nodepools" >}}) for NodePool examples and descriptions of features.
 
-### Can I create multiple (team-based) provisioners on a cluster?
-Yes, provisioners can identify multiple teams based on labels.
-See [Provisioner API]({{< ref "./concepts/nodepools" >}}) for details.
+### Can I create multiple (team-based) NodePools on a cluster?
+Yes, NodePools can identify multiple teams based on labels. See the [NodePool API docs]({{< ref "./concepts/nodepools" >}}) for details.
 
-### If multiple provisioners are defined, which will my pod use?
+### If multiple NodePools are defined, which will my pod use?
 
-Pending pods will be handled by any Provisioner that matches the requirements of the pod.
-There is no ordering guarantee if multiple provisioners match pod requirements.
-We recommend that Provisioners are setup to be mutually exclusive.
-Read more about this recommendation in the [EKS Best Practices Guide for Karpenter](https://aws.github.io/aws-eks-best-practices/karpenter/#create-provisioners-that-are-mutually-exclusive).
-To select a specific provisioner, use the node selector `karpenter.sh/provisioner-name: my-provisioner`.
+Pending pods will be handled by any NodePool that matches the requirements of the pod. There is no ordering guarantee if multiple NodePools match pod requirements. We recommend that NodePools are set up to be mutually exclusive. To select a specific NodePool, use the node selector `karpenter.sh/nodepool: my-nodepool`.
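+
+For example, a workload can be pinned to a particular NodePool with a `nodeSelector` on that label. The sketch below is illustrative only — the pod and NodePool names are placeholders:
+
+```
+apiVersion: v1
+kind: Pod
+metadata:
+  name: my-app   # illustrative name
+spec:
+  nodeSelector:
+    karpenter.sh/nodepool: my-nodepool   # only schedule onto nodes owned by this NodePool
+  containers:
+    - name: app
+      image: nginx   # placeholder image
+```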
 
 ### How can I configure Karpenter to only provision pods for a particular namespace?
 
-There is no native support for namespaced based provisioning.
-Karpenter can be configured to provision a subset of pods based on a combination of taints/tolerations and node selectors.
-This allows Karpenter to work in concert with the `kube-scheduler` in that the same mechanisms that `kube-scheduler` uses to determine if a pod can schedule to an existing node are also used for provisioning new nodes.
-This avoids scenarios where pods are bound to nodes that were provisioned by Karpenter which Karpenter would not have bound itself.
-If this were to occur, a node could remain non-empty and have its lifetime extended due to a pod that wouldn't have caused the node to be provisioned had the pod been unschedulable.
+There is no native support for namespace-based provisioning. Karpenter can be configured to provision a subset of pods based on a combination of taints/tolerations and node selectors. This allows Karpenter to work in concert with the `kube-scheduler` in that the same mechanisms `kube-scheduler` uses to determine whether a pod can schedule to an existing node are also used for provisioning new nodes. This avoids scenarios where pods are bound to nodes that were provisioned by Karpenter which Karpenter would not have bound itself. If this were to occur, a node could remain non-empty and have its lifetime extended due to a pod that wouldn't have caused the node to be provisioned had the pod been unschedulable.
 
-We recommend using Kubernetes native scheduling constraints to achieve namespace based scheduling segregation. Using native scheduling constraints ensures that Karpenter, `kube-scheduler` and any other scheduling or auto-provisioning mechanism all have an identical understanding of which pods can be scheduled on which nodes. This can be enforced via policy agents, an example of which can be seen [here](https://blog.mikesir87.io/2022/01/creating-tenant-node-pools-with-karpenter/).
+We recommend using Kubernetes native scheduling constraints to achieve namespace-based scheduling segregation. Using native scheduling constraints ensures that Karpenter, `kube-scheduler`, and any other scheduling or auto-provisioning mechanism all have an identical understanding of which pods can be scheduled on which nodes. This can be enforced via policy agents, an example of which can be seen [here](https://blog.mikesir87.io/2022/01/creating-tenant-node-pools-with-karpenter/).
 
-### Can I add SSH keys to a provisioner?
+### Can I add SSH keys to a NodePool?
 
-Karpenter does not offer a way to add SSH keys via provisioners or secrets to the nodes it manages.
-However, you can use Session Manager (SSM) or EC2 Instance Connect to gain shell access to Karpenter nodes.
-See [Node NotReady]({{< ref "./troubleshooting/#node-notready" >}}) troubleshooting for an example of starting an SSM session from the command line or [EC2 Instance Connect](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-connect-set-up.html) documentation to connect to nodes using SSH.
+Karpenter does not offer a way to add SSH keys via NodePools or secrets to the nodes it manages.
+However, you can use Session Manager (SSM) or EC2 Instance Connect to gain shell access to Karpenter nodes. See [Node NotReady]({{< ref "./troubleshooting/#node-notready" >}}) troubleshooting for an example of starting an SSM session from the command line or [EC2 Instance Connect](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-connect-set-up.html) documentation to connect to nodes using SSH.
 
-Though not recommended, if you need to access Karpenter-managed nodes without AWS credentials, you can add SSH keys using AWSNodeTemplate.
-See [Custom User Data]({{< ref "./concepts/nodeclasses/#spec-userdata" >}}) for details.
+Though not recommended, if you need to access Karpenter-managed nodes without AWS credentials, you can add SSH keys using EC2NodeClass User Data. See the [User Data section in the EC2NodeClass documentation]({{< ref "./concepts/nodeclasses/#specuserdata" >}}) for details.
 
-### Can I set total limits of CPU and memory for a provisioner?
-Yes, the setting is provider-specific.
-See examples in [Accelerators, GPU]({{< ref "./concepts/scheduling/#accelerators-gpu-resources" >}}) Karpenter documentation.
+### Can I set limits of CPU and memory for a NodePool?
+Yes. View the [NodePool API docs]({{< ref "./concepts/nodepools#speclimits" >}}) for NodePool examples and descriptions of how to configure limits.
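+
+For example, the following illustrative NodePool caps the total capacity that it is allowed to launch (the name and the values are placeholders):
+
+```
+apiVersion: karpenter.sh/v1beta1
+kind: NodePool
+metadata:
+  name: default   # illustrative name
+spec:
+  limits:
+    cpu: "1000"     # total vCPUs across all nodes launched by this NodePool
+    memory: 1000Gi  # total memory across all nodes launched by this NodePool
+```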
 ### Can I mix spot and on-demand EC2 run types?
-Yes, see [Provisioning]({{< ref "./concepts/nodepools#examples" >}}) for an example.
+Yes, see the [NodePool API docs]({{< ref "./concepts/nodepools#examples" >}}) for an example.
 
 ### Can I restrict EC2 instance types?
@@ -104,53 +85,48 @@ Yes, see [Provisioning]({{< ref "./concepts/nodepools#examples" >}}) for an exam
 
 ### Can I use Bare Metal instance types?
 
-Yes, Karpenter supports provisioning metal instance types when a Provisioner's `node.kubernetes.io/instance-type` Requirements only include `metal` instance types. If other instance types fulfill pod requirements, then Karpenter will prioritize all non-metal instance types before metal ones are provisioned.
+Yes, Karpenter supports provisioning metal instance types when a NodePool's `node.kubernetes.io/instance-type` Requirements only include `metal` instance types. If other instance types fulfill pod requirements, then Karpenter will prioritize all non-metal instance types before metal ones are provisioned.
 
 ### How does Karpenter dynamically select instance types?
 
-Karpenter batches pending pods and then binpacks them based on CPU, memory, and GPUs required, taking into account node overhead, VPC CNI resources required, and daemonsets that will be packed when bringing up a new node.
-By default Karpenter uses C, M, and R >= Gen 3 instance types, but it can be constrained in the provisioner spec with the [instance-type](https://kubernetes.io/docs/reference/labels-annotations-taints/#nodekubernetesioinstance-type) well-known label in the requirements section.
-After the pods are binpacked on the most efficient instance type (i.e. the smallest instance type that can fit the pod batch), Karpenter takes 59 other instance types that are larger than the most efficient packing, and passes all 60 instance type options to an API called Amazon EC2 Fleet.
-The EC2 fleet API attempts to provision the instance type based on an allocation strategy.
-If you are using the on-demand capacity type, then Karpenter uses the `lowest-price` allocation strategy.
-So fleet will provision the lowest priced instance type it can get from the 60 instance types Karpenter passed to the EC2 fleet API.
-If the instance type is unavailable for some reason, then fleet will move on to the next cheapest instance type.
-If you are using the spot capacity type, Karpenter uses the price-capacity-optimized allocation strategy. This tells fleet to find the instance type that EC2 has the most capacity for while also considering price. This allocation strategy will balance cost and decrease the probability of a spot interruption happening in the near term.
-See [Choose the appropriate allocation strategy](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-fleet-allocation-strategy.html#ec2-fleet-allocation-use-cases) for information on fleet optimization.
+Karpenter batches pending pods and then binpacks them based on CPU, memory, and GPUs required, taking into account node overhead, VPC CNI resources required, and daemonsets that will be packed when bringing up a new node. Karpenter [recommends the use of C, M, and R >= Gen 3 instance types]({{< ref "./concepts/nodepools#spectemplatespecrequirements" >}}) for most generic workloads, but it can be constrained in the NodePool spec with the [instance-type](https://kubernetes.io/docs/reference/labels-annotations-taints/#nodekubernetesioinstance-type) well-known label in the requirements section.
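+
+As an illustration, a NodePool's requirements section can both allow a mix of Spot and On-Demand capacity and restrict the allowed instance types. The instance types listed below are placeholders only:
+
+```
+apiVersion: karpenter.sh/v1beta1
+kind: NodePool
+...
+spec:
+  template:
+    spec:
+      requirements:
+        - key: karpenter.sh/capacity-type        # allow both purchase options
+          operator: In
+          values: ["spot", "on-demand"]
+        - key: node.kubernetes.io/instance-type  # restrict the allowed instance types
+          operator: In
+          values: ["m5.large", "m5.2xlarge", "c5.large"]
+```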
+
+After the pods are binpacked on the most efficient instance type (i.e. the smallest instance type that can fit the pod batch), Karpenter takes 59 other instance types that are larger than the most efficient packing, and passes all 60 instance type options to an API called Amazon EC2 Fleet.
+
+The EC2 fleet API attempts to provision the instance type based on the [Price Capacity Optimized allocation strategy](https://aws.amazon.com/blogs/compute/introducing-price-capacity-optimized-allocation-strategy-for-ec2-spot-instances/). For the on-demand capacity type, this is effectively equivalent to the `lowest-price` allocation strategy. For the spot capacity type, Fleet will determine an instance type that has the lowest price combined with the lowest chance of being interrupted. Note that this may not give you the instance type with the strictly lowest price for spot.
 
 ### How does Karpenter calculate the resource usage of Daemonsets when simulating scheduling?
 
-Karpenter currently calculates the applicable daemonsets at the provisioner level with label selectors/taints, etc. It does not look to see if there are requirements on the daemonsets that would exclude it from running on particular instances that the provisioner could or couldn't launch.
-The recommendation for now is to use multiple provisioners with taints/tolerations or label selectors to limit daemonsets to only nodes launched from specific provisioners.
+Karpenter currently calculates the applicable daemonsets at the NodePool level with label selectors/taints, etc. It does not look to see if there are requirements on the daemonsets that would exclude them from running on particular instances that the NodePool could or couldn't launch.
+The recommendation for now is to use multiple NodePools with taints/tolerations or label selectors to limit daemonsets to only nodes launched from specific NodePools.
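+
+As an illustrative sketch, you can dedicate a NodePool to a workload with a taint and give only the intended DaemonSet a matching toleration and node selector. The names, taint key, and values below are placeholders:
+
+```
+apiVersion: karpenter.sh/v1beta1
+kind: NodePool
+metadata:
+  name: monitoring   # illustrative name
+spec:
+  template:
+    spec:
+      taints:
+        - key: example.com/dedicated   # placeholder taint key
+          value: monitoring
+          effect: NoSchedule
+---
+apiVersion: apps/v1
+kind: DaemonSet
+...
+spec:
+  template:
+    spec:
+      nodeSelector:
+        karpenter.sh/nodepool: monitoring   # only nodes launched by this NodePool
+      tolerations:
+        - key: example.com/dedicated
+          operator: Equal
+          value: monitoring
+          effect: NoSchedule
+```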
 
 ### What if there is no Spot capacity? Will Karpenter use On-Demand?
 
-The best defense against running out of Spot capacity is to allow Karpenter to provision as many different instance types as possible.
-Even instance types that have higher specs, e.g. vCPU, memory, etc., than what you need can still be cheaper in the Spot market than using On-Demand instances.
-When Spot capacity is constrained, On-Demand capacity can also be constrained since Spot is fundamentally spare On-Demand capacity.
-Allowing Karpenter to provision nodes from a large, diverse set of instance types will help you to stay on Spot longer and lower your costs due to Spot’s discounted pricing.
-Moreover, if Spot capacity becomes constrained, this diversity will also increase the chances that you’ll be able to continue to launch On-Demand capacity for your workloads.
+The best defense against running out of Spot capacity is to allow Karpenter to provision as many distinct instance types as possible. Even instance types that have higher specs (e.g. vCPU, memory, etc.) than what you need can still be cheaper in the Spot market than using On-Demand instances. When Spot capacity is constrained, On-Demand capacity can also be constrained since Spot is fundamentally spare On-Demand capacity.
 
-If your Karpenter Provisioner specifies flexibility to both Spot and On-Demand capacity, Karpenter will attempt to provision On-Demand capacity if there is no Spot capacity available.
-However, it’s strongly recommended that you specify at least 20 instance types in your Provisioner (or none and allow Karpenter to pick the best instance types) as our research indicates that this additional diversity increases the chances that your workloads will not need to launch On-Demand capacity at all.
-Today, Karpenter will warn you if the number of instances in your Provisioner isn’t sufficiently diverse.
+Allowing Karpenter to provision nodes from a large, diverse set of instance types will help you to stay on Spot longer and lower your costs due to Spot’s discounted pricing. Moreover, if Spot capacity becomes constrained, this diversity will also increase the chances that you’ll be able to continue to launch On-Demand capacity for your workloads.
 
-Technically, Karpenter has a concept of an “offering” for each instance type, which is a combination of zone and capacity type (equivalent in the AWS cloud provider to an EC2 purchase option – Spot or On-Demand).
-Whenever the Fleet API returns an insufficient capacity error for Spot instances, those particular offerings are temporarily removed from consideration (across the entire provisioner) so that Karpenter can make forward progress with different options.
+If your Karpenter NodePool allows both Spot and On-Demand capacity, Karpenter will fall back to provisioning On-Demand capacity if there is no Spot capacity available. However, it’s strongly recommended that you allow at least 20 instance types in your NodePool since this additional diversity increases the chances that your workloads will not need to launch On-Demand capacity at all.
+
+Karpenter has a concept of an “offering” for each instance type, which is a combination of zone and capacity type. Whenever the Fleet API returns an insufficient capacity error for Spot instances, those particular offerings are temporarily removed from consideration (across the entire NodePool) so that Karpenter can make forward progress with different options.
 
 ### Does Karpenter support IPv6?
 
-Yes! Karpenter dynamically discovers if you are running in an IPv6 cluster by checking the kube-dns service's cluster-ip. When using an AMI Family such as `AL2`, Karpenter will automatically configure the EKS Bootstrap script for IPv6. Some EC2 instance types do not support IPv6 and the Amazon VPC CNI only supports instance types that run on the Nitro hypervisor. It's best to add a requirement to your Provisioner to only allow Nitro instance types:
+Yes! Karpenter dynamically discovers if you are running in an IPv6 cluster by checking the kube-dns service's cluster-ip. When using an AMI Family such as `AL2`, Karpenter will automatically configure the EKS Bootstrap script for IPv6. Some EC2 instance types do not support IPv6 and the Amazon VPC CNI only supports instance types that run on the Nitro hypervisor. It's best to add a requirement to your NodePool to only allow Nitro instance types:
 ```
-kind: Provisioner
+apiVersion: karpenter.sh/v1beta1
+kind: NodePool
 ...
 spec:
-  requirements:
-  - key: karpenter.k8s.aws/instance-hypervisor
-    operator: In
-    values:
-    - nitro
+  template:
+    spec:
+      requirements:
+      - key: karpenter.k8s.aws/instance-hypervisor
+        operator: In
+        values:
+        - nitro
 ```
 
 For more documentation on enabling IPv6 with the Amazon VPC CNI, see the [docs](https://docs.aws.amazon.com/eks/latest/userguide/cni-ipv6.html).
@@ -165,7 +141,7 @@ Windows nodes do not support IPv6.
 
 `kube-scheduler` is responsible for the scheduling of pods, while Karpenter launches the capacity.
 When using any sort of preferred scheduling constraint, `kube-scheduler` will schedule pods to nodes anytime it is possible.
 
-As an example, suppose you scale up a deployment with a preferred zonal topology spread and none of the newly created pods can run on your existing cluster. Karpenter will then launch multiple nodes to satisfy that preference. If a) one of the nodes becomes ready slightly faster than other nodes and b) has enough capacity for multiple pods, `kube-scheduler` will schedule as many pods as possible to the single ready node so they won't remain unschedulable. It doesn't consider the in-flight capacity that will be ready in a few seconds. If all of the pods fit on the single node, the remaining nodes that Karpenter has launched aren't needed when they become ready and consolidation will delete them.
+As an example, suppose you scale up a deployment with a preferred zonal topology spread and none of the newly created pods can run on your existing cluster. Karpenter will then launch multiple nodes to satisfy that preference. If a) one of the nodes becomes ready slightly faster than other nodes and b) has enough capacity for multiple pods, `kube-scheduler` will schedule as many pods as possible to the single ready node, so they won't remain unschedulable. It doesn't consider the in-flight capacity that will be ready in a few seconds. If all the pods fit on the single node, the remaining nodes that Karpenter has launched aren't needed when they become ready and consolidation will delete them.
 
 ### When deploying an additional DaemonSet to my cluster, why does Karpenter not scale-up my nodes to support the extra DaemonSet?
 
@@ -194,7 +170,7 @@ See [Application developer]({{< ref "./concepts/#application-developer" >}}) for details.
 Yes. See [Persistent Volume Topology]({{< ref "./concepts/scheduling#persistent-volume-topology" >}}) for details.
 
 ### Can I set `--max-pods` on my nodes?
-Yes, see the [KubeletConfiguration Section in the Provisioners Documentation]({{}}) to learn more.
+Yes, see the [KubeletConfiguration Section in the NodePool docs]({{}}) to learn more.
 
 ### Why do the Windows2019 and Windows2022 AMI families only support Windows Server Core?
 The difference between the Core and Full variants is that Core is a minimal OS with less components and no graphic user interface (GUI) or desktop experience.
 
@@ -223,19 +199,17 @@ Next, locate `KarpenterController IAM Role` ARN (i.e., ARN of the resource creat
 
 For information on upgrading Karpenter, see the [Upgrade Guide]({{< ref "./upgrade-guide/" >}}).
 
-### Why do I get an `unknown field "startupTaints"` error when creating a provisioner with startupTaints?
-
-```bash
-error: error validating "provisioner.yaml": error validating data: ValidationError(Provisioner.spec): unknown field "startupTaints" in sh.karpenter.v1alpha5.Provisioner.spec; if you choose to ignore these errors, turn validation off with --validate=false
-```
-
-The `startupTaints` parameter was added in v0.10.0. Helm upgrades do not upgrade the CRD describing the provisioner, so it must be done manually. For specific details, see the [Upgrade Guide]({{< ref "./upgrade-guide/#upgrading-to-v0100" >}})
-
 ## Upgrading Kubernetes Cluster
 
 ### How do I upgrade an EKS Cluster with Karpenter?
 
-When upgrading an Amazon EKS cluster, [Karpenter's Drift feature]({{}}) can automatically upgrade the Karpenter-provisioned nodes to stay in-sync with the EKS control plane. Karpenter Drift currently needs to be enabled using a [feature gate]({{}}). Karpenter's default [AWSNodeTemplate `amiFamily` configuration]({{}}) uses the latest EKS Optimized AL2 AMI for the same major and minor version as the EKS cluster's control plane. Karpenter's AWSNodeTemplate can be configured to not use the EKS optimized AL2 AMI in favor of a custom AMI by configuring the [`amiSelector`]({{}}). If using a custom AMI, you will need to trigger the rollout of this new worker node image through the publication of a new AMI with tags matching the [`amiSelector`]({{}}), or a change to the [`amiSelector`]({{}}) field.
+When upgrading an Amazon EKS cluster, [Karpenter's Drift feature]({{}}) can automatically upgrade the Karpenter-provisioned nodes to stay in sync with the EKS control plane. Karpenter Drift currently needs to be enabled using a [feature gate]({{}}).
+
+{{% alert title="Note" color="primary" %}}
+Karpenter's default [EC2NodeClass `amiFamily` configuration]({{}}) uses the latest EKS Optimized AL2 AMI for the same major and minor version as the EKS cluster's control plane, meaning that an upgrade of the control plane will cause Karpenter to auto-discover the new AMIs for that version.
+
+If using a custom AMI, you will need to trigger the rollout of this new worker node image through the publication of a new AMI with tags matching the [`amiSelector`]({{}}), or a change to the [`amiSelector`]({{}}) field.
+{{% /alert %}}
 
 Start by [upgrading the EKS Cluster control plane](https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html). After the EKS Cluster upgrade completes, Karpenter's Drift feature will detect that the Karpenter-provisioned nodes are using EKS Optimized AMIs for the previous cluster version, and [automatically cordon, drain, and replace those nodes]({{}}). To support pods moving to new nodes, follow Kubernetes best practices by setting appropriate pod [Resource Quotas](https://kubernetes.io/docs/concepts/policy/resource-quotas/), and using [Pod Disruption Budgets](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/) (PDB). Karpenter's Drift feature will spin up replacement nodes based on the pod resource requests, and will respect the PDBs when deprovisioning nodes.
@@ -249,7 +223,7 @@ Karpenter's native interruption handling offers two main benefits over the stand
 1. You don't have to manage and maintain a separate component to exclusively handle interruption events.
 2. Karpenter's native interruption handling coordinates with other deprovisioning so that consolidation, expiration, etc. can be aware of interruption events and vice-versa.
 
-### Why am I receiving QueueNotFound errors when I set `aws.interruptionQueueName`?
+### Why am I receiving QueueNotFound errors when I set `--interruption-queue-name`?
 
 Karpenter requires a queue to exist that receives event messages from EC2 and health services in order to handle interruption messages properly for nodes. Details on the types of events that Karpenter handles can be found in the [Interruption Handling Docs]({{< ref "./concepts/disruption/#interruption" >}}).
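+
+If you hit this error, first confirm that the configured queue actually exists in the cluster's region and that its name matches exactly what you pass to Karpenter. As a rough, illustrative sketch of the relevant Helm configuration (the queue name is a placeholder, and the exact settings key can differ between Karpenter chart versions):
+
+```
+# values.yaml sketch -- key path and queue name are assumptions; check your chart version
+settings:
+  aws:
+    interruptionQueueName: "my-cluster"   # must match an existing SQS queue receiving EC2/Health events
+```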