Add AWS Cloud Provider Docs #557
✔️ Deploy Preview for karpenter-docs-prod ready! 🔨 Explore the source changes: 9758057 🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/6104713f03b39b00082c96da 😎 Browse the preview: https://deploy-preview-557--karpenter-docs-prod.netlify.app/docs/cloud-providers/aws |
### AWS Region

`topology.kubernetes.io/region=us-east-1`
Should this be `=` or `:`?
The Kubernetes docs use `=`, and I merely assumed they had a good reason: https://kubernetes.io/docs/reference/labels-annotations-taints/
```yaml
spec:
  instanceTypes:
    - m5.large
```
This field is actually fully vendor neutral, so it should probably live in the vendor-neutral docs. It just happens that for AWS, the accepted values are AWS specific.
I would like to keep this here for now. The info in this section is largely AWS specific.
Any label starting with `node.k8s.aws` is AWS specific. IIUC, we're using this one: https://kubernetes.io/docs/reference/labels-annotations-taints/#nodekubernetesioinstance-type
Thanks for the reviews, all! I pushed an update and responded to some comments.
- [Node Labels](#node-lables-in-provisioner-spec) (e.g., Architecture, Capacity Type)
- [Pod Labels](#pod-labels) (e.g., GPU)

## Instance Type Allowlist
I think allowlist is the wrong frame. Karpenter is built on the concept of defaults (provisioner) + overrides (pod). A pod can specify a node selector for an instance type that is not in the provisioner's defaults.
I'm confused: if a pod specifies an instance type not in Karpenter's instance type list, will Karpenter still provision it?
Correct. The provisioner controls a set of defaults, but if the pod has a specific opinion, it will override the instance type and we'll use that.
This definitely needs to be documented.
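A minimal sketch of the override discussed in this thread, assuming the provisioner lists only `m5.large` under `instanceTypes` and that the pod expresses its opinion through the well-known `node.kubernetes.io/instance-type` label. The pod name, image, and the `c5.xlarge` value are illustrative only:

```yaml
# Hypothetical pod whose nodeSelector asks for an instance type
# that is not in the provisioner's default list (m5.large).
apiVersion: v1
kind: Pod
metadata:
  name: instance-type-override-example
spec:
  nodeSelector:
    # Pod-level opinion; per the discussion above, this takes
    # precedence over the provisioner's defaults.
    node.kubernetes.io/instance-type: c5.xlarge
  containers:
    - name: app
      image: busybox
```

Under the defaults-plus-overrides model described by the reviewer, this pod would be expected to land on a `c5.xlarge` node even though the provisioner only lists `m5.large`.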
`topology.kubernetes.io/zone=us-east-1c`, Karpenter will provision nodes in
that availability zone.

Regarding AWS, 3 types of provisioning constraints are recognized:
"Regarding AWS" seems like an awkward way to start a sentence to me. Also -- specifying the number in the sentence will easily become stale as we add more things to this list.
this will be resolved after we refactor the categories
that availability zone.

Regarding AWS, 3 types of provisioning constraints are recognized:
- [Instance Type](#instance-type-allowlist)
What about Architecture and Operating System and Zone?
Instance type is specified as its own list in the CRD YAML, so I kept it separate.
"Node labels" are things that go under "labels:" in the provisioner CRD.
I also separated the pod GPU stuff because it only makes sense to include in a podspec and not the provisioner CRD.
I'm very open to alternate ways to structure this page. I did struggle with trying to make it more than a random list.
Here's how I think about it:
Vendor neutral constraints (provisioner top level fields, pod.spec.nodeSelector)
- instance type
- zone
- architecture
- operating system
Vendor specific constraints (provisioner.spec.labels, pod.spec.nodeSelector)
- [aws] capacity type
- [aws] launch template
- [aws] security groups
- [aws] subnets
Regarding AWS, 3 types of provisioning constraints are recognized:
- [Instance Type](#instance-type-allowlist)
- [Node Labels](#node-lables-in-provisioner-spec) (e.g., Architecture, Capacity Type)
- [Pod Labels](#pod-labels) (e.g., GPU)
I think this pod labels piece isn't quite accurate. Specifically, these are `pod.spec.nodeSelector`. They're choosing node labels.
What is a good, more general name? Pod requirements?
Karpenter supports specifying [AWS instance type](https://aws.amazon.com/ec2/instance-types/).

If one instance type is listed, Karpenter will always provision that type.
This isn't true -- see overrides above.
If one instance type is listed, Karpenter will always provision that type.

If more than one type is listed, Karpenter will intelligently select the
I'd avoid use of the word "intelligently", as it implies something that it's not. It's just logic.
I changed it to "...karpenter will determine the instance type..."
If one instance type is listed, Karpenter will always provision that type.

If more than one type is listed, Karpenter will intelligently select the
instance type to maximize node utilization.
It will binpack pending pods into the smallest number of nodes allowed by scheduling constraints. This is to minimize node overhead.
what is the difference between minimizing node overhead and maximizing node utilization?
It's a classical NP-Hard computer science problem to minimize node overhead. Minimizing the number of nodes (which is what we do currently) minimizes the node overhead, which tends to do better w.r.t overall utilization.
e.g.
Given 2 pods, 1 core + 35 cores = 36 cores total. It's NP-Hard to calculate that this should be two instances (c5.9xlarge + c5.large), and instead our logic will provision a c5.12xlarge, which is 8 additional "wasted" cores. However, pods that get created in the future will be able to schedule here, so in the limit of the cluster, it's likely this capacity will get used, but the node-level overhead (daemonsets, etc) will be reduced.
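For reference, the two-pod scenario in this example could be written as the following pending pods. This is only a sketch of the inputs: the pod names and image are placeholders, and how the pods are packed onto instances is left to Karpenter's logic as described above.

```yaml
# Hypothetical pods requesting 1 and 35 CPU cores (36 cores total).
apiVersion: v1
kind: Pod
metadata:
  name: small-pod
spec:
  containers:
    - name: app
      image: busybox
      resources:
        requests:
          cpu: "1"
---
apiVersion: v1
kind: Pod
metadata:
  name: large-pod
spec:
  containers:
    - name: app
      image: busybox
      resources:
        requests:
          cpu: "35"
```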
### Launch Template

Karpenter uses [AWS Bottlerocket OS](https://aws.amazon.com/bottlerocket/) by
default. More specifically, Karpenter automatically creates a launch template
I'd avoid language like "More specifically". Just state what it is.
I'm leaning towards keeping it.
What does it mean to use bottlerocket by default? It means we create a launch template for you with that OS.
with the name `Karpenter-<cluster name>-uuid` for each region where a node is
provisioned.

You can specify a different [launch
We should state that customers should rely on Karpenter's default launch templates unless their use case requires something else.
I'm hesitant to add this. I think the reader would understand when a different launch template is needed.
Is there an unexpected consequence of using your own launch template?
## Operating System

- key: `kubernetes.io/os`
This is a first class field in the provisioner (since it's an upstream k8s concept), and a nodeSelector at the pod level (since everything is a nodeSelector for pods).
I'm reluctant to fragment the docs further at this point. For this release (where we only support AWS), I would like to have a single page where users can ctrl-f for the different tags/values.
Karpenter supports accelerators, such as GPUs.

To specify a specific GPU type, use the [instance type
Can you clarify that this should be done in addition to `resources`? Otherwise the kubelet won't allocate the GPU to the pod.
I clarified this, but the new content needs further review. I misunderstood this previously.
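A minimal sketch of what the reviewer is asking for, assuming the NVIDIA device plugin's `nvidia.com/gpu` extended resource name and a `p3.2xlarge` instance type chosen through the `node.kubernetes.io/instance-type` label. The pod name and image are placeholders:

```yaml
# Hypothetical GPU pod: the nodeSelector picks the GPU instance type,
# and the resources block is what lets the kubelet allocate the GPU.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  nodeSelector:
    node.kubernetes.io/instance-type: p3.2xlarge
  containers:
    - name: app
      image: busybox
      resources:
        limits:
          # Extended resources such as GPUs are requested via limits.
          nvidia.com/gpu: 1
```

Without the `resources` entry, the node selector alone would only influence where the pod runs; it would not give the container access to the GPU.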
Note to self: revert desc of AZ IDs
Pushed a revision removing description of AZ IDs
Worked on this together
Description of changes:
Add AWS cloud provider page.
Add description of well known labels and corresponding behavior on AWS. For example, launch nodes in a particular availability zone.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.