Add AWS Cloud Provider Docs #557

Merged
merged 7 commits into from
Jul 30, 2021

geoffcline
Contributor

Description of changes:
Add AWS cloud provider page.
Add a description of well-known labels and their corresponding behavior on AWS, for example, launching nodes in a particular availability zone.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@geoffcline geoffcline added the documentation Improvements or additions to documentation label Jul 27, 2021
@geoffcline geoffcline self-assigned this Jul 27, 2021
@geoffcline geoffcline added this to In progress in Documentation via automation Jul 27, 2021
@netlify

netlify bot commented Jul 27, 2021

✔️ Deploy Preview for karpenter-docs-prod ready!

🔨 Explore the source changes: 9758057

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/6104713f03b39b00082c96da

😎 Browse the preview: https://deploy-preview-557--karpenter-docs-prod.netlify.app/docs/cloud-providers/aws

@geoffcline
Contributor Author


### AWS Region

`topology.kubernetes.io/region=us-east-1`
Contributor

Should this be = or :?

Contributor Author

The Kubernetes docs use =, and I merely assumed they had a good reason:

https://kubernetes.io/docs/reference/labels-annotations-taints/
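
To illustrate the `=` versus `:` question: `key=value` is the notation used in prose and in label selectors, while a YAML manifest writes the same label as a `key: value` map entry. A minimal sketch, assuming a pod that selects on the region label; the pod name and image are placeholders and do not come from this PR:

```yaml
# Hypothetical pod manifest; the name and image are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: region-example
spec:
  nodeSelector:
    # Written as topology.kubernetes.io/region=us-east-1 in prose and selectors,
    # but as a key: value pair inside YAML.
    topology.kubernetes.io/region: us-east-1
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
```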

```yaml
spec:
  instanceTypes:
    - m5.large
```
Contributor

This field is actually fully vendor-neutral, so it should probably live in the vendor-neutral docs. It just happens that for AWS, the accepted values are AWS-specific.

Contributor Author

I would like to keep this here for now. The info in this section is largely AWS specific.

Contributor

Any label starting with node.k8s.aws is AWS specific. IIUC, we're using this one https://kubernetes.io/docs/reference/labels-annotations-taints/#nodekubernetesioinstance-type

@geoffcline
Contributor Author

thanks for the reviews all! I pushed an update and responded to some comments.

- [Node Labels](#node-lables-in-provisioner-spec) (e.g., Architecture, Capacity Type)
- [Pod Labels](#pod-labels) (e.g., GPU)

## Instance Type Allowlist
Contributor

I think allowlist is the wrong frame. Karpenter is built on a concept of defaults (provisioner) + overrides (pod). A pod can specify a node selector for an instance type that is not in the provisioner's defaults.

Contributor Author

I'm confused - if a pod specifies an instance type not in karpenter's instance type list, karpenter will still provision it?

Contributor

Correct. The provisioner controls a set of defaults, but if the pod has a specific opinion, it will override the instance type and we'll use that.

Contributor Author

This definitely needs to be documented.
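
The override behavior described in this thread can be sketched as a pod whose `nodeSelector` names an instance type through the well-known `node.kubernetes.io/instance-type` label. This is only a sketch under assumptions: the pod name, image, and the `c5.xlarge` value are illustrative, and the exact precedence between provisioner defaults and pod selectors should be confirmed against the Karpenter release in use.

```yaml
# Hypothetical pod; per the discussion above, its nodeSelector would take
# precedence over the provisioner's default instanceTypes list.
apiVersion: v1
kind: Pod
metadata:
  name: instance-type-override-example
spec:
  nodeSelector:
    # Assumed value for illustration; it need not appear in the provisioner's list.
    node.kubernetes.io/instance-type: c5.xlarge
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
```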

`topology.kubernetes.io/zone=us-east-1c`, Karpenter will provision nodes in
that availability zone.

Regarding AWS, 3 types of provisioning constraints are recognized:
Contributor

"Regarding AWS" seems like an awkward way to start a sentence to me. Also -- specifying the number in the sentence will easily become stale as we add more things to this list.

Contributor Author

this will be resolved after we refactor the categories

that availability zone.

Regarding AWS, 3 types of provisioning constraints are recognized:
- [Instance Type](#instance-type-allowlist)
Contributor

What about Architecture and Operating System and Zone?

Contributor Author

Instance type is specified as its own list in the CRD YAML, so I kept it separate.

"Node labels" are things that go under "labels:" in the provisioner CRD.

I also separated the pod GPU stuff because it only makes sense to include in a podspec and not the provisioner CRD.

I'm very open to alternate ways to structure this page. I did struggle with trying to make it more than a random list.

Contributor

Here's how I think about it:

Vendor neutral constraints (provisioner top level fields, pod.spec.nodeSelector)

  • instance type
  • zone
  • architecture
  • operating system

Vendor specific constraints (provisioner.spec.labels, pod.spec.nodeSelector)

  • [aws] capacity type
  • [aws] launch template
  • [aws] security groups
  • [aws] subnets
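
As a rough illustration of the vendor-neutral half of this breakdown, the pod-side form could look like the sketch below. It uses only upstream well-known Kubernetes labels; the pod name, image, and the chosen values are assumptions. The provisioner-side field names and the AWS-specific label keys are not spelled out in this thread, so they are deliberately left out.

```yaml
# Hypothetical pod expressing vendor-neutral constraints via pod.spec.nodeSelector.
apiVersion: v1
kind: Pod
metadata:
  name: neutral-constraints-example
spec:
  nodeSelector:
    topology.kubernetes.io/zone: us-east-1c   # zone
    kubernetes.io/arch: amd64                 # architecture
    kubernetes.io/os: linux                   # operating system
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
```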

Regarding AWS, 3 types of provisioning constraints are recognized:
- [Instance Type](#instance-type-allowlist)
- [Node Labels](#node-lables-in-provisioner-spec) (e.g., Architecture, Capacity Type)
- [Pod Labels](#pod-labels) (e.g., GPU)
Contributor

@ellistarn ellistarn Jul 29, 2021

I think this pod labels piece isn't quite accurate. Specifically, these are pod.spec.nodeSelector entries. They're choosing node labels.

Contributor Author

what is a good more general name? pod requirements?


Karpenter supports specifying [AWS instance type](https://aws.amazon.com/ec2/instance-types/).

If one instance type is listed, Karpenter will always provision that type.
Contributor

This isn't true -- see overrides above.


If one instance type is listed, Karpenter will always provision that type.

If more than one type is listed, Karpenter will intelligently select the
Contributor

I'd avoid use of the word "intelligently", as it implies something that it's not. It's just logic.

Contributor Author

I changed it to "...karpenter will determine the instance type..."

If one instance type is listed, Karpenter will always provision that type.

If more than one type is listed, Karpenter will intelligently select the
instance type to maximize node utilization.
Contributor

It will binpack pending pods into the smallest number of nodes allowed by scheduling constraints. This is to minimize node overhead.

Contributor Author

what is the difference between minimizing node overhead and maximizing node utilization?

Contributor

Minimizing node overhead is a classical NP-Hard computer science problem. Minimizing the number of nodes (which is what we do currently) minimizes the node overhead, which tends to do better w.r.t. overall utilization.

e.g.
Given 2 pods, 1 core + 35 cores = 36 cores total. It's NP-Hard to calculate that this should be two instances (c5.9xlarge + c5.large), and instead our logic will provision a c5.12xlarge, which is 8 additional "wasted" cores. However, pods that get created in the future will be able to schedule here, so in the limit of the cluster, it's likely this capacity will get used, but the node-level overhead (daemonsets, etc) will be reduced.
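
The two pending pods in this example could be written as follows. This is a sketch for illustration: the pod names and image are hypothetical, and only the CPU requests (1 core and 35 cores) come from the comment above.

```yaml
# Hypothetical small pod requesting 1 core.
apiVersion: v1
kind: Pod
metadata:
  name: small-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: "1"
---
# Hypothetical large pod requesting 35 cores.
apiVersion: v1
kind: Pod
metadata:
  name: large-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: "35"
```

With both pods pending at once, the logic described above would place them on a single node large enough for the combined 36 cores rather than searching for the tightest multi-node fit.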

### Launch Template

Karpenter uses [AWS Bottlerocket OS](https://aws.amazon.com/bottlerocket/) by
default. More specifically, Karpenter automatically creates a launch template
Contributor

I'd avoid language like "More specifically". Just state what it is.

Contributor Author

I'm leaning towards keeping it.

What does it mean to use bottlerocket by default? It means we create a launch template for you with that OS.

with the name `Karpenter-<cluster name>-uuid` for each region where a node is
provisioned.

You can specify a different [launch
Contributor

We should state that customers should rely on Karpenter's default launch templates unless their use case requires something else.

Contributor Author

I'm hesitant to add this. I think the reader would understand when a different launch template is needed.

Is there an unexpected consequence of using your own launch template?


## Operating System

- key: `kubernetes.io/os`
Contributor

This is a first class field in the provisioner (since it's an upstream k8s concept), and a nodeSelector at the pod level (since everything is a nodeSelector for pods).

Contributor Author

I'm reluctant to fragment the docs further at this point. For this release (where we only support AWS), I would like to have a single page where users can ctrl-f for the different tags/values.


Karpenter supports accelerators, such as GPUs.

To specify a specific GPU type, use the [instance type
Contributor

Can you clarify that this should be done in addition to specifying resources? Otherwise the kubelet won't allocate the GPU to the pod.

Contributor Author

I clarified this, but the new content needs further review. I misunderstood this previously.
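
For reference, the combination requested in this thread (a GPU resource plus an instance type selector) might look like the sketch below. It assumes the `nvidia.com/gpu` extended resource name exposed by the NVIDIA device plugin and uses `p3.2xlarge` purely as an example value; the pod name and image are placeholders.

```yaml
# Hypothetical GPU pod: the resource limit asks the kubelet to allocate a GPU,
# and the nodeSelector narrows the choice to a specific GPU instance type.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  nodeSelector:
    node.kubernetes.io/instance-type: p3.2xlarge   # assumed example value
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      resources:
        limits:
          nvidia.com/gpu: 1   # assumes the NVIDIA device plugin resource name
```

Without the `resources` entry, the selector alone would only steer the pod toward a GPU node; the GPU itself would not be allocated to the container.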

@geoffcline
Contributor Author

Note to self: revert desc of AZ IDs

@geoffcline
Contributor Author

pushed a revision removing description of AZ IDs

@geoffcline geoffcline moved this from In progress to in review in Documentation Jul 30, 2021
@akestner akestner changed the title from "add description of AWS labels" to "Add AWS Cloud Provider Docs" Jul 30, 2021
Contributor

@akestner akestner left a comment

Worked on this together

@ellistarn ellistarn merged commit 2519db5 into aws:main Jul 30, 2021
Documentation automation moved this from in review to Done Jul 30, 2021