Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: [RFC] Add v1beta1 API RFC #4274

Merged
merged 8 commits into from Sep 15, 2023

Conversation

jonathan-innis
Copy link
Contributor

@jonathan-innis jonathan-innis commented Jul 17, 2023

Fixes #N/A

Description

This PR proposes the v1beta1 APIs and full Change List into the design directory

How was this change tested?

Does this change impact docs?

  • [] Yes, PR includes docs updates
  • [] Yes, issue opened: #
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jonathan-innis jonathan-innis requested a review from a team as a code owner July 17, 2023 02:13
@netlify
Copy link

netlify bot commented Jul 17, 2023

Deploy Preview for karpenter-docs-prod canceled.

Name Link
🔨 Latest commit 8184ef1
🔍 Latest deploy log https://app.netlify.com/sites/karpenter-docs-prod/deploys/65040439bc3ebd00082ced2f

@jonathan-innis jonathan-innis force-pushed the rfc-v1beta1-design branch 3 times, most recently from 238e4ee to 810bdfc Compare July 17, 2023 02:20
@jonathan-innis jonathan-innis changed the title docs: [RFC] Add v1beta1 RFC to designs directory docs: [RFC] Add v1beta1 API RFC Jul 17, 2023
@jonathan-innis jonathan-innis force-pushed the rfc-v1beta1-design branch 8 times, most recently from b6b3d21 to 4c9f48d Compare July 19, 2023 05:05
@jonathan-innis jonathan-innis force-pushed the rfc-v1beta1-design branch 2 times, most recently from a647e9a to fa8ccb7 Compare July 24, 2023 23:26
@jonathan-innis jonathan-innis added needs-design Design required needs-review PRs that are still going through the review process and removed needs-design Design required labels Jul 25, 2023
@jonathan-innis jonathan-innis force-pushed the rfc-v1beta1-design branch 2 times, most recently from ca8bfc2 to 06eb4fa Compare July 25, 2023 18:45
@jonathan-innis jonathan-innis force-pushed the rfc-v1beta1-design branch 3 times, most recently from ad4b2a2 to 2068978 Compare July 25, 2023 21:45
Copy link
Contributor

@ellistarn ellistarn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work on this!!!

designs/v1beta1-api.md Outdated Show resolved Hide resolved
designs/v1beta1-api.md Outdated Show resolved Hide resolved
designs/v1beta1-api.md Outdated Show resolved Hide resolved
designs/v1beta1-api.md Show resolved Hide resolved
designs/v1beta1-api.md Outdated Show resolved Hide resolved
designs/v1beta1-full-changelist.md Outdated Show resolved Hide resolved
designs/v1beta1-full-changelist.md Outdated Show resolved Hide resolved
designs/v1beta1-full-changelist.md Outdated Show resolved Hide resolved
designs/v1beta1-full-changelist.md Outdated Show resolved Hide resolved
designs/v1beta1-api.md Show resolved Hide resolved
jonathan-innis and others added 2 commits July 30, 2023 22:31
Co-authored-by: Tim Bannister <tim@scalefactory.com>
designs/v1beta1-api.md Outdated Show resolved Hide resolved
designs/v1beta1-full-changelist.md Outdated Show resolved Hide resolved
designs/v1beta1-full-changelist.md Outdated Show resolved Hide resolved
designs/v1beta1-full-changelist.md Outdated Show resolved Hide resolved
designs/v1beta1-api.md Show resolved Hide resolved
@garvinp-stripe
Copy link
Contributor

garvinp-stripe commented Aug 22, 2023

Great work! I had some time to think, this isn't necessarily a strong opinion but I think there is some value in considering. I don't know what the timeline looks like and maybe what I am thinking doesn't make sense at v1beta1 however I think there is an opportunity to drive some usability improvement since there is going to be significant work in changing the custom resources.

I wanted to see if we can take some inspiration from https://gateway-api.sigs.k8s.io/ which is somewhat new and have been through quite a bit of changes in its overall design. Specifically the way it carved out custom resources that is able to abstract discrete configuration depending on personas. I think this way of thinking can yield quite a bit of insight into how to structure custom resources so that cluster operators can easily provide application developers the appropriate level of control.

Starting with NodePool. I think it has a lot of abstraction leaks between App dev vs Cluster Operator. Ideally app devs can specify their pods -> node, maybe some taints and label for their nodes (pretty much all the parameters in spec). However app dev shouldn't be able to specify kubelet configurations since this is usually managed by cluster operator. I think if possible this can be carved out to something like NodeProviders which are associated to NodePool. Similar to HTTPRoutes to Gateway in GatewayAPI. I think this would significantly improve the flexibility of how cluster operators manage Provisioners/ NodePool since often times there are much more app dev teams to cluster operator teams and most cluster operators are currently creating provisioners per app team.

I think something nice would be "version" similar to LaunchTemplate version. This allows Cluster Operators to rollout and rollback changes to nodes safely and have the necessary controls when managing many NodeClasses. Ideally each NodeClasses version should be immutable. Right now it is difficult to expand on NodeClass for nodes outside of AWS. It may be valuable to split NodeClasses, I don't have a good name, such that we have something that is at the cloud provider level and one that just defines nodes specification specific to a cloud provider. (Similar to gatewayclass and gateway in Gateway API). The Cloud provider layer resource defines the controller for creating nodes for that cloud provider and the nodeclass themselves work in the context of each cloud provider.

Thanks

@jonathan-innis
Copy link
Contributor Author

I think something nice would be "version" similar to LaunchTemplate version. This allows Cluster Operators to rollout and rollback changes to nodes safely and have the necessary controls when managing many NodeClasses. Ideally each NodeClasses version should be immutable. Right now it is difficult to expand on NodeClass for nodes outside of AWS. It may be valuable to split NodeClasses, I don't have a good name, such that we have something that is at the cloud provider level and one that just defines nodes specification specific to a cloud provider. (Similar to gatewayclass and gateway in Gateway API). The Cloud provider layer resource defines the controller for creating nodes for that cloud provider and the nodeclass themselves work in the context of each cloud provider.

I'm still thinking on the other comment that you made around the Gateway API, but, for this one, the idea was that the NodeClass is the cloudprovider extensibility mechanism; it's effectively the interface to define how Karpenter can actually bring up a Node in a given Cloudprovider. If another Cloudprovider came along and wanted to support Karpenter, I would expect that they would create their own NodeClass and this NodeClass would define its own interface with how to launch Nodes in that Cloudprovider.

Generally, we wanted to avoid creating too many disparate concepts that have to be individually reasoned about (i.e. more API kinds). We effectively have taken the stance that if something is common enough that it should be useful to all cloudproviders, then it should be hoisted up into the NodePool; otherwise, it belongs within the cloudprovider-specific NodeClass.

Is there a specific use-case that you have in mind where you think splitting out the details of the NodeClass could be useful beyond hoisting common cross-cloudprovider values into the NodePool?

@jonathan-innis jonathan-innis force-pushed the rfc-v1beta1-design branch 2 times, most recently from 20c3bdf to 4b8edd2 Compare September 14, 2023 22:42
@jonathan-innis jonathan-innis enabled auto-merge (squash) September 15, 2023 07:16
@jonathan-innis jonathan-innis merged commit 6ea6132 into aws:main Sep 15, 2023
15 checks passed

1. API Group Renames
1. `karpenter.k8s.aws` → `compute.k8s.aws`
Copy link

@tallaxes tallaxes Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here, wondering what prompted the reversal ...

Comment on lines 320 to 355
#### `compute.k8s.aws`

1. `compute.k8s.aws/instance-hypervisor`
2. `compute.k8s.aws/instance-encryption-in-transit-supported`
3. `compute.k8s.aws/instance-category`
4. `compute.k8s.aws/instance-family`
5. `compute.k8s.aws/instance-generation`
6. `compute.k8s.aws/instance-local-nvme`
7. `compute.k8s.aws/instance-size`
8. `compute.k8s.aws/instance-cpu`
9. `compute.k8s.aws/instance-memory`
10. `compute.k8s.aws/instance-network-bandwidth`
11. `compute.k8s.aws/instance-pods`
12. `compute.k8s.aws/instance-gpu-name`
13. `compute.k8s.aws/instance-gpu-manufacturer`
14. `compute.k8s.aws/instance-gpu-count`
15. `compute.k8s.aws/instance-gpu-memory`
16. `compute.k8s.aws/instance-accelerator-name`
17. `compute.k8s.aws/instance-accelerator-manufacturer`
18. `compute.k8s.aws/instance-accelerator-count`
#### `karpenter.k8s.aws`

1. `karpenter.k8s.aws/instance-hypervisor`
2. `karpenter.k8s.aws/instance-encryption-in-transit-supported`
3. `karpenter.k8s.aws/instance-category`
4. `karpenter.k8s.aws/instance-family`
5. `karpenter.k8s.aws/instance-generation`
6. `karpenter.k8s.aws/instance-local-nvme`
7. `karpenter.k8s.aws/instance-size`
8. `karpenter.k8s.aws/instance-cpu`
9. `karpenter.k8s.aws/instance-memory`
10. `karpenter.k8s.aws/instance-network-bandwidth`
11. `karpenter.k8s.aws/instance-gpu-name`
12. `karpenter.k8s.aws/instance-gpu-manufacturer`
13. `karpenter.k8s.aws/instance-gpu-count`
14. `karpenter.k8s.aws/instance-gpu-memory`
15. `karpenter.k8s.aws/instance-accelerator-name`
16. `karpenter.k8s.aws/instance-accelerator-manufacturer`
17. `karpenter.k8s.aws/instance-accelerator-count`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonathan-innis, I am curios what prompted the reversal of the original plan?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-review PRs that are still going through the review process
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants