
Refactor apis.provisioning requirements #1155

Merged
9 commits merged into aws:main from refactor_requirements, Feb 4, 2022

Conversation

felix-zhe-huang
Contributor

felix-zhe-huang commented Jan 15, 2022

1. Issue, if available:
Karpenter relies on the provisioner API to calculate the requirements and constraints for pod scheduling and resource provisioning. The current implementation only handles specific use cases and produces incorrect scheduling and resource estimates for general use cases (notably issue #1084, where DaemonSet pods are not considered during resource provisioning). This patch corrects the requirement logic and relaxes the inflexibility of the current design. It will enable the development of the following features.

  • Improve resource utilization by grouping pods according to requirement compatibility
  • Support arbitrary requirement labels for topology and affinity features
  • Enable more use cases with Exists and DoesNotExist operators
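As a rough illustration of the last bullet, here is a minimal sketch (using the upstream corev1.NodeSelectorRequirement type and made-up label keys, not the provisioner's own requirement type) of how Exists and DoesNotExist requirements constrain label presence rather than specific values:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

func main() {
	// A provisioner-style requirement set mixing value-based and
	// existence-based operators. The label keys are examples only.
	requirements := []v1.NodeSelectorRequirement{
		{
			Key:      "topology.kubernetes.io/zone",
			Operator: v1.NodeSelectorOpIn,
			Values:   []string{"us-west-2a", "us-west-2b"},
		},
		{
			// Exists: the label key must be present; no values are listed.
			Key:      "example.com/team",
			Operator: v1.NodeSelectorOpExists,
		},
		{
			// DoesNotExist: the label key must be absent.
			Key:      "example.com/dedicated",
			Operator: v1.NodeSelectorOpDoesNotExist,
		},
	}
	for _, r := range requirements {
		fmt.Println(r.Key, r.Operator, r.Values)
	}
}
```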

Resolves #1084

2. Description of changes:

  • Refactor the requirement implementation
  • Support the Exists and DoesNotExist operators
  • Enable constraint compatibility testing
  • Correct constraint validation bugs

3. How was this change tested?
Included unit tests that cover the new features.
Manually tested:

  • Provisioner requirement with NodeSelector works
  • Label with NodeSelector works
  • Scaling to 100 pods works
  • Provisioner zone requirement with Pod Affinity rules works
  • Provisioner instance type selection works
  • Pods with In, NotIn, Exists, and DoesNotExist affinity rules work

4. Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: link to issue
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@netlify

netlify bot commented Jan 15, 2022

✔️ Deploy Preview for karpenter-docs-prod canceled.

🔨 Explore the source changes: 99d3eb0

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/61fd8d9d6423290008b9a2d9

felix-zhe-huang force-pushed the refactor_requirements branch 3 times, most recently from 4354ddf to 784072b on January 18, 2022 16:52
felix-zhe-huang changed the title from Refactor apis.provisioning requirements to [WIP] Refactor apis.provisioning requirements on Jan 23, 2022
pkg/utils/sets/sets.go (outdated review thread, resolved)
felix-zhe-huang force-pushed the refactor_requirements branch 3 times, most recently from 636be99 to bb0356f on February 3, 2022 03:25
@ellistarn
Contributor

Interesting CI errors:

Message: "Provisioner.karpenter.sh "condorpaint" is invalid: spec.requirements: Invalid value: "null": spec.requirements in body must be of type array: "null"",

@felix-zhe-huang
Contributor Author

felix-zhe-huang commented Feb 3, 2022

Interesting CI errors:

This is indeed a very interesting case, caused by the initialization problem we discussed. It turns out that when a Constraints object is created without properly initializing the requirements, the requirements field is no longer ignored during serialization: since it is not a pointer anymore it can never be nil, so the omitempty tag no longer works. When the Constraints object is serialized, requirements.Requirements is nil, and that is what causes the problem. I managed to patch this by initializing requirements.Requirements in the requirements' MarshalJSON function. I hesitate to call this a fix because the same issue may resurface if other functions are called.
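To make the failure mode concrete, here is a minimal, self-contained sketch (illustrative type names, not the actual Karpenter code): with a non-pointer requirements field, an uninitialized inner slice marshals to null, which is exactly what the CRD validation rejects, and initializing the slice inside MarshalJSON makes it serialize as an empty array instead.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Requirement stands in for a node selector requirement.
type Requirement struct {
	Key      string   `json:"key"`
	Operator string   `json:"operator"`
	Values   []string `json:"values,omitempty"`
}

// Requirements wraps a slice; an uninitialized value holds a nil slice.
type Requirements struct {
	Requirements []Requirement
}

// Constraints holds Requirements by value. Because the field is no longer a
// pointer it is never nil, so omitempty cannot drop it from the output.
type Constraints struct {
	Requirements Requirements `json:"requirements,omitempty"`
}

// MarshalJSON initializes the inner slice so it serializes as [] rather than
// null, which would otherwise trip the CRD's "must be of type array" check.
func (r Requirements) MarshalJSON() ([]byte, error) {
	if r.Requirements == nil {
		r.Requirements = []Requirement{}
	}
	return json.Marshal(r.Requirements)
}

func main() {
	// Without the nil check above this would print {"requirements":null},
	// matching the CI error; with it, the field serializes as an array.
	b, _ := json.Marshal(Constraints{})
	fmt.Println(string(b)) // {"requirements":[]}
}
```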

Contributor

ellistarn left a comment


LGTM! Worth getting some eyes from @bwagner5. I'd also like to hear about some manual testing. Can you do thorough manual testing against a real cluster and update the description?

@felix-zhe-huang
Contributor Author

Can you do thorough manual testing against a real cluster and update the description?

Did a bunch of manual tests and everything works fabulously. Please see the PR description for the manual tests performed.

Contributor

bwagner5 left a comment


Nice work! Huge change! The only critical thing to address is the IgnoredLabels since this will break the EBS in-tree driver.

pkg/utils/sets/sets.go (outdated review thread, resolved)
pkg/utils/sets/sets.go (review thread, resolved)
}
// The legal operators for pod affinity and anti-affinity are In, NotIn, Exists, DoesNotExist.
Contributor


This comment is inaccurate. This is not pod affinity. This is node affinity, and there is no node anti-affinity. I think you can get away with not having any comment at all.

Suggested change
// The legal operators for pod affinity and anti-affinity are In, NotIn, Exists, DoesNotExist.

@@ -32,10 +31,10 @@ func (c *Constraints) defaultCapacityTypes() {
if _, ok := c.Labels[v1alpha5.LabelCapacityType]; ok {
return
}
- if functional.ContainsString(c.Requirements.Keys(), v1alpha5.LabelCapacityType) {
+ if c.Requirements.Keys().Has(v1alpha5.LabelCapacityType) {
Contributor


Can we use c.Requirements.Get(v1alpha5.LabelCapacityType).Len()?
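For context, here is a toy sketch of the two checks being compared (a hypothetical Requirements helper built on k8s.io/apimachinery's sets package, not the PR's actual type): Keys().Has(key) only asks whether a requirement exists for the key, while Get(key).Len() looks at how many values are currently allowed for it.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/sets"
)

// Requirements is a toy stand-in that maps a label key to its allowed values.
type Requirements map[string]sets.String

// Keys returns the set of label keys that have a requirement defined.
func (r Requirements) Keys() sets.String {
	keys := sets.NewString()
	for k := range r {
		keys.Insert(k)
	}
	return keys
}

// Get returns the allowed values for a key (an empty set if none are defined).
func (r Requirements) Get(key string) sets.String {
	return r[key]
}

func main() {
	// The label key is written out literally here; the real code references
	// it via v1alpha5.LabelCapacityType.
	reqs := Requirements{"karpenter.sh/capacity-type": sets.NewString("spot", "on-demand")}

	fmt.Println(reqs.Keys().Has("karpenter.sh/capacity-type"))    // true: a requirement exists for the key
	fmt.Println(reqs.Get("karpenter.sh/capacity-type").Len() > 0) // true: the requirement allows at least one value
}
```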

Contributor

bwagner5 left a comment


lgtm! nice job!

ellistarn merged commit 623423b into aws:main Feb 4, 2022
@cdenneen

cdenneen commented Feb 4, 2022

Can't wait to try it!

Successfully merging this pull request may close these issues.

Constraint validation incorrectly excluding daemonset pods in provisioning