Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Separate zone caching by subnet selectors #2229

Merged
merged 4 commits into from
Aug 2, 2022

Conversation

asimshankar
Copy link
Contributor

Fixes #2228

Description

The set of zones for instance types was using the same cache key irrespective of the subnet selector. As a result, if two provisioners had distinct subnetSelectors that matched subnets in different availability zones, the karpenter controller would only see one of them and be unable to schedule nodes on some of the subnets.

How was this change tested?

Manually tested by patching this change on top of the v0.13.2-release branch in our cluster.

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note

Fix flakiness in setting up nodes in the right availability zone when provisioners target different subnets.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@asimshankar asimshankar requested a review from a team as a code owner July 31, 2022 21:24
@asimshankar asimshankar requested a review from tzneal July 31, 2022 21:24
@netlify
Copy link

netlify bot commented Jul 31, 2022

Deploy Preview for karpenter-docs-prod canceled.

Name Link
🔨 Latest commit 9c64f73
🔍 Latest deploy log https://app.netlify.com/sites/karpenter-docs-prod/deploys/62e94d7365f6f100099fbb05

Copy link
Contributor

@ellistarn ellistarn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix! One minor recommendation

Co-authored-by: Ellis Tarn <ellistarn@gmail.com>
@asimshankar
Copy link
Contributor Author

Nice fix! One minor recommendation

Thanks for the quick turnaround! Applied the recommendation.

Copy link
Contributor

@bwagner5 bwagner5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! Just one missing "

pkg/cloudprovider/aws/instancetypes.go Outdated Show resolved Hide resolved
@bwagner5
Copy link
Contributor

bwagner5 commented Aug 1, 2022

looks like "github.com/aws/karpenter/pkg/utils/pretty" needs to be imported as well.

@asimshankar
Copy link
Contributor Author

looks like "github.com/aws/karpenter/pkg/utils/pretty" needs to be imported as well.

Sorry for the bumbling around, fixed. Will keep an eye on any new silly compile errors.

Copy link
Contributor

@bwagner5 bwagner5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@bwagner5 bwagner5 merged commit f459470 into aws:main Aug 2, 2022
@asimshankar asimshankar deleted the issue-2228 branch August 2, 2022 17:36
@avinashparchuri
Copy link

@bwagner5 would it be possible to get this into a new RC (or if there is a regular release coming up, that would work as well)? Thanks!

@bwagner5
Copy link
Contributor

bwagner5 commented Aug 2, 2022

We have snapshot builds, this includes this PR: public.ecr.aws/karpenter/controller:f4594705f9d20d3d1568fe5f989112dabbc51579

njtran pushed a commit to njtran/karpenter-provider-aws that referenced this pull request Aug 9, 2022
* [aws]: Separate zone caching by subnet selectors

* Update pkg/cloudprovider/aws/instancetypes.go

Co-authored-by: Ellis Tarn <ellistarn@gmail.com>

* Update pkg/cloudprovider/aws/instancetypes.go

* fix import

Co-authored-by: Ellis Tarn <ellistarn@gmail.com>
Co-authored-by: Brandon Wagner <bmwagner10@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[aws] Failure/unpredictability in assigning nodes if provisioners target different subnets
4 participants