Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Clean leaked VPC-resource-controller ENIs #6158

Closed
wants to merge 1 commit into from

Conversation

engedaam
Copy link
Contributor

@engedaam engedaam commented May 6, 2024

Fixes #N/A

Description

  • Clean-up ENIs that are created and leaked by the vpc-resource-controller

How was this change tested?

  • N/A

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@engedaam engedaam requested a review from a team as a code owner May 6, 2024 13:55
@engedaam engedaam requested a review from bwagner5 May 6, 2024 13:55
Copy link

netlify bot commented May 6, 2024

Deploy Preview for karpenter-docs-prod ready!

Name Link
🔨 Latest commit 03d8c9c
🔍 Latest deploy log https://app.netlify.com/sites/karpenter-docs-prod/deploys/6638e16885886b00081c9c31
😎 Deploy Preview https://deploy-preview-6158--karpenter-docs-prod.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@coveralls
Copy link

Pull Request Test Coverage Report for Build 8970315796

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 82.266%

Totals Coverage Status
Change from base Build 8970204707: 0.02%
Covered Lines: 5409
Relevant Lines: 6575

💛 - Coveralls

Copy link
Contributor

@jonathan-innis jonathan-innis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest thing: It wasn't clear to me who is leaking these resources and which tests are causing them to leak. I thought we already took care of the VPC CNI-provisioned ENIs so which tests are causing this behavior?

@@ -154,3 +160,59 @@ func (e *ENI) Cleanup(ctx context.Context, ids []string) ([]string, error) {

return deleted, errs
}

// get-vpc-resource-controller-enis is to get leaked ENIs by the vpc-resource-controller
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it created by the VPC resource controller or the VPC CNI?

Values: []string{fmt.Sprintf("kubernetes.io/cluster/%s", lo.FromPtr(clusterName.Value))},
},
{
Name: lo.ToPtr("tag:eks:eni:owner"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which testing did you see NICs with this tag get created under? AFAIK, NICs with this tag are only created when using security groups for pods and branch ENIs (which I don't think we do in our testing). I thought the tags that the VPC CNI added for its NICs were: node.k8s.amazonaws.com/instance_id and node.k8s.amazonaws.com/createdAt

return *tag.Key == karpenterTestingTag
})
if found {
if slices.Contains(excludedClusters, lo.FromPtr(clusterName.Value)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate: Do we have a TODO comment around these excluded clusters to ensure that we eventually move this to a tagging mechanism? Minimally, do we have something that's tracked on GH?

}

stacks := lo.Reject(out.Stacks, func(s cloudformationtypes.Stack, _ int) bool {
return s.StackStatus == cloudformationtypes.StackStatusDeleteComplete ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we checking for the stack first here? Not immediately apparent. If it's needed, should we add a comment?


// get-vpc-resource-controller-enis is to get leaked ENIs by the vpc-resource-controller
// Issue: https://github.com/aws/karpenter-provider-aws/issues/5582
func (e *ENI) getVpcResourceControllerENIs(ctx context.Context, expirationTime time.Time, excludedClusters []string) (ids []string, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these ENIs specifically tied to the VPC resource controller or are these just ENIs that were launched by Karpenter's nodes with the VPC CNI? Is there a reason that we need to clean these up differently than our primary ENIs? I believe the VPC CNI has config that allows you to add tags to these ENIs. We could just add tags and then re-leverage our existing ENI cleanup if we wanted.

@engedaam engedaam closed this May 12, 2024
@engedaam engedaam deleted the clean-leaked-vpc-cni-enis branch May 12, 2024 00:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants