-
Notifications
You must be signed in to change notification settings - Fork 340
Introduce config opt-in NLB provisioning with Security Groups #1158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This issue is currently awaiting triage. If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Hi @mtulio. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost done. A few pending items in TODO, and raising questions in key points from a broad group.
pkg/providers/v1/aws.go
Outdated
if _, ok := tags["Name"]; !ok { | ||
tags["Name"] = name | ||
} | ||
// TODO/TBD do we need to introduce a managed tag to ensure SG wont be leaked if config is changed when LB is created in managed mode? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is important, it could cause leak if we dont introduce a webhook or mechanism to handle configuration changes.
Hey @kmala, I still have some items to review in this PR, but tests are passing locally. Would you mind stamping ok-to-test to validate if there is no local addiction from my env? Thanks |
/ok-to-test |
e1e47cf
to
1e8e527
Compare
/test all |
1e8e527
to
9a0b0cc
Compare
just starting to take a look at this. |
Observing in the last run:
/test pull-cloud-provider-aws-e2e |
I will rebase tough and try again. Quickly triage:
/test pull-cloud-provider-aws-e2e-kubetest2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggesting improvements in the documentation strings. I will keep this open for comments and apply it in the next review round.
e2e test are very flaky (getting stuck on go install) in other PRs, re-running to validate latest version here: /test pull-cloud-provider-aws-e2e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes sense to me, i have some questions and suggestions.
docs/nlb_security_groups.md
Outdated
|
||
The controller can be configured to enable managed Security Groups (SGs) for AWS Network Load Balancers (NLBs) by setting an opt-in configuration in your cloud config. When enabled, each NLB created for a Kubernetes Service of type `LoadBalancer` will have a dedicated Security Group, managed by the cloud provider controller. We are calling this as opt-in Managed NLB Security Group ("Managed SG" mode). | ||
|
||
The following annotations (BYO SG / user-provided security groups) are **not supported** for NLB services and will result in service creation/update failure: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious why we call these out as "not supported", i'm guessing to set user expectations but it seemed a little confusing to me. just curious.
looking at the example below on how to engage this feature, i think if we are going to specify the annotations that do not work here, we should also specify the annotations that should be added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, Mike. The initial idea of highlighting those BYO SG annotation of "not supported" was enhance user experience when trying to consume an valid annotation to the controller, but not implemented to NLB, only on CLB logic.
This UX pitfall is a very common use case users trying to use annotations valid only on CLB, and no feedback is provided/annotation ignored (afaict a webhook would help on that). Starting to this proposal, we are validating the BYO SG field when NLB to ensure users are not mixing concepts, getting an early feedback.
we should also specify the annotations that should be added.
good idea. To deploy Service type-loadBalancer NLB we should always set the annotation "service.beta.kubernetes.io/aws-load-balancer-type" to CCM. I will update it in a new paragraph above. Moreover, we can also mention the service_controller documentation, which was recently updated with a column Valid for
which mention annotation support by ELB type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense to me. in general, if we add something to the docs that says "do not use these things ....", we should also have something next to it that says "use these other things instead ....". or something similar, that's my desire here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added! I made a few changes from the original version regarding to this statement, focusing in redirecting to the supported service controller documentation which is complete with matrix of supported annotations.
I also added the note reinforcing the users the annotation used to provision NLB. LMK what it looks like now. Thanks
@@ -239,6 +239,7 @@ const ServiceAnnotationLoadBalancerSubnets = "service.beta.kubernetes.io/aws-loa | |||
|
|||
const headerSourceArn = "x-amz-source-arn" | |||
const headerSourceAccount = "x-amz-source-account" | |||
const securityGroupPrefix = "k8s-elb-" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 from me
Thanks for review so far. FWIW I am returning on this PR today, I am applying the suggestions and synchronize the shared pieces (SG ownership check) with changes/feedback from PR #1209 |
Introduce the NLB Security Group Mode configuration (NLBSecurityGroupMode) to make the controller creates the Security Group by default when provisioning Service type-LoadBalancer NLB. This configuration is opt-in and global to the cluster.
Ensure the Security Group IDs is added on NLB load balancer creation. Additionally, this is fixing the BYO SG update scenario by detecting the replaced SG on CLB and delete it when it is owned by controller. The same behavior will be implemented in the BYO SG scenario for NLB too.
5f62c03
to
a1a785e
Compare
Ensure the Security Group is managed when creating a service type-LoadBalancer NLB object, considering the global configuration to manage SGs in NLBs: NLBSecurityGroupMode=NLBSecurityGroupModeManaged
Ensure unit tests on EnsureLoadBalancer, including case to test NLB with security group by changing the cloud-config.
Introduce hasClusterTagOwned() to validate if a resource has the kubernetes cluster tag (`kubernetes.io/cluster/clusterID`) with value `owned`, so it can quickly used when ensuring states to cloud resources managed by controller, such as SG deletions, etc.
Introduce the documentation to use the feature Service type-LoadBalancer with Security Group by opt-in through the cloud-config.
a1a785e
to
59722ba
Compare
Feedback addressed. The only pending question is related to the SG name, currently I am following the CCMs approach: SG name will have the same name of LB /test pull-cloud-provider-aws-e2e |
e2e are passing on CI, units are green locally. Converting this PR to ready. PTAL? |
Observing jobs getting stuck frequently on build, I need to rerun those frequently after many hours on Prow Console. Just did it again since the last run. |
Reviewers: there is a known-bug on kubelet killing the group and Prow is not caughting correctly, leaving the job running forever. Furthermore, I can see e2e job is consuming more CPU and Mem than limit, leading it to be killed by kube. More information to the Slack thread: https://kubernetes.slack.com/archives/C7J9RP96G/p1754511876402269?thread_ts=1754505741.634999&cid=C7J9RP96G I am constantly monitoring and restarting it manually to get readiness of the latest version. This is not blocking this PR for review. Thanks! |
CI must be fixed /test pull-cloud-provider-aws-e2e |
Perfect, now with CI fixes the tests are passing. cc @elmiko |
thanks @mtulio , i will take a look this week. |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR introduces managed security group support for Service type-LoadBalancer NLB through a global cloud-config option. Key changes:
NLBSecurityGroupMode: Managed
cloud-config to automatically provision Security Groups for new NLBsservice.beta.kubernetes.io/aws-load-balancer-security-groups
andservice.beta.kubernetes.io/aws-load-balancer-extra-security-groups
annotations for NLBRationale for BYO SG Limitation:
BYO Security Group support for NLB is intentionally blocked in this release due to:
Which issue(s) this PR fixes:
Refs #1151
Special notes for your reviewer:
Future Work:
Managed SG is important to empower users to bypass managed SG, and enhance security control boundaries with user-managed security groups on NLBs. BYO Security Group support for NLB will be considered in a future release after:
The changes introducing the global, opt-in, cloud-config for enabling Security Group (SG) on NLB creation (similar CLB), is for users/administrators who intentionally wants to enforce SG across all new services - following AWS recommendations, and ALBC defaults. This won't change the default CCM behavior if the configuration isn't added.
Done checklist:
Related changes isolated from this PR:
Does this PR introduce a user-facing change?: