Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support controllers leader election mode #1753

Closed
migueleliasweb opened this issue Mar 24, 2023 · 20 comments
Closed

Support controllers leader election mode #1753

migueleliasweb opened this issue Mar 24, 2023 · 20 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@migueleliasweb
Copy link

migueleliasweb commented Mar 24, 2023

The Helm template seems to not allow such configuration. See: https://github.com/aws-controllers-k8s/iam-controller/blob/main/helm/templates/deployment.yaml#L14

Is there reason why wouldn't this controller work with multiple replicas?

@jaypipes
Copy link
Collaborator

You'd need to set up the leader election bits, but it should be possible. Is there a particular reason you want to run >1 replica? For performance or for availability reasons?

@migueleliasweb
Copy link
Author

migueleliasweb commented Mar 24, 2023

Hi @jaypipes once again.

Yeah, in my case it's the company policy for reliability.

I was looking through the code and the leader election code seems to be mostly there. It might be missing the key for the election and way to check whether we're running multiple replicas or not so the election is enabled or disabled correctly.

Would you be against implementing this? I could also give it a shot if that's okay.

Thanks for the swift reply, Jay!

@migueleliasweb
Copy link
Author

I've found there is a flag being set on https://github.com/aws-controllers-k8s/runtime/blob/main/pkg/config/config.go#L106 and all it takes is that flag to be set so it handles the leader election.

From what I can tell, all it takes to support HA is:

  • Enable replicas to be set in the helm chart via the values.yaml
  • Use the replicas value to check if we need to set the --enable-leader-election flag

Let me know what you think, @jaypipes . I will go ahead and create this PR so we have something to talk about =P.

@migueleliasweb
Copy link
Author

Done :)

@a-hilaly
Copy link
Member

Hi @migueleliasweb , those helm configurations files are generated frequently, so we'll have to make changes https://github.com/aws-controllers-k8s/code-generator/tree/main/templates/helm instead :)

Another thought is that i'm not sure whether passing --enable-leader-election flag works for now, we are probably missing some extra controller-runtime configurations. Check #1578

@migueleliasweb
Copy link
Author

Humm yeah I guess you're right...

I've read a bit more about the leader election and the configuration that is missing. The controller runtime
config should be fairly straightforward to add. What seems to be missing is just the key to be used to determine the election and possibly some rbac configuration to ensure the pods will have access to the configmap/endpoint that is used to orchestrate the election.

I wonder how could we make this change isolated to the ack IAM controller instead of enabling it at the same time for all controllers that pull code from that shared library.

@a-hilaly
Copy link
Member

@migueleliasweb All the controllers already import ack/runtime and k8s/controller-runtime - I we want to fully support this feature, we'll have to touch a bit ack/runtime and maybe code-generator ? if you have a prototype of leader election working i'm happy to give it a shot :)

@jaypipes jaypipes changed the title Can ack-aim-controller run with multiple replicas in a HA way? Can ACK controllers run with multiple replicas in a HA way? Mar 31, 2023
@jaypipes jaypipes added the question Clarification questions label Mar 31, 2023
@ack-bot
Copy link
Collaborator

ack-bot commented Jun 29, 2023

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/aws-controllers-k8s/community.
/lifecycle stale

@ack-prow ack-prow bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 29, 2023
@jljaco
Copy link
Contributor

jljaco commented Jul 13, 2023

/lifecycle frozen

@ack-prow ack-prow bot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 13, 2023
@migueleliasweb
Copy link
Author

https://github.com/kubernetes/client-go/blob/master/examples/leader-election/main.go

@a-hilaly
Copy link
Member

@migueleliasweb I have a patch that worked for lambda/dynamodb controllers aws-controllers-k8s/code-generator#455 - hoping to get it merged/released soon.

@a-hilaly
Copy link
Member

https://github.com/kubernetes/client-go/blob/master/examples/leader-election/main.go

Thank you for sharing this, it was helpful and interesting to read! controller-runtime hides a lot of the client-go options we only can configure those https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/leaderelection/leader_election.go#L36-L52

@migueleliasweb
Copy link
Author

migueleliasweb commented Jul 18, 2023

I'm the one to should thank you, @a-hilaly , for actioning this so quickly!

Btw, I reckon, under the hood the leader election relies on leases (https://kubernetes.io/docs/concepts/architecture/leases/) to get a "lock" and determine the leader. I don't know what Kubernetes version the ACK controllers aim to be deployable to but if the idea is to support the current EKS versions, this should be just fine ;).

@a-hilaly
Copy link
Member

Yes, the default one currently is Leases, and before it used to be configMaps - I believe leases have been around for a few years now? safe to way we support all the 1.2x versions?

@a-hilaly
Copy link
Member

a-hilaly commented Jul 18, 2023

Actually now thinking about it.. we will need leases read/write RBAC for controllers with this change

@a-hilaly a-hilaly self-assigned this Jul 18, 2023
@a-hilaly a-hilaly added kind/feature Categorizes issue or PR as related to a new feature. and removed question Clarification questions labels Jul 18, 2023
@a-hilaly a-hilaly changed the title Can ACK controllers run with multiple replicas in a HA way? Support controllers leader election mode Jul 18, 2023
@a-hilaly
Copy link
Member

Also adding a new flag to be able to configure leader election namespace aws-controllers-k8s/runtime#123

@migueleliasweb
Copy link
Author

/remove-lifecycle frozen

@ack-prow ack-prow bot removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jul 22, 2023
ack-prow bot pushed a commit to aws-controllers-k8s/runtime that referenced this issue Jul 27, 2023
Fixes (part of) aws-controllers-k8s/community#1753

This patch introduces a new `--leader-election-namespace` flag used to
configure controller-runtime leaderElection componenet. This namespace
is utilized by the controller to manage the `coordination.k8s.io/lease`
obejct for leader election.

In the context of the controller-runtime library, if the
LeaderElectionNamespace parametere is not explicitly set, the library
will automatically default its value to the content of the file
mounted at /var/run/secrets/kubernetes.io/serviceaccount/namespace.

> https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/leaderelection/leader_election.go#L112-L127

In Kubernetes, when a pod is created, a service account is
automatically associated with it, unless explicitly specified
otherwise. This service account contains relevant information, such
as the namespace in which the pod is deployed. The Kubernetes API
server mounts a two files for the service account in the pod's
filesystem at /var/run/secrets/kubernetes.io/serviceaccount/token
and /var/run/secrets/kubernetes.io/serviceaccount/namespace,
respectively.

> https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/serviceaccount/tokens_controller.go#L399-L402

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
ack-prow bot pushed a commit to aws-controllers-k8s/code-generator that referenced this issue Sep 7, 2023
Issue: aws-controllers-k8s/community#1753 (comment)

ACK controllers use k8s-sigs/controller-runtime behind the scenes, which
support leader election. This feature is not properly working due to a
missing configuration `LeaderElectionNamespace` which is used by the
manager to create `k8s.io/coordination` Lease objects.

This patch sets the default `LeaderElectionNamespace` to `ack-system`
and adds the capability of enabling leader election using helm values.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Co-authored-by: Adam Cornett <adc@redhat.com>
@a-hilaly
Copy link
Member

@migueleliasweb Leader election is implemented and we are currently in the process of releasing a new version for all the controllers to include this feature.

@a-hilaly
Copy link
Member

released for all the controllers
/close

@ack-prow
Copy link

ack-prow bot commented Sep 30, 2023

@a-hilaly: Closing this issue.

In response to this:

released for all the controllers
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ack-prow ack-prow bot closed this as completed Sep 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants