Skip to content

[KEP-5366] Graceful Leader Transition #5367

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 18, 2025
Merged

Conversation

Jefftree
Copy link
Member

@Jefftree Jefftree commented Jun 3, 2025

  • One-line PR description: Allow leader elected components to transition to a follower state without restarting the process

/sig api-machinery
/assign @jpbetz

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Jun 3, 2025
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jun 3, 2025
@Jefftree Jefftree force-pushed the glt branch 3 times, most recently from 166e6e3 to 1d2dae4 Compare June 3, 2025 21:10
@Jefftree
Copy link
Member Author

Jefftree commented Jun 3, 2025

/cc @michaelasp

@k8s-ci-robot
Copy link
Contributor

@Jefftree: GitHub didn't allow me to request PR reviews from the following users: michaelasp.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @michaelasp

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jpbetz jpbetz mentioned this pull request Jun 5, 2025
4 tasks
Copy link
Contributor

@jpbetz jpbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments then LGTM for alpha.

@jpbetz
Copy link
Contributor

jpbetz commented Jun 5, 2025

@deads2k Any concerns with an off-by-default alpha for this? This gives us a good place to start listing out risks..

@deads2k
Copy link
Contributor

deads2k commented Jun 6, 2025

@deads2k Any concerns with an off-by-default alpha for this? This gives us a good place to start listing out risks..

So high level feedback (I haven't done a detailed read yet).

  1. Refactoring to allow graceful lease release followed by os.Exit as normal is a standard refactor that doesn't require an approved KEP, so we can do independent of this being merged.
  2. Changes to avoid the os.Exit are notably riskier, do require a KEP, do require featuregating, and do require testing that demonstrates that we can catch and stop all go func, all reflectors, and clear all caches.

I will circle back for a deeper review hopefully today, maybe Monday, but I'm hoping we're on same page about the two different things. I suspect that my 1 is prereq to 2.

@jpbetz
Copy link
Contributor

jpbetz commented Jun 6, 2025

I'm hoping we're on same page about the two different things. I suspect that my 1 is prereq to 2.

This is how I've been thinking about it too. Stating this directly in the KEP would make sense to me. Even with 1 not requiring a KEP, stating it as a required prelim of the work that does require a KEP sounds like a good way to approach this.

Copy link
Contributor

@deads2k deads2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an old iteration of the doc has mingled with new, but the user stories indicate a desire to dynamically select which controllers are running in a process. The design and graduation, and PRR don't reflect this. Please reconcile the design one way or the other. My personal suggestion: don't dynamically change which controllers and running and simply get graceful release and potentially (if you really like it) avoiding process restart.

TBH, once we flush the caches, I'm not entirely clear on what benefit we get from re-using the process.

@deads2k
Copy link
Contributor

deads2k commented Jun 17, 2025

Big picture, once we flush the caches on graceful lease release, what is the benefit of re-using the process instead of exiting?

@deads2k
Copy link
Contributor

deads2k commented Jun 18, 2025

sig and PRR lgtm

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 18, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, Jefftree

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 18, 2025
@k8s-ci-robot k8s-ci-robot merged commit 32ef0c0 into kubernetes:master Jun 18, 2025
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Jun 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants