fix: use notifyContext to manage the operator exit #2463
base: master
Can you please add the test steps so this can be tested?
Unless we support infrastructure-disconnect testing, I am adding this code as a compile-time test. It cancels the context five seconds after startup:

    // StartOperator starts the MinIO Operator controller
    func StartOperator(kubeconfig string) {
    	_ = v2.AddToScheme(scheme.Scheme)
    	_ = stsv1beta1.AddToScheme(scheme.Scheme)
    	_ = stsv1alpha1.AddToScheme(scheme.Scheme)
    	klog.Info("Starting MinIO Operator")
    	// set up signals, so we handle the first shutdown signal gracefully
    	ctx, cancel := setupSignalHandler(context.Background())
    	defer cancel()
    	done := ctx.Done()
    +	go func() {
    +		time.Sleep(time.Second * 5)
    +		cancel()
    +	}()
I’m approving this PR because it provides a pragmatic improvement to how the operator handles leadership loss, which currently results in a stuck and resource-hungry state. While it doesn’t solve the root cause (lack of probes or retry logic), it introduces a clean and safe exit mechanism using notifyContext, allowing Kubernetes to restart the pod. This is a net gain in resilience, especially given that the current behavior requires manual intervention. The code is minimal, targeted, and does not introduce complexity or regression risk.
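For readers unfamiliar with the pattern, here is a minimal sketch of an exit path built on the standard library's `signal.NotifyContext`. It is not the code from this PR: `runOperator` and `simulateLeadershipLoss` are hypothetical names, and the timer only stands in for a real leadership-loss event.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// runOperator is a hypothetical stand-in for the operator's main loop.
// It blocks until ctx is done and reports why the context ended.
func runOperator(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// simulateLeadershipLoss builds a signal-aware context and cancels it
// after delay, the way a "stopped leading" callback might (assumption:
// the real operator wires this to its leader-election code).
func simulateLeadershipLoss(delay time.Duration) error {
	// The context is done on SIGINT/SIGTERM or when stop is called.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	go func() {
		time.Sleep(delay)
		stop() // same effect on ctx as receiving a shutdown signal
	}()
	return runOperator(ctx)
}

func main() {
	err := simulateLeadershipLoss(100 * time.Millisecond)
	fmt.Println("operator exiting:", err)
	// A real process would return from main here so that Kubernetes
	// can restart the pod.
}
```

Because a signal and an explicit `stop()` call both end the same context, the main loop needs only one place to notice that it should exit.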
Minor change for clarity
Co-authored-by: Allan Roger Reid <allanrogerreid@gmail.com>
I did test this PR and the pod now properly terminates when it loses the lease. I do think we need to refactor this code, because it uses two mechanisms:
- A context.Context that is initialized when starting the operator.
- A chan struct{} that will be triggered when the context is cancelled.

I think it's better to have just a single mechanism, because Controller.Start now receives both the context and the channel, which serve exactly the same purpose. I think we need to fix that.
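One possible shape for that refactor, sketched under assumptions: the `Controller` type, its `workers` field, and `startAndCancel` below are hypothetical and much simpler than the operator's real controller. The idea is that `Start` takes only a `context.Context`, and anything that used to wait on the stop channel waits on `ctx.Done()` instead.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Controller is a hypothetical, stripped-down stand-in for the
// operator's controller; only the shutdown plumbing is modelled.
type Controller struct {
	workers int
}

// Start launches the workers and blocks until ctx is cancelled.
// It takes no separate stop channel: ctx.Done() plays that role.
// It returns the number of workers that shut down.
func (c *Controller) Start(ctx context.Context) int {
	var wg sync.WaitGroup
	stopped := make(chan int, c.workers)
	for i := 0; i < c.workers; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			<-ctx.Done() // single shutdown signal for every worker
			stopped <- id
		}(i)
	}
	wg.Wait()
	return len(stopped)
}

// startAndCancel runs a controller and cancels its context after delay.
func startAndCancel(workers int, delay time.Duration) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	time.AfterFunc(delay, cancel)
	return (&Controller{workers: workers}).Start(ctx)
}

func main() {
	n := startAndCancel(3, 50*time.Millisecond)
	fmt.Println("workers stopped:", n)
}
```

APIs that still expect a `<-chan struct{}` can be handed `ctx.Done()` directly, so the context can remain the only thing passed around.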
Description
When the operator loses the leader role, it appears that the operator cannot exit.
fix #2458