New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
when lameduck and config-reload are enabled, editing the config causes all pods in the deployment to go unready #5471
Comments
Hmm, It kind of sounds like lameduck is triggering on a reload. It shouldn't be doing that. |
Looks like it was introduced by #2812. |
Actually - it looks like it has been that way since lameduck was introduced. |
May have been introduced with #4167. I think there are two ways to "fix" this:
Perhaps both should be fixed, although for the 2nd, reloading doesn't normally take long enough for this to cause an issue (if we don't lameduck during reloads). |
Actually I don't think that's a good fix. And that PR would not have introduced it anyways - the ready listener is closed on reload, so being unready on reload was expected behavior prior to #4167 (#4167 fixes an issue related to persistent connections). The "unready during lameduck" behavior is due to ready coming before health in plugin.cfg. If that were reversed, the lameduck sleep (in health) would occur before ready closes its listener. But swapping the order would also cause coredns to report ready during a shut down, which we do not want. |
I'm submitting this as a question instead of a bug since it might be the intended behavior but it seems undesirable.
Is this the expected behavior? Any advice on how to avoid a total service outage when updating the configmap?
What happened:
Editing the coredns configmap, when lameduck is enabled, causes all pods in the deployment to go unready during config reload.
What you expected to happen:
Config reload should occur without entering lameduck mode if possible, as there's no guarantee that pods will reload config one at a time.
How to reproduce it (as minimally and precisely as possible):
Configure a coredns deployment with lameduck mode and readiness/liveness probes.
Edit the configmap (i.e. change lameduck duration from 21s to 22s) and wait for config reload to trigger.
Observe that pods fail readiness probe until lameduck mode ends, causing total service outage.
Environment:
OS (e.g:
cat /etc/os-release
):Amazon Linux 2 Karoo
Probe configuration:
The text was updated successfully, but these errors were encountered: