fix: Graceful coredns shutdown #4443
Conversation
💖 Thanks for opening your first pull request! 💖 We use semantic commit messages to streamline the release process. Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix (such as the fix: prefix used by this PR's title).
Thank you so much for this fix. This will have significant positive impacts on clusters built with this.
@jackfrancis - It will be great to be able to get a point release with this change (0.63.1 at least). Osama did a great job finding the root cause of this problem and identifying such a simple fix.
health {
    # this should be > readiness probe failure time
    lameduck 35s
}
So the idea here is to fall back to the coredns runtime health check in case the readinessProbe fails to restart things after 3 failures separated by 10 seconds?
We want to make sure that coredns will not terminate until it is safe for it to do so.
The idea here is that when coredns gets a SIGTERM, lameduck will delay the actual shutdown for 35s. Until then, coredns will continue to service requests. At the same time, the readiness plugin will start failing. We wait 10*3+5 seconds to make sure that this instance is not, and no longer will be, getting any queries.
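For reference, here is a minimal sketch of the readiness probe those numbers describe, assuming the Kubernetes defaults mentioned in the PR description; the /ready path and port 8181 are the CoreDNS ready plugin defaults, not something taken from this diff:

readinessProbe:
  httpGet:
    path: /ready          # CoreDNS ready plugin endpoint (assumed default)
    port: 8181
  periodSeconds: 10       # one probe every 10s -> the "10" in 10*3+5
  failureThreshold: 3     # three failures mark the pod unready -> the "3"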
Ah, thanks for the clarification. Just to be super clear: 3 failures, separated by 10 seconds, could take as little as just over 20 seconds, and should never take as long as 30 seconds. So if it helps, we could reduce the lameduck config to 30s.
The thing is to delay the eviction of the pod for 35 seconds but trigger health failures right away at the start of the eviction. After the health probe failures (3 x 10s), this removes the pod from the service (and thus from the load balancing for the service), so that shutting it down will not cause DNS requests to route to the now-dead pod.
It was technically a race between no longer sending requests to the pod and the pod no longer running. The old way always ended up with the pod stopping before the requests stopped coming; with this new mechanism it is the inverse - requests always stop being sent to the pod before the pod stops running.
The trick is that there is some time after the health check fails before the Kubernetes infrastructure removes the pod from the service, so the extra 5 seconds is to allow this removal to fully propagate through the whole cluster. Remember, some node may have just started a DNS query at the same moment the 30 seconds runs out, and we could end up routing to the bad pod before the route change got pushed everywhere.
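Putting the discussion together, the intended shutdown sequence looks roughly like this (timings assume the default probe parameters sketched above):

t=0s       SIGTERM arrives; lameduck starts, readiness begins failing, DNS is still served
t≈20-30s   the third readiness failure (probes every 10s) marks the pod unready; endpoint removal begins
t≈30-35s   buffer for the removal to propagate to every node in the cluster
t=35s      the lameduck window ends; coredns finishes in-flight queries and exits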
Makes sense, lgtm, thanks for this improvement!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jackfrancis, Michael-Sinz, technicianted. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/azp run pr-e2e
Azure Pipelines successfully started running 1 pipeline(s).
Codecov Report
@@            Coverage Diff            @@
##           master    #4443    +/-   ##
========================================
  Coverage   72.04%   72.05%
========================================
  Files         141      141
  Lines       21631    21764   +133
========================================
+ Hits        15584    15681    +97
- Misses       5096     5131    +35
- Partials      951      952     +1

Continue to review full report at Codecov.
Congrats on merging your first pull request! 🎉🎉🎉
Reason for Change:
This PR adds graceful shutdown configurations to coredns.
Currently, if a coredns pod is deleted (due to eviction, for example), the container shuts down immediately. This has negative side effects:
- In-flight queries are terminated and dropped.
- New queries keep being routed to the now-dead pod until its IP address is removed from the service.
If either of the two happens, we end up with the dreadful 5s timeout/retry latency.
The PR adds lameduck configuration to the health plugin such that it delays shutting down the service until 5 seconds after the readiness probes have failed. This guarantees both that in-flight queries are completed and that there is enough time for this instance's IP address to be discarded from the service.
Note: Since the original readiness probe parameters were left at defaults, the PR also explicitly sets the defaults to make them clearer.
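For context, here is a representative Corefile showing where the new block sits. The surrounding plugins are the stock kube-dns style configuration and are illustrative assumptions; the health block with lameduck is the part this PR changes:

.:53 {
    errors
    health {
        # this should be > readiness probe failure time
        lameduck 35s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}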
Credit Where Due:
technicianted
Does this change contain code from or inspired by another project?
Requirements:
Notes:
The issue can be reproduced by running dig in a loop (100ms delay) and then deleting/evicting a coredns pod.
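A minimal reproduction sketch along those lines; the cluster DNS service IP (10.0.0.10), the query name, and the pod label selector are assumptions for illustration:

# terminal 1: issue a DNS query every 100ms against the cluster DNS service
while true; do
  dig @10.0.0.10 kubernetes.default.svc.cluster.local +time=1 +tries=1 +short
  sleep 0.1
done

# terminal 2: delete (or evict) one coredns pod and watch terminal 1 for timeouts
kubectl delete pod -n kube-system -l k8s-app=kube-dns --wait=false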