-
Notifications
You must be signed in to change notification settings - Fork 16.4k
helm: ensure graceful redis shutdown #58432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
helm: ensure graceful redis shutdown #58432
Conversation
3bba933 to
89f8883
Compare
jscheffl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sound very reasonable! Wondering why it was made like this in the past and not noticed for years. Code is always a mystery...
Does someody else know a reason why it was made like this?
I leave this PR open for a moment but in my view this is avery good bugfix. Thanks for raising!
Note: CI checks failed because of infra problems/GH problems today. Can be rebased before merge, then I assume will turn green.
89f8883 to
6e1f79b
Compare
|
Wouldn't it be better to simply 'promote' redis-server to be the command? Good to know why the Redis pod was always taking forever to terminate! |
|
I did notice redis pod takes a good 15-20 min to terminate, hopefully this fixes. |
I haven't tried it, but I don't know if the password would be interpolated. If it does though, then yes direct would be fine. Feel free to open another PR if you want to try it out @amordoch. |
What
This pull request modifies the Helm chart's Redis configuration to start the
redis-serverbinary viaexec, ensuring it runs as PID 1 inside the container.Why
When a Kubernetes Pod is shutting down, it sends a
SIGTERMsignal to the container's main process (PID 1).The previous configuration used the command:
/bin/sh -c "redis-server". In this setup:/bin/shshell is PID 1.redis-serveris a child process.Standard shells like
shdo not automatically propagate signals (likeSIGTERM) to their child processes. When the shell receivesSIGTERM, it ignores it and waits for its child to exit. Since theredis-servernever receives theSIGTERM, it doesn't initiate a graceful save-and-exit sequence.This leads to the pod hanging until the
terminationGracePeriodSecondstimeout is reached (200 seconds by default), at which point Kubernetes sends aSIGKILL(forceful kill).Impact
BGSAVEor persistence flush upon receivingSIGTERM, reducing the risk of data loss compared to an immediateSIGKILL.^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.