Skip to content

Conversation

@arkadiuszbach
Copy link
Contributor

@arkadiuszbach arkadiuszbach commented Nov 18, 2025

What

This pull request modifies the Helm chart's Redis configuration to start the redis-server binary via exec, ensuring it runs as PID 1 inside the container.

Why

When a Kubernetes Pod is shutting down, it sends a SIGTERM signal to the container's main process (PID 1).

The previous configuration used the command: /bin/sh -c "redis-server". In this setup:

  1. The /bin/sh shell is PID 1.
  2. The actual redis-server is a child process.

Standard shells like sh do not automatically propagate signals (like SIGTERM) to their child processes. When the shell receives SIGTERM, it ignores it and waits for its child to exit. Since the redis-server never receives the SIGTERM, it doesn't initiate a graceful save-and-exit sequence.

This leads to the pod hanging until the terminationGracePeriodSeconds timeout is reached (200 seconds by default), at which point Kubernetes sends a SIGKILL (forceful kill).

Impact

  • Fixes Pod Hangs: Eliminates unnecessary delays during Pod termination/scaling.
  • Prevents Data Loss Risk : Allows Redis to perform a proper BGSAVE or persistence flush upon receiving SIGTERM, reducing the risk of data loss compared to an immediate SIGKILL.

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@boring-cyborg boring-cyborg bot added the area:helm-chart Airflow Helm Chart label Nov 18, 2025
@arkadiuszbach arkadiuszbach force-pushed the fix/helm-redis-sigterm-handling branch 4 times, most recently from 3bba933 to 89f8883 Compare November 18, 2025 15:47
@jscheffl jscheffl added this to the Airflow Helm Chart 1.19.0 milestone Nov 18, 2025
Copy link
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sound very reasonable! Wondering why it was made like this in the past and not noticed for years. Code is always a mystery...

Does someody else know a reason why it was made like this?

I leave this PR open for a moment but in my view this is avery good bugfix. Thanks for raising!

Note: CI checks failed because of infra problems/GH problems today. Can be rebased before merge, then I assume will turn green.

@jscheffl jscheffl force-pushed the fix/helm-redis-sigterm-handling branch from 89f8883 to 6e1f79b Compare November 18, 2025 21:13
@amordoch
Copy link
Contributor

amordoch commented Nov 21, 2025

Wouldn't it be better to simply 'promote' redis-server to be the command? Good to know why the Redis pod was always taking forever to terminate!

@ronaldorcampos
Copy link
Contributor

I did notice redis pod takes a good 15-20 min to terminate, hopefully this fixes.

@jedcunningham
Copy link
Member

Wouldn't it be better to simply 'promote' redis-server to be the command?

I haven't tried it, but I don't know if the password would be interpolated. If it does though, then yes direct would be fine. Feel free to open another PR if you want to try it out @amordoch.

@jedcunningham jedcunningham merged commit bdf793b into apache:main Nov 22, 2025
184 of 191 checks passed
Copilot AI pushed a commit to jason810496/airflow that referenced this pull request Dec 5, 2025
itayweb pushed a commit to itayweb/airflow that referenced this pull request Dec 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:helm-chart Airflow Helm Chart

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants