
Forward termination signals from supervisor to task subprocess #61627

Open
andreahlert wants to merge 2 commits into apache:main from andreahlert:fix/signal-propagation-on-kill

Conversation


@andreahlert andreahlert commented Feb 8, 2026

Fixes: #58936

Summary

When a Kubernetes worker pod receives SIGTERM (e.g. spot interruption, scaling down, rolling update), the signal is delivered to the supervisor process (PID 1 in the container). Previously, the supervisor had no signal handler installed and would exit with default behavior, leaving the task subprocess orphaned without ever calling the operator's on_kill() hook. This meant spawned resources (pods, subprocesses, etc.) were never cleaned up.

Root cause: The supervise() function starts the task subprocess and calls process.wait(), but never installs signal handlers for SIGTERM/SIGINT. The task subprocess does have a SIGTERM handler (registered in task_runner.py) that calls on_kill(), but the signal never reaches it because the supervisor process terminates first.
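
For orientation, a rough sketch of what such a child-side handler does conceptually; this is not the actual task_runner.py code, and _register_on_kill_handler, _on_term, and the exit convention are illustrative assumptions:

import signal


def _register_on_kill_handler(operator):
    # Hypothetical child-side setup: when the task subprocess receives
    # SIGTERM, give the operator a chance to clean up before exiting.
    def _on_term(signum, frame):
        try:
            operator.on_kill()  # e.g. delete spawned pods, kill subprocesses
        finally:
            # Assumed exit convention; the real runner may differ.
            raise SystemExit(128 + signum)

    signal.signal(signal.SIGTERM, _on_term)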

Fix: Install SIGTERM/SIGINT signal handlers in supervise() that forward the received signal to the task subprocess via os.kill(). The child's existing handler then calls on_kill() as expected, restoring the Airflow 2 behavior.

Signal flow after fix:

  1. K8s sends SIGTERM to supervisor (PID 1)
  2. Supervisor's new handler forwards SIGTERM to task subprocess
  3. Task subprocess's existing _on_term handler calls operator.on_kill()
  4. Operator cleans up resources (pods, subprocesses, etc.)
  5. Subprocess exits, supervisor's wait() returns normally
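
A minimal sketch of the forwarding handler, using only the standard library; _install_forwarding and child_pid are illustrative names rather than the identifiers used in supervisor.py:

import os
import signal


def _install_forwarding(child_pid: int) -> None:
    # Relay SIGTERM/SIGINT received by the supervisor to the task subprocess
    # so the child's own handler can run operator.on_kill().
    def _forward(signum, frame):
        try:
            os.kill(child_pid, signum)
        except ProcessLookupError:
            # The child may already have exited during a shutdown race.
            pass

    signal.signal(signal.SIGTERM, _forward)
    signal.signal(signal.SIGINT, _forward)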

Changes

  • task-sdk/src/airflow/sdk/execution_time/supervisor.py: Added signal forwarding in the supervise() function. Signal handlers are saved, installed before process.wait(), and restored in a finally block (a sketch of this pattern follows the list).
  • task-sdk/tests/task_sdk/execution_time/test_supervisor.py: Added test that verifies SIGTERM forwarding from supervisor to subprocess triggers the operator's on_kill() hook.
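
A sketch of the save/install/restore pattern referenced in the first bullet, assuming a Popen-like process object; _wait_with_forwarding is a hypothetical helper, not the name used in the PR:

import signal


def _wait_with_forwarding(process, forward_handler):
    # Remember whatever handlers were installed before we take over.
    old_term = signal.getsignal(signal.SIGTERM)
    old_int = signal.getsignal(signal.SIGINT)
    signal.signal(signal.SIGTERM, forward_handler)
    signal.signal(signal.SIGINT, forward_handler)
    try:
        return process.wait()
    finally:
        # Restore the previous handlers so code that runs after the task
        # subprocess exits is unaffected by the temporary forwarding.
        signal.signal(signal.SIGTERM, old_term)
        signal.signal(signal.SIGINT, old_int)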

Test plan

  • New test test_on_kill_hook_called_when_supervisor_receives_sigterm verifies the signal forwarding chain (a generic sketch of this kind of check follows this list)
  • Existing test_on_kill_hook_called_when_sigkilled still passes (no regression)
  • Existing signal-related tests (test_kill_escalation_path, test_exit_by_signal) still pass
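
A generic, Airflow-free sketch of how this kind of forwarding check can be exercised (POSIX assumed; the real test drives the Task SDK supervisor and an operator's on_kill() rather than a dummy child):

import os
import signal
import subprocess
import sys
import time


def test_sigterm_forwarding_reaches_child():
    # Dummy "task" child: exits with code 42 once its SIGTERM handler runs.
    child = subprocess.Popen([sys.executable, "-c", (
        "import signal, sys, time\n"
        "signal.signal(signal.SIGTERM, lambda s, f: sys.exit(42))\n"
        "time.sleep(60)\n"
    )])
    time.sleep(1)  # crude: give the child time to install its handler

    def _forward(signum, frame):
        try:
            os.kill(child.pid, signum)
        except ProcessLookupError:
            pass  # child already gone

    previous = signal.signal(signal.SIGTERM, _forward)
    try:
        os.kill(os.getpid(), signal.SIGTERM)  # pretend Kubernetes signalled us
        # The forwarded signal should make the child exit via its own handler.
        assert child.wait(timeout=10) == 42
    finally:
        signal.signal(signal.SIGTERM, previous)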

When a Kubernetes worker pod receives SIGTERM (e.g. spot interruption,
scaling down), the signal is delivered to the supervisor process (PID 1
in the container). Previously, the supervisor had no signal handler and
would exit with default behavior, leaving the task subprocess orphaned
without ever calling the operator's on_kill() hook. This meant spawned
resources (pods, subprocesses, etc.) were never cleaned up.

This change installs SIGTERM/SIGINT signal handlers in the supervise()
function that forward the received signal to the task subprocess. The
child process already has a signal handler (registered in task_runner.py)
that calls on_kill() when it receives SIGTERM, so forwarding the signal
completes the chain and restores the Airflow 2 behavior.

Fixes: apache#58936
Remove SystemExit from test's on_kill() to match realistic operator
behavior, and add SIGKILL safety net in the background thread to
prevent the test from hanging if signal forwarding fails.
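
A rough sketch of what such a safety net can look like; the function name and timeout value are assumptions, not the test's actual code:

import os
import signal
import threading
import time


def start_kill_safety_net(child_pid: int, timeout: float = 30.0) -> threading.Thread:
    # Background watchdog: if the child is still around after `timeout`
    # seconds (i.e. signal forwarding failed), hard-kill it so the test
    # cannot hang CI indefinitely.
    def _reaper():
        time.sleep(timeout)
        try:
            os.kill(child_pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # child exited on its own; nothing to do

    thread = threading.Thread(target=_reaper, daemon=True, name="sigkill-safety-net")
    thread.start()
    return thread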

@SameerMesiah97 SameerMesiah97 left a comment


Looks fine but needs slightly more polish. It seems like CI might have flaked but that should hopefully not happen when you force push again.

try:
    os.kill(process.pid, signum)
except ProcessLookupError:
    pass

Are you swallowing the exception here because you anticipate that the child worker has been killed before invoking os.kill? If that is the case, I would add a comment here explaining that. Like this:

# Child process may have already exited during shutdown races.

This is more of a nit, but silent exception swallowing tends to raise eyebrows for readers who might have partial context.

    except ProcessLookupError:
        pass

signal.signal(signal.SIGTERM, _forward_signal)

Your new implementation handles both SIGTERM and SIGINT, but you appear to be testing only SIGTERM here. Is this because you only anticipate K8s to send SIGTERM? I would suggest explaining that here in a comment so that a casual reader does not assume this is a gap. Not a blocking suggestion.

def execute(self, context):
    for i in range(1000):
        print(f"Iteration {i}")
        sleep(1)

I get why the loop needs to run “long enough” so the subprocess is alive when the signal is delivered, but 1000 iterations at 1s each feels a bit overkill. If for some reason the subprocess doesn’t get terminated as expected, this could run for ~15 minutes and materially stall CI. Is there any reason this can’t be reduced to a smaller number (e.g. 30–60 iterations) while still leaving plenty of headroom for signal delivery?



Development

Successfully merging this pull request may close these issues.

In Airflow 3 interruptions of the Airflow worker pod on k8s are not handled
