Remove custom signal handling in Triggerer #23274

Merged
merged 1 commit into from
Apr 27, 2022

potiuk
Member

@potiuk potiuk commented Apr 26, 2022

There is a bug in CPython (fixed in March 2022 but not yet released) that
makes asyncio handle SIGTERM improperly by using async-unsafe functions:
the triggerer receives SIGPIPE while handling SIGTERM/SIGINT, deadlocks
itself, and hangs. Until the fix is released, we should rely on the
standard handling of these signals rather than adding our own signal
handlers. It seems that even though our signal handler just runs exit(0),
it caused a race condition that led to the hang.

More details:
* https://bugs.python.org/issue39622
* python/cpython#83803

Fixes: #19260
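
For illustration, the effect of relying on the default signal disposition can be sketched as below. This is a minimal sketch under stated assumptions, not Airflow code: the child script and the helper name `terminate_child_with_default_handling` are hypothetical, and a POSIX system is assumed. The child runs an asyncio loop without installing any signal handlers, and the parent checks that SIGTERM alone ends it.

```python
import signal
import subprocess
import sys

# Hypothetical child: an asyncio loop that installs no signal handlers of its own.
CHILD_SOURCE = """
import asyncio

async def main():
    print("ready", flush=True)
    while True:
        await asyncio.sleep(0.1)

asyncio.run(main())
"""


def terminate_child_with_default_handling(timeout=10.0):
    """Start the child, send it SIGTERM, and return its exit code."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        proc.stdout.readline()  # block until the child reports "ready"
        proc.send_signal(signal.SIGTERM)
        return proc.wait(timeout=timeout)
    finally:
        if proc.poll() is None:
            proc.kill()  # safety net so the sketch never leaves a stray process
            proc.wait()
        proc.stdout.close()
```

On POSIX, `subprocess` reports a child that was killed by a signal as a negative return code, so the call should return `-signal.SIGTERM` when the default handling is in effect.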


@potiuk potiuk force-pushed the remove-sigterm-handling-in-triggerer branch from 54aad2b to 3a73320 Compare April 26, 2022 22:10
@potiuk
Member Author

potiuk commented Apr 26, 2022

CC: @andrewgodwin @ashb @dstandish -> I was able to reproduce the Ctrl-C problem, and it is gone after I removed the custom signal handling in the Triggerer, so the hypothesis of the asyncio bug from https://bugs.python.org/issue39622 (python/cpython#83803) seems even more plausible.

Please take a look and see whether my hypothesis from #23271 (comment) looks sound; maybe we can just fix it permanently for production as well.

I believe our custom handling of SIGINT and SIGTERM in the Triggerer (which would then simply run sys.exit(0)) is not really needed, since the default handling of both signals terminates the process eventually. I left the SIGQUIT handling in place for diagnostics, though (and QUIT is rarely used for anything else anyway).
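
As an illustration of a diagnostics-only SIGQUIT handler, here is a minimal sketch. It is not the handler that Airflow ships: it swaps in the standard library's `faulthandler` module, which is designed to dump tracebacks from inside a signal handler, and the names `enable_sigquit_diagnostics` and `demo_sigquit_dump` are hypothetical. A POSIX system is assumed.

```python
import faulthandler
import os
import signal
import tempfile
import time


def enable_sigquit_diagnostics(stream):
    """Dump the tracebacks of all threads to `stream` whenever SIGQUIT arrives."""
    # chain=False: the previous handler is not called, so the process keeps running.
    faulthandler.register(signal.SIGQUIT, file=stream, all_threads=True, chain=False)


def demo_sigquit_dump(timeout=5.0):
    """Send SIGQUIT to this process and return the text that was dumped."""
    with tempfile.TemporaryFile(mode="w+") as stream:
        enable_sigquit_diagnostics(stream)
        try:
            os.kill(os.getpid(), signal.SIGQUIT)
            deadline = time.monotonic() + timeout
            dumped = ""
            # Poll the file until the dump shows up or the timeout expires.
            while not dumped and time.monotonic() < deadline:
                time.sleep(0.05)
                stream.seek(0)
                dumped = stream.read()
            # Read once more in case the dump was still being written.
            time.sleep(0.05)
            stream.seek(0)
            return stream.read()
        finally:
            faulthandler.unregister(signal.SIGQUIT)
```

Because the dump is written by `faulthandler` at the C level, no Python-level handler code runs when the signal arrives, which keeps this kind of diagnostics separate from the shutdown path discussed above.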

@potiuk potiuk added this to the Airflow 2.3.0 milestone Apr 26, 2022
@potiuk
Member Author

potiuk commented Apr 26, 2022

I also found that the standalone problem with hanging was already reported in #19260

@github-actions

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Apr 27, 2022
@potiuk potiuk merged commit 6bdbed6 into apache:main Apr 27, 2022
@potiuk potiuk deleted the remove-sigterm-handling-in-triggerer branch April 27, 2022 16:18
@ephraimbuddy ephraimbuddy added the type:bug-fix Changelog: Bug Fixes label May 8, 2022
ephraimbuddy pushed a commit that referenced this pull request May 8, 2022
(cherry picked from commit 6bdbed6)
ephraimbuddy pushed a commit that referenced this pull request May 21, 2022
(cherry picked from commit 6bdbed6)
@tanelk
Contributor

tanelk commented Jun 7, 2022

@potiuk It seems that after this change we no longer set the TriggererJob state to success. The jobs stay in running:
[screenshot: 2022-06-07_18-17]

There is also a register_signals method in the triggerer job, but it is never called.

    def register_signals(self) -> None:
        """Register signals that stop child processes"""
        signal.signal(signal.SIGINT, self._exit_gracefully)
        signal.signal(signal.SIGTERM, self._exit_gracefully)
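
To illustrate why a graceful-exit handler can matter for the recorded state, here is a hypothetical sketch. It is not the TriggererJob implementation: `SketchJob`, its string states, and `work_until_sigterm` are assumptions made for the example. With a handler that raises SystemExit, the stack unwinds and the job can record a final state. With the default SIGTERM action, the process ends without running any such Python code.

```python
import os
import signal
import sys
import time


class SketchJob:
    """Hypothetical stand-in for a job whose state is recorded on exit."""

    def __init__(self):
        self.state = "running"

    def _exit_gracefully(self, signum, frame):
        # sys.exit raises SystemExit, which unwinds into run() below.
        sys.exit(0)

    def run(self, work):
        # Must be called from the main thread, as signal.signal requires.
        # Remember the previous handlers so they can be restored afterwards.
        previous = {
            sig: signal.signal(sig, self._exit_gracefully)
            for sig in (signal.SIGINT, signal.SIGTERM)
        }
        try:
            work()
            self.state = "success"
        except SystemExit:
            # Assumption: a requested shutdown counts as a successful run.
            self.state = "success"
        except Exception:
            self.state = "failed"
            raise
        finally:
            for sig, handler in previous.items():
                if handler is not None:
                    signal.signal(sig, handler)
        return self.state


def work_until_sigterm(timeout=5.0):
    """Simulate an external `kill` by signalling our own process, then keep working."""
    os.kill(os.getpid(), signal.SIGTERM)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        time.sleep(0.01)
    raise RuntimeError("SIGTERM handler did not run")
```

Calling `SketchJob().run(work_until_sigterm)` exercises the handler path. Without the two `signal.signal` registrations, the same SIGTERM would terminate the interpreter before the `except SystemExit` branch could update the state.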

potiuk added a commit to potiuk/airflow that referenced this pull request Jun 11, 2022
@potiuk
Member Author

potiuk commented Jun 11, 2022

Thanks @tanelk - I am reverting it for 2.3.3 then, and I have reopened the original issue.

potiuk added a commit that referenced this pull request Jun 13, 2022
ephraimbuddy pushed a commit that referenced this pull request Jun 29, 2022
Labels
area:CLI okay to merge It's ok to merge this PR as it does not require more tests type:bug-fix Changelog: Bug Fixes
Development

Successfully merging this pull request may close these issues.

Airflow standalone command does not exit gracefully
4 participants