[AIRFLOW-1623] Trigger on_kill method in operators#3204
[AIRFLOW-1623] Trigger on_kill method in operators#3204bolkedebruin wants to merge 1 commit intoapache:masterfrom
Conversation
177b87f to
88581a7
Compare
Codecov Report
@@ Coverage Diff @@
## master #3204 +/- ##
==========================================
+ Coverage 74.23% 74.35% +0.12%
==========================================
Files 193 193
Lines 14441 14415 -26
==========================================
- Hits 10720 10718 -2
+ Misses 3721 3697 -24
Continue to review full report at Codecov.
|
on_kill methods were not triggered, due to processes not being properly terminated. This was due to the fact the runners use a shell which is then replaced by the child pid, which is unknown to Airflow.
88581a7 to
4e4e0c1
Compare
|
Tests pass, error is a bit spurious (and unrelated). https://travis-ci.org/bolkedebruin/airflow/builds/364482267 |
|
taking a look |
|
@bolkedebruin, thanks for the clean up. To address your comment on #2975, I believe I agree with you that it doesn't make sense to have the root process killing itself, and your PR addresses this with: In addition, I think we should delete the signal handler in the |
|
Hi @jgao54 Im not sure if deleting the signal handler from the —raw task is the right thing to do. The signal handler is there to trigger the on_kill method of the operator. If absent that would not be run anymore. bashOperator and the SparkSubmitOperator both make use of this. On your comment of the “root” process. I don’t think you are correct and your implementation worked by coincidence. If you run “bash -c sleep 1000”, you will find ‘sleep’ in your process list but you won’t find ‘bash’ running anymore. However the parent pid of sleep still refers to the pid of bash. So your implementation worked in case of a TaskRunner that used something like bash to execute the command. If you would you “sleep” directly (so without bash) you would get ourselves as the parent of the “sleep” process and thus kill yourself. Using setsid makes sure we Airflow sees all processes to the new session/group regardless. Except when they run setsid themselves, which happens for example in the BashOperator. The bashOperator therefore becomes responsible for cleaning up after itself. |
You are correct, silly of me. Looks like that's the only place operator on_kill method is triggered (I somehow remembered that TaskRunner upon termination will explicitly call operator.on_kill, but that's not the case at all)
I believe when Perhaps there's a bit of misunderstanding on This is just mostly me thinking out loud. The setsid makes sense to me and so does this PR, but i'm still a bit lost with the whole picture. |
I verified this (on a mac) and that is not the case. The parent PID is just not there anymore.
|
on_kill methods were not triggered, due to processes not being properly terminated. This was due to the fact the runners use a shell which is then replaced by the child pid, which is unknown to Airflow. Closes apache#3204 from bolkedebruin/AIRFLOW-1623
Make sure you have checked all steps below.
JIRA
Description
on_kill methods were not triggered, due to processes
not being properly terminated. This was due to the fact
the runners use a shell which is then replaced by the
child pid, which is unknown to Airflow.
Tests
new and updated tests
Commits
My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
Passes
git diff upstream/master -u -- "*.py" | flake8 --diffcc: @jgao54 @Fokko @milanvdm
This is an updated and cleaned up version from @jgao54 version