[WIP] [AIRFLOW-1623] Clearing task in UI does not trigger on_kill method in operator #2877
Conversation
Can we add tests for this new functionality?
@gwax As noted in the description and the JIRA issue, this line of code does not fix the issue. I made this PR in the hope that someone with knowledge of Airflow's core code could help me out.
After some investigation, I think I understand where the problem comes from. In the current setup of operators, it is not possible to save state inside an operator, such as a pid or, in our case, the id of a Spark job. When you clear a job, it clears the TaskInstances and not the tasks (and therefore there is no previous state that you saved in the operator). A possible solution I see is a mechanism you can use inside an operator to specify what kind of information you want to save, with that state accessible from a TaskInstance.
Create a persistent context which gets saved to the database. This context can be used in the Operators to save KV pairs.
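A minimal sketch of what such a persistent context could look like. This is purely illustrative: the `TaskContextStore` class, its SQLite backing, and its key scheme are assumptions, not anything in Airflow. (Airflow's XCom feature already provides a similar per-task-instance KV store in the metadata database, which may be a better starting point.)

```python
import sqlite3


class TaskContextStore:
    """Hypothetical persistent KV context, keyed per task, that an
    operator could use to record e.g. a Spark job id so that a later
    clear/kill can find it after the operator instance is gone.
    Backed by SQLite here purely for illustration."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS task_context ("
            " dag_id TEXT, task_id TEXT, key TEXT, value TEXT,"
            " PRIMARY KEY (dag_id, task_id, key))"
        )

    def set(self, dag_id, task_id, key, value):
        # INSERT OR REPLACE so re-running a task overwrites stale state.
        self.conn.execute(
            "INSERT OR REPLACE INTO task_context VALUES (?, ?, ?, ?)",
            (dag_id, task_id, key, value),
        )
        self.conn.commit()

    def get(self, dag_id, task_id, key):
        row = self.conn.execute(
            "SELECT value FROM task_context"
            " WHERE dag_id = ? AND task_id = ? AND key = ?",
            (dag_id, task_id, key),
        ).fetchone()
        return row[0] if row else None
```

An operator would call `store.set(dag_id, task_id, "spark_job_id", job_id)` during execute, and a TaskInstance (or an on_kill hook running in a different process) could read the job id back later.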
I think the proper fix is to install a signal handler inside the right process, and then have the listener process (this is the LocalExecutor I think, the …
@ashb
Thanks for working on this @milanvdm. @ashb I think you may be referring to the LocalTaskJob, not the LocalExecutor. One way to fix the issue is to modify the terminate method to something roughly like the following:
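A hypothetical sketch of such a terminate method, assuming the fix is to signal the child's whole process group so the raw task process (and its on_kill handler) is actually reached. The `TaskRunner` class and all names here are illustrative stand-ins, not Airflow's actual LocalTaskJob code:

```python
import os
import signal
import subprocess


class TaskRunner:
    """Illustrative stand-in for the process that supervises a task."""

    def __init__(self, cmd):
        # Start the child in its own session so that signalling its
        # process group reaches every descendant, not just the child.
        self.process = subprocess.Popen(cmd, start_new_session=True)

    def terminate(self):
        """Roughly the shape of the proposed fix: SIGTERM the whole
        process group so the raw task process gets a chance to run
        its on_kill cleanup before exiting."""
        if self.process is None:
            return
        try:
            os.killpg(os.getpgid(self.process.pid), signal.SIGTERM)
        except OSError:
            pass  # the process group is already gone
        self.process.wait()
```

The key difference from terminating only the immediate child is `os.killpg`, which delivers the signal to nested task processes as well.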
I'm not sure why the root process is not signaled in the first place, so perhaps there are other implications I'm overlooking. That said, I don't have any experience running Airflow with Celery (we run the LocalExecutor), but I am also curious about the question of handling state in a distributed context.
run_this_2.set_upstream(run_this_1)
run_this_3 = DummyOperator(task_id='run_this_3', dag=dag)
run_this_3.set_upstream(run_this_2)
run_this_1 = BashOperator(
Why did you change this test?
Just for a quick local test. It will be cleaned up once the approach has been validated (or not) :)
With this PR I want to initiate the discussion on https://issues.apache.org/jira/browse/AIRFLOW-1623.
Some context on the expected behavior of Airflow:
The last part of this behavior does not currently happen, due to the bug described in the issue. As mentioned in the issue description, the line of code I added in this PR does not solve the issue, as the reference to the task is lost.
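The lost-reference problem can be sketched in plain Python. This `SparkJobOperator` class is a hypothetical stand-in, not a real Airflow operator; it only shows that the job id lives on the operator instance that ran execute, so a freshly constructed copy (as a UI clear would use) has nothing for on_kill to work with:

```python
class SparkJobOperator:
    """Illustrative operator: the job id exists only in the memory of
    the instance that actually ran execute()."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.job_id = None  # populated during execute()

    def execute(self):
        # Pretend we submitted a Spark job and recorded its id.
        self.job_id = "job-42"

    def on_kill(self):
        # Only useful when called on the SAME instance that executed.
        return self.job_id


# The worker's copy runs the task and knows the job id:
worker_copy = SparkJobOperator("my_task")
worker_copy.execute()

# Clearing in the UI effectively operates on a fresh operator object,
# which never saw the job id:
ui_copy = SparkJobOperator("my_task")
```

This is why persisting the job id outside the operator instance (database, XCom-style store) is needed before on_kill can do anything useful after a clear.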
I'm not sure who can help with this issue, as it touches a lot of core parts of Airflow and not all of them are clear to me.
One solution I can think of is, instead of using the task reference, to create a kill CLI command using the task_id.
JIRA
Description
Tests
Commits
My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`