[AIRFLOW-6704] Copy common TaskInstance attributes from Task #7324
Codecov Report
@@ Coverage Diff @@
## master #7324 +/- ##
=========================================
- Coverage 86.59% 86.19% -0.4%
=========================================
Files 871 871
Lines 40660 40660
=========================================
- Hits 35209 35047 -162
- Misses 5451 5613 +162
Should we remove the other place where it was set?
I think we should keep them unless it's obvious some of them are redundant. From what I can tell, the constructor One place I think it should be kept is in @ashb what do you think?
@ashb I actually noticed some similar issues for other attributes too. For example, if the Please take another look.
self.test_mode = test_mode
self.refresh_from_task(task, pool_override=pool)
Is it needed here if we call `self.refresh_from_db` immediately?
This mostly tries to preserve the existing behavior while moving some duplicated code into `refresh_from_task()`. However, you are right that this part is not perfect:

Ideally we should first call `refresh_from_db()` and then call `refresh_from_task()`. The call to `refresh_from_db()` loads cumulative values such as `self.try_number` and `self.max_tries` from the db so that individual runs of the task can increment them. The call to `refresh_from_task()` picks up configurable values from the latest DAG definition. At the moment, however, `refresh_from_db()` loads both cumulative values and configurable attributes, so it also sets configurable values such as `self.queue` and `self.operator`, which are most likely more useful when read from the DAG definition via `refresh_from_task()`.

This PR is not trying to fix everything. It only consolidates some duplicated code and makes attributes such as `self.queue` and `self.pool` updatable when tasks are cleared in `clear_task_instances()`. It's probably worth a separate, bigger PR to make sure `refresh_from_db()` only reads the attributes that really should come from the db and leaves the others to `refresh_from_task()`.
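To make the "ideal" ordering described above concrete, here is a minimal sketch. It is not Airflow's implementation: `Task`, `StoredRow`, and this `TaskInstance` are simplified, hypothetical stand-ins, and the attribute split (cumulative values from the db, configurable values from the task) is the one proposed in the comment, not what this PR ships.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """Hypothetical stand-in for an operator declared in a DAG file."""
    task_id: str
    queue: str = "default"
    pool: str = "default_pool"


@dataclass
class StoredRow:
    """Hypothetical stand-in for the task instance row saved in the db."""
    try_number: int
    max_tries: int
    queue: str  # stale copy of a configurable value; deliberately ignored below


class TaskInstance:
    """Simplified model, assuming the db only supplies cumulative values."""

    def __init__(self, task: Task, row: Optional[StoredRow] = None,
                 pool: Optional[str] = None) -> None:
        self.task_id = task.task_id
        self.try_number = 0
        self.max_tries = 0
        if row is not None:
            # Step 1: cumulative values come from the db.
            self.refresh_from_db(row)
        # Step 2: configurable values come from the latest DAG definition.
        self.refresh_from_task(task, pool_override=pool)

    def refresh_from_db(self, row: StoredRow) -> None:
        # Only counters that individual runs increment are read here.
        self.try_number = row.try_number
        self.max_tries = row.max_tries

    def refresh_from_task(self, task: Task,
                          pool_override: Optional[str] = None) -> None:
        # Configurable attributes always reflect the current task definition.
        self.queue = task.queue
        self.pool = pool_override or task.pool
        self.operator = type(task).__name__
```

With this split, a stale `queue` stored in the db can never shadow the value in the DAG file, while `try_number` and `max_tries` still survive across runs.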
Thanks @yuqian90 🎉
On migrating Airflow from V1.10.2 to V1.10.10: one of our DAGs has a task of the dagrun_operator type. A code snippet of the task looks something like the one below. Please assume that DAG
The DAG runs fine. In fact, the Python callable of the task mentioned runs until the last line. Then it errors out.
After which the
To add to my above post: it has also been found that when we remove
Hi @shanit-saha
I think this line is the cause of the problem. I believe what you actually want to express is to trigger However, doing I think you already realized if you leave If you think the doc of
@yuqian90: Thank you for your response.
Many things changed between 10.2 and 10.10. I haven't looked too carefully into the error you have. I can't tell if this PR is related. Any reason you think this PR is the cause?
Certain attributes of `TaskInstance`, such as `operator` and `queue`, are either not updated or only updated when the task is executed. This causes some issues:

- The `operator` field is left as None. This causes bugs when other code tries to use the `operator` field to find the name of the class.
- When the `pool` or `queue` of a task is changed in the DAG definition, there is no way to re-run an existing `TaskInstance` using the updated values. Clearing the `TaskInstance` does not update these attributes.

The fix is to copy these `TaskInstance` attributes from `Task` consistently in one function, `refresh_from_task()`.

Issue link: AIRFLOW-6704
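
The second issue and the fix can be illustrated with a small, self-contained sketch. Everything here is a simplified assumption for illustration only: `Task` and `TaskInstance` are hypothetical stand-ins, and `clear_and_refresh()` is an invented helper that mimics the idea of refreshing attributes on clear. It is not the real `clear_task_instances()` signature.

```python
from typing import Dict, Iterable, Optional


class Task:
    """Hypothetical stand-in for an operator in the current DAG definition."""

    def __init__(self, task_id: str, queue: str = "default",
                 pool: str = "default_pool") -> None:
        self.task_id = task_id
        self.queue = queue
        self.pool = pool


class TaskInstance:
    """Simplified model that copies common attributes in one place."""

    def __init__(self, task: Task) -> None:
        self.task_id = task.task_id
        self.state: Optional[str] = None
        # Set at construction so `operator` is never left as None.
        self.refresh_from_task(task)

    def refresh_from_task(self, task: Task,
                          pool_override: Optional[str] = None) -> None:
        self.queue = task.queue
        self.pool = pool_override or task.pool
        self.operator = type(task).__name__


def clear_and_refresh(tis: Iterable[TaskInstance],
                      tasks_by_id: Dict[str, Task]) -> None:
    """Hypothetical helper: reset state and pick up the latest task settings."""
    for ti in tis:
        ti.state = None
        task = tasks_by_id.get(ti.task_id)
        if task is not None:
            # Re-copy pool/queue/operator from the updated definition.
            ti.refresh_from_task(task)
```

For example, if a task instance was created while the task used pool `small`, and the DAG file is later edited to use pool `large`, calling `clear_and_refresh()` with the new task definition leaves the instance pointing at `large` before it is re-run.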
Make sure to mark the boxes below before creating PR: [x]
[AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
* For document-only changes commit message can start with [AIRFLOW-XXXX].
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.