Fix occasional external task sensor test failures #18853
Why? The dependency-based (topological) ordering is not guaranteed to match alphabetical ordering. I think we need to actually parse the dependencies to determine the ordering...? |
Because this is how the test DAGs are constructed. If you process them in a deterministic order, it works. If the order is random, it sometimes does not. It's as simple as that. There is no "dependency check" here. Unlike the previous attempt, which sometimes worked and sometimes did not, this will work for sure. |
The problem (here and in a few other tests) is that there was an implicit assumption that all tasks will have their state set. But this is not how it works when you process the tasks in a test in random order. If you first process tasks that depend on other (not yet processed) tasks, they keep their state as None, which causes the test failure. Fortunately, the test tasks are structured so that if you process them in alphabetical order, the test always succeeds, because task1 comes before task2 (and in our tests task2 depends on task1). |
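To make the failure mode described above concrete, here is a minimal sketch in plain Python. It does not use Airflow at all: the `UPSTREAM` mapping, the task names, and `run_in_order` are hypothetical stand-ins, and the "a task succeeds only if its upstreams already succeeded" rule is a simplification of the real dependency checks.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a test DAG: task_id -> upstream task_ids.
# The names are illustrative, not the actual tasks in the Airflow tests.
UPSTREAM: Dict[str, List[str]] = {
    "task_1": [],
    "task_2": ["task_1"],
    "task_3": ["task_2"],
}


def run_in_order(order: List[str]) -> Dict[str, Optional[str]]:
    """Visit each task once; a task succeeds only if its upstreams already did."""
    state: Dict[str, Optional[str]] = {task_id: None for task_id in UPSTREAM}
    for task_id in order:
        if all(state[up] == "success" for up in UPSTREAM[task_id]):
            state[task_id] = "success"
        # Otherwise the state stays None, as in the failing assertion.
    return state


# Alphabetical order happens to visit every upstream task first.
alphabetical = run_in_order(sorted(UPSTREAM))
# Reverse order visits dependents before their upstream tasks.
reverse = run_in_order(sorted(UPSTREAM, reverse=True))

print(alphabetical)
print(reverse)
```

In this sketch only the reversed order leaves `task_2` and `task_3` with a state of None, which mirrors the intermittent assertion failure when the tasks come back in an unlucky order.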
|
BTW, I would be happy to process tasks in all the tests in topological order, and maybe we can add a tool that all tests would use to do so, but IMHO it's quite a complex piece of code, and I am on a quest to fix the flaky tests with the least amount of effort. Unless of course there is another, better way to do it without heavy investment (and I am all ears for that). |
Occasionally the sensor tests fail with an assertion where the
state seems to be None:
```
    def assert_ti_state_equal(task_instance, state):
        """
        Assert state of task_instances equals the given state.
        """
        task_instance.refresh_from_db()
>       assert task_instance.state == state
E       AssertionError: assert None == <TaskInstanceState.SUCCESS: 'success'>
E        +  where None = <TaskInstance: dag_1.task_b_1 manual__2015-01-01T00:00:00+00:00 [None]>.state
```
It turned out that the task instances from the
dagrun.taskinstance relationship could be returned in a different
order, so the dependencies of some tasks were not met when a later
task was returned before an earlier one.
Deterministic sorting by task_id solved the problem.
|
@uranusjr I've added a comment that explains why we use a regular sort rather than a topological sort (with our test DAG structure the two are equivalent). Same here: "simple" is better than "right", especially since we are in full control of the "input" (the DAG structure) and do not have to implement a general solution. |
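For readers who want to check the "equivalent for our test DAG structure" claim on a DAG of their own, the following is a rough sketch using the standard library `graphlib` module (Python 3.9+). The DAG shapes, the task names, and the helper `is_valid_topological_order` are assumptions made up for illustration; they are not taken from the Airflow test suite.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical test DAG: task_id -> set of upstream task_ids.
UPSTREAM = {
    "task_a_1": set(),
    "task_b_1": {"task_a_1"},
    "task_b_2": {"task_b_1"},
}


def is_valid_topological_order(order, upstream):
    """Return True if every task appears after all of its upstream tasks."""
    position = {task_id: index for index, task_id in enumerate(order)}
    return all(
        position[up] < position[task_id]
        for task_id, ups in upstream.items()
        for up in ups
    )


alphabetical = sorted(UPSTREAM)
# static_order() yields each task only after all of its predecessors.
topological = list(TopologicalSorter(UPSTREAM).static_order())

# Holds only because the names were chosen so that upstream tasks sort first.
alphabetical_ok = is_valid_topological_order(alphabetical, UPSTREAM)

# Counterexample: here the upstream task sorts last alphabetically.
COUNTER = {"task_z": set(), "task_a": {"task_z"}}
counter_ok = is_valid_topological_order(sorted(COUNTER), COUNTER)

print(alphabetical_ok, counter_ok)
```

The counterexample is the general case the plain sort does not cover: it is only safe while the test DAGs keep naming tasks so that upstream tasks sort before their dependents.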
|
Same here - I saw this test failing in at least 3-4 PRs recently. Merging it would be helpful to avoid false negatives. |
(cherry picked from commit 7a28ee3)