Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix teardown scheduling if indirect upstream is cleared #38788

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jscheffl
Copy link
Contributor

@jscheffl jscheffl commented Apr 5, 2024

This PR fixes the calculation of the scheduling decision if upstream to a teardown task an individual task is reset (w/o resetting all downstream!) but the direct upstream tasks of the teardown are completed.

As reported in #29332 in such case as direct upstream dependencies of teardown tasks were completed, teardown was starting too early.

With this fix teardown scheduling evaluates all upstream dependencies if any upstream is not completed.

How to test?

  • Use the DAG in the comments of the bug report Ensure teardown doesn't run until all other tasks complete #29332 and deploy this
  • Run this DAG (at least once) till success
  • Clear the first normal task w/o clearing downstream
  • See that teardown is scheduled only if the cleared task is completed (before this PR it is immediately scheduled after setup was completed in parallel to the actual cleared task

closes: #29332

@jscheffl jscheffl added area:Scheduler including HA (high availability) scheduler type:bug-fix Changelog: Bug Fixes AIP-52 Automatic setup and teardown tasks labels Apr 5, 2024
@jscheffl jscheffl changed the title Fix teardown if indirect upstream is cleared Fix teardown scheduling if indirect upstream is cleared Apr 5, 2024
@eladkal eladkal added this to the Airflow 2.9.1 milestone Apr 6, 2024
@jscheffl jscheffl force-pushed the bugfix/29332-fix-teardown-with-direct-upsteam-success branch from bc7dfae to 31f21f2 Compare April 6, 2024 16:38
@jscheffl jscheffl marked this pull request as draft April 6, 2024 16:42
Copy link
Contributor Author

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I fail miserably fixing the pytests, I need expert knowledge to fix this it seems :-(

2,
_UpstreamTIStates(6, 0, 1, 0, 0, 7, 1, 0),
[SUCCESS, SUCCESS, SUCCESS, SUCCESS, SUCCESS, SUCCESS, FAILED],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pytest fails because it seems the dependency context is returned empty after evaluation. I am not sure why. I assume something is not "correctly mocked".

2,
_UpstreamTIStates(6, 1, 0, 0, 0, 7, 1, 1),
[SUCCESS, SUCCESS, SUCCESS, SUCCESS, SUCCESS, SUCCESS, SKIPPED],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pytest fails because it seems the dependency context is returned empty after evaluation. I am not sure why. I assume something is not "correctly mocked".

True,
(False, "skipped"), # is_teardown=False, expect_state="skipped'
True,
id="not teardown all done one skipped",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this test I would have expected a skipped result but it is returning with Noneafter evaluation. I assume some mocks are wrong here :-(

for i in range(upstream_setups):
task = EmptyOperator(task_id=f"setup_{i}", dag=dag).as_setup()
task.set_downstream(downstream)
task.set_downstream(work_tasks)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I re-modelled the test DAG such that the setup tasks are in front of the worker tasks, else the setup is directly linked to downstream teardown which (1) does not make sense and (2) is explicitly not handled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't understand. it might be best to make a separate PR for your test refactor without simultaneously changing the code. then followup with the real code change (and the test changes required just for that will be easier to see)

["one_success", 0, _UpstreamTIStates(0, 0, 5, 0, 0, 5, 0, 0), True, State.UPSTREAM_FAILED, False],
["one_success", 0, _UpstreamTIStates(0, 0, 4, 1, 0, 5, 0, 0), True, State.UPSTREAM_FAILED, False],
["one_success", 0, _UpstreamTIStates(0, 0, 0, 5, 0, 5, 0, 0), True, State.UPSTREAM_FAILED, False],
["one_success", 0, [SUCCESS, SUCCESS, SUCCESS, SUCCESS, SUCCESS], True, None, True],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not a super huge fan of the scrutability of the "old" way, but it's hard to see how the new way tests the same thing. like i said below, i think best to separate test change from code+test change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AIP-52 Automatic setup and teardown tasks area:Scheduler including HA (high availability) scheduler type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ensure teardown doesn't run until all other tasks complete
5 participants