
Fix stale upstream_failed propagation after external task state changes #65924

Draft: shaealh wants to merge 5 commits into apache:main from shaealh:shaealh/63697

shaealh (Contributor) commented Apr 27, 2026

Fixes #63697

When checking premature task instances, the scheduler can rely on a stale list of finished task instances captured earlier in the scheduling loop. If another process changes task states in the meantime (for example, an API call that marks a failed task as success), the scheduler can propagate upstream_failed based on outdated upstream state.

This PR changes the premature-TI dependency check to re-query the finished task instances before dependency evaluation with flag_upstream_failed=True is allowed to write terminal states.
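To illustrate the race and the fix in isolation, here is a minimal, standard-library-only sketch. It is not Airflow code: the dict standing in for the metadata database, the `read_finished` and `upstream_should_fail` helpers, and the task IDs are all hypothetical. It contrasts a decision made from a list captured before an external state change with one made after re-reading the current state.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for the task_instance table: task_id -> state.
db: Dict[str, str] = {"extract": "failed", "load": "none"}

FINISHED = {"success", "failed", "skipped", "upstream_failed"}


def read_finished(table: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (task_id, state) pairs for task instances in a finished state."""
    return [(tid, st) for tid, st in table.items() if st in FINISHED]


def upstream_should_fail(
    finished: Iterable[Tuple[str, str]], upstream_ids: Iterable[str]
) -> bool:
    """Hypothetical check: flag upstream_failed if any upstream finished as failed."""
    states = dict(finished)
    return any(states.get(tid) in ("failed", "upstream_failed") for tid in upstream_ids)


# 1. The scheduler captures finished task instances early in its loop.
finished_snapshot = read_finished(db)

# 2. Meanwhile, another process (e.g. an API call) marks the failed task as success.
db["extract"] = "success"

# 3a. Stale: deciding from the snapshot would wrongly flag "load" as upstream_failed.
stale_decision = upstream_should_fail(finished_snapshot, ["extract"])

# 3b. Re-query: reading finished task instances again right before the write sees the change.
fresh_decision = upstream_should_fail(read_finished(db), ["extract"])

print(stale_decision, fresh_decision)  # True False
```

The sketch ignores transactions and isolation levels entirely; in a real database, re-reading shrinks the window in which a concurrent change can be missed rather than modelling every interleaving.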

Adds a regression test that simulates separate SQLAlchemy sessions for the scheduler and the API.
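The sketch below is not the PR's test, which works with SQLAlchemy sessions against Airflow's metadata database. As a rough analogue using only the standard library, it opens two independent `sqlite3` connections to one temporary database file as stand-ins for the scheduler and API sessions. The `task_instance` schema, the `finished_states` helper, and the test name are hypothetical simplifications, not Airflow's models.

```python
import os
import sqlite3
import tempfile


def finished_states(conn: sqlite3.Connection, run_id: str) -> dict:
    """Re-query finished task instance states for one run (hypothetical schema)."""
    rows = conn.execute(
        "SELECT task_id, state FROM task_instance "
        "WHERE run_id = ? AND state IN ('success', 'failed', 'skipped', 'upstream_failed')",
        (run_id,),
    ).fetchall()
    return dict(rows)


def test_scheduler_sees_api_state_change() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sketch.db")

        # Two independent connections play the role of separate sessions.
        scheduler = sqlite3.connect(path)
        api = sqlite3.connect(path)
        try:
            scheduler.execute(
                "CREATE TABLE task_instance (run_id TEXT, task_id TEXT, state TEXT)"
            )
            scheduler.executemany(
                "INSERT INTO task_instance VALUES (?, ?, ?)",
                [("run_1", "extract", "failed"), ("run_1", "load", None)],
            )
            scheduler.commit()

            # The scheduler captures finished TIs early in its loop.
            snapshot = finished_states(scheduler, "run_1")
            assert snapshot == {"extract": "failed"}

            # The API session marks the failed upstream task as success and commits.
            api.execute(
                "UPDATE task_instance SET state = 'success' "
                "WHERE run_id = 'run_1' AND task_id = 'extract'"
            )
            api.commit()

            # The captured snapshot is now stale; a re-query returns the committed state.
            assert snapshot["extract"] == "failed"
            assert finished_states(scheduler, "run_1") == {"extract": "success"}
        finally:
            scheduler.close()
            api.close()


test_scheduler_sees_api_state_change()
print("ok")
```

Using a file-backed database (rather than `:memory:`) matters here: each in-memory connection would get its own private database, so the two "sessions" would never see each other's writes.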

shaealh (Contributor, Author) commented Apr 28, 2026

48 checks passed. Could a reviewer sign off so this can be merged? Thank you!

potiuk added the "ready for maintainer review" label (set after triaging when all criteria pass) May 5, 2026
potiuk removed the "ready for maintainer review" label May 18, 2026
potiuk marked this pull request as draft May 18, 2026 10:47
potiuk (Member) commented May 18, 2026

@shaealh: I'm removing the "ready for maintainer review" label and converting this PR back to draft. The branch now has merge conflicts with main that surfaced after the label was added.

The label means the PR is ready for maintainer review; a regression like this means it temporarily isn't. Rebase your branch onto the latest main, resolve the conflicts, then mark the PR "Ready for review" again to re-enter the queue.

Run `git fetch upstream main && git rebase upstream/main`, resolve the conflicts, then `git push --force-with-lease`. See the working-with-git docs.

No rush.


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.


Labels

area:Scheduler (including HA (high availability) scheduler); area:Triggerer


Development

Successfully merging this pull request may close these issues.

Tasks permanently stuck in upstream_failed after marking a failed task as success via API
