Restore fail_fast handling when reschedule exceeds MySQL TIMESTAMP limit #67353
Merged
kaxil merged 1 commit on May 22, 2026
PR apache#59686 dropped the _handle_fail_fast_for_dag call in the MySQL-TIMESTAMP-limit branch of the reschedule path based on an incorrect SQLA2 deadlock concern. As a result, DAGs with fail_fast=True silently fail to stop sibling tasks when a reschedule date exceeds 2038-01-19 on MySQL. The actual deadlock that motivated apache#59686 came from a different path (FOR UPDATE expanding to the lazy-joined dag_run row), fixed in apache#67246 by scoping the lock with with_for_update={"of": TI}. With that scope in place, the fail-fast call is safe and matches the file's two existing fail-fast sites. Also drops a second misleading comment in the same function claiming session.get was avoided to "avoid SQLA2 lock contention issues" -- the code itself is fine; the rationale was wrong.
amoghrajesh approved these changes on May 22, 2026
Restore `_handle_fail_fast_for_dag` in the MySQL-TIMESTAMP-limit branch of the
reschedule path. Without this, DAGs with `fail_fast=True` silently fail to stop
sibling tasks when a reschedule date exceeds 2038-01-19 on MySQL.
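The branch in question can be pictured with a small stand-alone model. This is a minimal sketch, not Airflow code: `MiniTask`, `stop_siblings`, and `handle_reschedule` are hypothetical names, the `"stopped"` state is a placeholder for whatever the real handler sets, and the limit is assumed to be the MySQL `TIMESTAMP` maximum of 2038-01-19 03:14:07 UTC (2**31 - 1 seconds after the epoch). It only illustrates the contract: when the reschedule date cannot be stored, the task fails, and with `fail_fast=True` its non-teardown siblings should be stopped as well.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone

# MySQL TIMESTAMP tops out at 2**31 - 1 seconds after the Unix epoch.
MYSQL_TIMESTAMP_MAX = datetime.fromtimestamp(2**31 - 1, tz=timezone.utc)


@dataclass
class MiniTask:
    """Hypothetical stand-in for a task instance; not an Airflow class."""

    task_id: str
    state: str = "running"
    is_teardown: bool = False


def stop_siblings(failed: MiniTask, tasks: list[MiniTask]) -> None:
    """Toy fail-fast handler: stop every unfinished non-teardown sibling."""
    for task in tasks:
        if task is failed or task.is_teardown:
            continue
        if task.state in ("running", "queued"):
            task.state = "stopped"  # placeholder state, chosen for this sketch


def handle_reschedule(
    ti: MiniTask,
    tasks: list[MiniTask],
    reschedule_date: datetime,
    *,
    is_mysql: bool,
    fail_fast: bool,
) -> str:
    """Apply a reschedule request to ``ti`` and return its resulting state."""
    if is_mysql and reschedule_date > MYSQL_TIMESTAMP_MAX:
        # The date cannot be stored in a MySQL TIMESTAMP column: fail the task.
        ti.state = "failed"
        if fail_fast:
            # The step this PR restores: propagate the failure to siblings.
            stop_siblings(ti, tasks)
        return ti.state
    ti.state = "up_for_reschedule"
    return ti.state


if __name__ == "__main__":
    tasks = [
        MiniTask("sensor"),
        MiniTask("sibling", state="queued"),
        MiniTask("cleanup", is_teardown=True),
    ]
    far_future = datetime(2040, 1, 1, tzinfo=timezone.utc)
    handle_reschedule(tasks[0], tasks, far_future, is_mysql=True, fail_fast=True)
    print([(t.task_id, t.state) for t in tasks])
    # [('sensor', 'failed'), ('sibling', 'stopped'), ('cleanup', 'running')]
```

Dropping the `stop_siblings` call in this model reproduces the regression: the sensor is marked failed while the sibling stays queued.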
What was wrong
PR #59686 removed the fail-fast call here with this rationale:
That rationale was incorrect on both counts:
`session.get(TI, id)` on a row already locked by the same transaction acquires no new lock and reads
freely (Postgres, MySQL 8.0+, and SQLite all permit this).
Skipping the fail-fast call silently drops the contract for the rest of the DAG. With
`fail_fast=True`, sibling non-teardown tasks should be stopped -- the skip turned that into a no-op.
The deadlock that motivated #59686 came from a different code path (`FOR UPDATE`
expanding to the lazy-joined `dag_run` row), fixed in #67246 by scoping the
lock with `with_for_update={"of": TI}`. With that scope in place, the fail-fast
call is safe and matches the file's two existing fail-fast sites.
Behavior change
failing TI is marked FAILED, siblings keep running (or stay queued).
Silent functional bugfix; MySQL-only code path. The regression test mocks the
dialect gate so it runs on every backend in CI.
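Mocking a dialect gate so that a backend-specific branch runs everywhere might look roughly like the sketch below. The names `FakeSession`, `dialect_name`, and `reschedule_outcome` are hypothetical and do not reflect how the actual regression test or Airflow's dialect check is written; the point is only that patching the gate lets a MySQL-only branch execute on SQLite or Postgres runners.

```python
from datetime import datetime, timezone
from unittest import mock

# Assumed limit: the MySQL TIMESTAMP maximum, 2**31 - 1 seconds after the epoch.
MYSQL_TIMESTAMP_MAX = datetime.fromtimestamp(2**31 - 1, tz=timezone.utc)


class FakeSession:
    """Hypothetical stand-in for a DB session bound to some backend."""

    def dialect_name(self) -> str:
        return "sqlite"  # whatever backend the CI job happens to run on


def reschedule_outcome(session: FakeSession, reschedule_date: datetime) -> str:
    """Hypothetical code under test containing a MySQL-only branch."""
    if session.dialect_name() == "mysql" and reschedule_date > MYSQL_TIMESTAMP_MAX:
        return "failed"
    return "up_for_reschedule"


def test_limit_branch_runs_on_any_backend() -> None:
    session = FakeSession()
    far_future = datetime(2040, 1, 1, tzinfo=timezone.utc)

    # Without the mock, the MySQL-only branch is never reached.
    assert reschedule_outcome(session, far_future) == "up_for_reschedule"

    # Force the gate open so the branch is exercised regardless of backend.
    with mock.patch.object(FakeSession, "dialect_name", return_value="mysql"):
        assert reschedule_outcome(session, far_future) == "failed"


if __name__ == "__main__":
    test_limit_branch_runs_on_any_backend()
```

Patching the gate rather than the database keeps the test independent of which backend is installed, at the cost of not exercising real MySQL behavior.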
Also drops a second misleading comment in the same function claiming `session.get`
was avoided to "avoid SQLA2 lock contention issues" -- the code itself is fine;
the rationale was wrong.