Fix schedule_tis HA race on try_number/state transitions #63367

Open
sidshas03 wants to merge 1 commit into apache:main from sidshas03:fix-57618-60330-scheduler-race

sidshas03 commented Mar 11, 2026

Related to #57618 and #60330.

This PR updates DagRun.schedule_tis() so that scheduling stays correct when HA schedulers race on the same task instances.

  • Guard TI updates with two conditions:
    • TI.id IN (...)
    • the TI is still in a schedulable state (NULL or SCHEDULEABLE_STATES)
  • Reuse a single next_try_number expression so that try_number handling is consistent:
    • UP_FOR_RESCHEDULE keeps the same try_number
    • all other schedulable states increment try_number
  • Apply the same state guard to the EmptyOperator short-circuit update (to SUCCESS), so a stale scheduler view cannot overwrite a newer state.
  • Keep the existing debug mismatch logging, but skip mismatch warnings when the row's state is no longer SCHEDULED (this is expected in race/no-op cases).

In HA setups, two schedulers can race on the same TI. A scheduler working from a stale view should not be able to:

  • increment try_number a second time
  • transition the state after another scheduler has already moved the TI to QUEUED / RUNNING / etc.

This change turns those stale updates into no-ops.
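
To illustrate the race described above, the following hypothetical simulation replays two schedulers acting on the same TI, with and without the state guard. Sequential calls on one sqlite3 connection stand in for the interleaving of two scheduler processes; the schema, state strings, and the helper names schedule and race are assumptions for illustration, not Airflow's code.

```python
import sqlite3

# Assumed guard and next_try_number expression, mirroring the description.
GUARD = "(state IS NULL OR state IN ('up_for_retry', 'up_for_reschedule'))"
NEXT_TRY = ("CASE WHEN state = 'up_for_reschedule' "
            "THEN try_number ELSE try_number + 1 END")


def schedule(conn: sqlite3.Connection, ti_id: int, guarded: bool = True) -> int:
    """Move one TI to 'scheduled'; return the number of rows updated."""
    where = "id = ?" + (f" AND {GUARD}" if guarded else "")
    sql = (f"UPDATE task_instance SET state = 'scheduled', "
           f"try_number = {NEXT_TRY} WHERE {where}")
    return conn.execute(sql, (ti_id,)).rowcount


def race(guarded: bool) -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task_instance "
                 "(id INTEGER PRIMARY KEY, state TEXT, try_number INTEGER NOT NULL)")
    conn.execute("INSERT INTO task_instance VALUES (1, 'up_for_retry', 1)")
    # Scheduler A schedules the TI first, then queues it.
    a = schedule(conn, 1, guarded)
    conn.execute("UPDATE task_instance SET state = 'queued' WHERE id = 1")
    # Scheduler B still believes TI 1 is up_for_retry (stale view)
    # and issues the same update.
    b = schedule(conn, 1, guarded)
    state, try_number = conn.execute(
        "SELECT state, try_number FROM task_instance WHERE id = 1").fetchone()
    conn.close()
    return a, b, state, try_number


print(race(guarded=True))   # (1, 0, 'queued', 2)
print(race(guarded=False))  # (1, 1, 'scheduled', 3)
```

With the guard, scheduler B's update matches zero rows, so try_number stays at 2 and the QUEUED state survives; without it, B increments try_number a second time and drags the TI back to SCHEDULED.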

Tests added

In airflow-core/tests/unit/models/test_dagrun.py:

  1. test_schedule_tis_does_not_increment_try_number_if_ti_already_queued_by_other_scheduler
  2. test_schedule_tis_empty_operator_does_not_short_circuit_if_ti_already_queued
  3. test_schedule_tis_up_for_reschedule_does_not_increment_try_number
  4. test_schedule_tis_is_noop_if_ti_transitions_to_nonschedulable_state_before_update
  5. test_schedule_tis_empty_operator_is_noop_if_ti_already_running
  6. test_schedule_tis_only_one_scheduler_update_succeeds_when_competing

Local validation

  • Targeted new tests: passed
  • Broader -k schedule_tis subset in test_dagrun.py: passed (existing expected xfails unchanged)
  • Ruff on modified files: passed

sidshas03 requested review from XD-DENG and ashb as code owners March 11, 2026 17:52
sidshas03 force-pushed the fix-57618-60330-scheduler-race branch 3 times, most recently from 4233546 to add6ff0