Fix scheduler/triggerer deadlock on deferrable task instances #65920
Open
shaealh wants to merge 3 commits into
Hi team, can I get an approval to merge? Thanks
Regarding Issue #65818
related: #65836
This changes the scheduler trigger-timeout path and the trigger cleanup path to lock candidate task_instance rows in deterministic primary-key order before updating them.

On MySQL/InnoDB, the previous bulk updates could reach overlapping task_instance rows through different indexes, allowing the scheduler and triggerer to acquire row/gap locks in different orders. With HA schedulers and deferrable tasks, that can deadlock.

The new flow is:
1. Select candidate task_instance.id rows ordered by primary key.
2. Lock those rows with FOR UPDATE SKIP LOCKED.
3. Update only the rows that were locked.

This keeps the existing predicates and update values intact while making lock acquisition consistent across the scheduler and triggerer writers.
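The locking discipline described above can be illustrated with a small in-process analogy. This is only a sketch and not the code in this PR: `threading.Lock` objects stand in for InnoDB row locks, a non-blocking `acquire` stands in for SKIP LOCKED, and the names `lock_candidates`, `update_rows`, and `run_demo` are hypothetical. It shows two writers with overlapping candidate sets, each locking in ascending primary-key order and skipping rows the other already holds.

```python
import threading


def lock_candidates(row_locks, candidate_ids):
    """Try to lock candidates in ascending primary-key order.

    Rows already held by another writer are skipped instead of waited on,
    which is the in-process analogue of SKIP LOCKED.
    """
    locked = []
    for row_id in sorted(candidate_ids):
        if row_locks[row_id].acquire(blocking=False):
            locked.append(row_id)
    return locked


def update_rows(row_locks, rows, candidate_ids, writer):
    """Lock first, then update only the rows this writer actually locked."""
    locked = lock_candidates(row_locks, candidate_ids)
    try:
        for row_id in locked:
            rows[row_id] = writer
    finally:
        # Release every lock taken, mirroring the end of a transaction.
        for row_id in locked:
            row_locks[row_id].release()
    return locked


def run_demo():
    # Five hypothetical task_instance rows keyed by primary key 1..5.
    row_locks = {row_id: threading.Lock() for row_id in range(1, 6)}
    rows = {row_id: None for row_id in range(1, 6)}
    locked_by = {}

    def work(writer, candidate_ids):
        locked_by[writer] = update_rows(row_locks, rows, candidate_ids, writer)

    # The candidate sets overlap on rows 2 and 5 and arrive in different orders.
    threads = [
        threading.Thread(target=work, args=("scheduler", [5, 3, 1, 2])),
        threading.Thread(target=work, args=("triggerer", [2, 4, 5])),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join(timeout=5)

    finished = all(not thread.is_alive() for thread in threads)
    return finished, locked_by, rows


if __name__ == "__main__":
    finished, locked_by, rows = run_demo()
    print("finished:", finished)
    print("locked by writer:", locked_by)
    print("last writer per row:", rows)
```

Which writer ends up updating rows 2 and 5 depends on thread timing, so the sketch makes no claim about that. In the simulation the non-blocking acquire alone is enough to rule out waiting; the sort is kept to mirror the PR's ordering, which is what keeps acquisition order consistent if any writer ever does wait on a row.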
Tests added:
Tests run:
ruff check airflow-core/src/airflow/jobs/scheduler_job_runner.py airflow-core/src/airflow/models/trigger.py airflow-core/tests/unit/jobs/test_scheduler_job.py airflow-core/tests/unit/models/test_trigger.py
python -m compileall -q airflow-core/src/airflow/jobs/scheduler_job_runner.py airflow-core/src/airflow/models/trigger.py airflow-core/tests/unit/jobs/test_scheduler_job.py airflow-core/tests/unit/models/test_trigger.py
AIRFLOW_HOME=/tmp/airflow-65818-test-home PATH=/tmp/airflow-test-bin:$PATH .venv/bin/python -m pytest airflow-core/tests/unit/models/test_trigger.py -k 'clean_unused' --with-db-init
AIRFLOW_HOME=/tmp/airflow-65818-test-home PATH=/tmp/airflow-test-bin:$PATH .venv/bin/python -m pytest airflow-core/tests/unit/jobs/test_scheduler_job.py -k 'timeout_triggers' --with-db-init