fix(scheduler): add row lock to AssetPartitionDagRun fetch to prevent… #68061
Open
bujjibabukatta wants to merge 2 commits into
… duplicate DagRun creation in HA
When running 2+ schedulers (HA mode), DAGs using PartitionedAssetTimetable or CronPartitionTimetable intermittently produce duplicate DagRuns for the same asset event.
PR #60773 fixed this race for non-partitioned asset scheduling by adding with_row_locks to the AssetDagRunQueue fetch. However, partitioned assets flow through a separate code path, _create_dagruns_for_partitioned_asset_dags, which reads AssetPartitionDagRun rows with a plain SELECT and no lock.
Two schedulers can read the same unprocessed rows within ~40ms of each other and independently create a DagRun for each.
Confirmed reproducible on MWAA 3.2.1 (which ships #60773) with 2 schedulers
and 2+ consumer DAGs on the same asset.
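The race described above can be illustrated with a toy, in-memory model. This is a sketch only, not Airflow code: the dict standing in for the AssetPartitionDagRun table, the names pending_rows and scheduler_loop, and the barrier that forces both reads to overlap are all assumptions made for illustration.

```python
import threading


def simulate_unlocked_race():
    """Two toy schedulers read the same unprocessed rows with no lock."""
    # Hypothetical stand-in for the table: row id -> processed flag.
    pending_rows = {1: False, 2: False}
    created_runs = []  # one (scheduler, row_id) entry per "DagRun" created
    both_have_read = threading.Barrier(2)

    def scheduler_loop(name):
        # Plain read without a lock: each scheduler takes its own snapshot.
        snapshot = [row_id for row_id, done in pending_rows.items() if not done]
        # Force the overlapping read window so the outcome is deterministic.
        both_have_read.wait()
        for row_id in snapshot:
            created_runs.append((name, row_id))
            pending_rows[row_id] = True

    threads = [threading.Thread(target=scheduler_loop, args=(n,)) for n in ("A", "B")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Count how many runs were created for each row.
    runs_per_row = {}
    for _, row_id in created_runs:
        runs_per_row[row_id] = runs_per_row.get(row_id, 0) + 1
    return runs_per_row


if __name__ == "__main__":
    print(simulate_unlocked_race())
```

Because both snapshots are taken before either scheduler marks a row as processed, every row ends up with one run per scheduler.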
Fix
Add with_row_locks(..., skip_locked=True) to the AssetPartitionDagRun query in _create_dagruns_for_partitioned_asset_dags, mirroring exactly what #60773 did for AssetDagRunQueue.
SELECT ... FOR UPDATE SKIP LOCKED ensures that when Scheduler A locks the APDR rows it is processing, Scheduler B's identical query returns zero rows and exits cleanly, so no duplicate DagRun is created.
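The SKIP LOCKED behaviour relied on here can be sketched with a small in-memory model. It is not the Airflow implementation and does not talk to a database: ToyTable, select_for_update_skip_locked, commit, and run_two_schedulers are hypothetical names, and per-row threading locks stand in for database row locks.

```python
import threading


class ToyTable:
    """In-memory stand-in for a table with per-row locks (not a real database)."""

    def __init__(self, row_ids):
        self._locks = {row_id: threading.Lock() for row_id in row_ids}

    def select_for_update_skip_locked(self):
        # Return only the rows whose lock can be taken right now; never wait.
        claimed = []
        for row_id, lock in self._locks.items():
            if lock.acquire(blocking=False):
                claimed.append(row_id)
        return claimed

    def commit(self, claimed):
        # Ending the "transaction" releases the row locks it held.
        for row_id in claimed:
            self._locks[row_id].release()


def run_two_schedulers():
    table = ToyTable([1, 2])
    rows_a = table.select_for_update_skip_locked()  # Scheduler A claims both rows
    rows_b = table.select_for_update_skip_locked()  # Scheduler B queries while A holds them
    table.commit(rows_a)
    return rows_a, rows_b


if __name__ == "__main__":
    print(run_two_schedulers())
```

In this model the second query skips every locked row instead of blocking on it, which is the property the fix depends on: the scheduler that arrives second finds nothing to process rather than a duplicate copy of the same work.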
with_row_locks is already imported in this file, so no new imports are needed.
Testing
PartitionedAssetTimetable asset (steps from #68045, "Duplicate DAG runs for PartitionedAssetTimetable with multiple schedulers (HA): race not covered by #60773")
Related