[v3-2-test] Preload source_aliases in process_executor_events (#65422) #66191

Merged: potiuk merged 1 commit into v3-2-test from backport-ef0ff5c-v3-2-test on May 1, 2026

@github-actions (Contributor)

PR #56916 added a _eager_load_dag_run_for_validation helper that
returns two loader options so DRDataModel.model_validate can walk
DagRun.consumed_asset_events[*].asset and source_aliases without
triggering lazy loads on a detached ORM instance. The helper's
docstring example applies both loaders, and _adopt_or_reset_orphaned_tasks
applies both. SchedulerJobRunner.process_executor_events, however, applies
only asset_loader and discards alias_loader, leaving
AssetEvent.source_aliases lazy-loaded.

When the Celery or Kubernetes executor reports a failed-while-queued event
for an asset-triggered DagRun whose consumed AssetEvent has a
non-empty source_aliases collection, the AssetEvent is already detached
from its SQLAlchemy session by the time pydantic walks the DagRun, so
accessing source_aliases raises DetachedInstanceError, which surfaces
wrapped in a pydantic ValidationError. The exception escapes
_run_scheduler_loop and the scheduler exits, orphaning in-flight task
instances. The next scheduler's _adopt_or_reset_orphaned_tasks picks them
up and replays the same validation path, crashing again.
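
The crash loop described above can be pictured with a toy model. This is only a sketch under stated assumptions, not Airflow code: `run_scheduler_once`, `restart_until_stable`, `validate`, and the event dictionaries are all hypothetical names invented for illustration, and a plain `ValueError` stands in for the pydantic ValidationError. It assumes the failing event is still pending when the next scheduler starts.

```python
def run_scheduler_once(pending, validate):
    """Drain pending events; an exception from validate escapes to the caller."""
    while pending:
        validate(pending[0])  # raises before the event is removed
        pending.pop(0)


def restart_until_stable(pending, validate, max_restarts):
    """Restart the toy scheduler after each crash and count the crashes."""
    crashes = 0
    for _ in range(max_restarts):
        try:
            run_scheduler_once(pending, validate)
            break  # the queue drained without an error
        except ValueError:
            crashes += 1  # the same event is still first in the queue
    return crashes


def validate(event):
    # Hypothetical rule: an event whose aliases were not preloaded fails.
    if not event["aliases_loaded"]:
        raise ValueError("validation failed")


pending = [
    {"id": 1, "aliases_loaded": False},
    {"id": 2, "aliases_loaded": True},
]
crashes = restart_until_stable(pending, validate, max_restarts=3)
```

In this model every restart fails on the same first event, so the crash count equals the restart limit and nothing behind the failing event is ever processed.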

Apply alias_loader alongside asset_loader at this call site.
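
The difference between applying one loader and applying both can be sketched with a small stdlib-only model. This is not SQLAlchemy or Airflow code: `ToyAssetEvent`, `DetachedError`, `toy_eager_load_options`, and `fetch_then_detach` are hypothetical stand-ins, and the "loaders" are plain callables rather than real loader options. The sketch only illustrates why a relationship that was never loaded while attached cannot be read afterwards.

```python
class DetachedError(RuntimeError):
    """Toy stand-in for SQLAlchemy's DetachedInstanceError."""


class ToyAssetEvent:
    """Toy record whose relationships load on first access while attached."""

    def __init__(self, stored):
        self._stored = stored  # what a database query would return
        self._loaded = {}      # relationships already fetched
        self.attached = True   # True while the toy session is open

    def get(self, name):
        if name not in self._loaded:
            if not self.attached:
                raise DetachedError(f"cannot lazy-load {name!r} after detach")
            self._loaded[name] = self._stored[name]
        return self._loaded[name]


def toy_eager_load_options():
    """Hypothetical analogue of the helper: returns two loader callables."""
    def asset_loader(event):
        event.get("asset")

    def alias_loader(event):
        event.get("source_aliases")

    return asset_loader, alias_loader


def fetch_then_detach(loaders):
    """Apply the given loaders while attached, then simulate session close."""
    event = ToyAssetEvent({"asset": "toy_asset", "source_aliases": ["toy_alias"]})
    for load in loaders:
        load(event)
    event.attached = False
    return event


asset_loader, alias_loader = toy_eager_load_options()

# Call site before the fix: alias_loader is discarded.
before = fetch_then_detach([asset_loader])
try:
    before.get("source_aliases")
    before_ok = True
except DetachedError:
    before_ok = False

# Call site after the fix: both loaders are applied.
after = fetch_then_detach([asset_loader, alias_loader])
after_aliases = after.get("source_aliases")
```

With only `asset_loader` applied, reading `asset` still works after detach but reading `source_aliases` raises. With both applied, the aliases are already in memory and the read succeeds.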

Extend test_process_executor_events_with_asset_events so the
AssetEvent now carries a non-empty source_aliases collection and
the assertion verifies the alias survives into the callback payload.
(cherry picked from commit ef0ff5c)

Co-authored-by: Pablo Seibelt <pabloseibelt@sicarul.com>
Co-authored-by: Claude <noreply@anthropic.com>

@boring-cyborg (bot) added the "area:Scheduler including HA (high availability) scheduler" label on Apr 30, 2026
@Lee-W marked this pull request as ready for review on May 1, 2026, 07:41
@Lee-W requested review from XD-DENG and ashb as code owners on May 1, 2026, 07:41
@potiuk merged commit 817ffa9 into v3-2-test on May 1, 2026
72 checks passed
@potiuk deleted the backport-ef0ff5c-v3-2-test branch on May 1, 2026, 15:18
Labels

area:Scheduler including HA (high availability) scheduler

3 participants