You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Same race condition as #66374, but the post-defer TI lands in queued state rather than scheduled, so the fix in #66431 / backport #67089 (merged for 3.2.2) may not cover it. The scheduler still treats the stale executor success event from the worker's defer-exit as a state mismatch and kills the task externally.
Timeline from one occurrence:
[2026-05-20 13:37:14] worker TI started (try_number=1)
[2026-05-20 13:37:18] worker Pausing task as DEFERRED (defer() called, worker exits cleanly)
[2026-05-20 13:37:45] scheduler Executor LocalExecutor reported that the task
instance <TaskInstance: .... [queued]> finished with
state success, but the task instance's state attribute is queued.
Task marked as up_for_retry
[2026-05-20 13:38:05] worker try_number=2 starts, success
Notice the TI state attribute is queued (and the TI bracket label is also [queued]), not scheduled. The fix added by #66431 in process_executor_events only marks the event as ti_requeued when state == SCHEDULED and next_method is not None. The same logical condition (resume-after-defer with a stale executor success) can also leave the TI in QUEUED under load — that path falls through the new branch and the task is still failed/retried.
We see this race fire on a subset of deferred tasks per DAG run. It is not reproducible deterministically — same code, same workload, sometimes races, sometimes doesn't.
Airflow 3.0.3 fails tasks sporadically with state mismatch #53797 (CLOSED, inconclusive): Earlier report of the [queued]-state mismatch on 3.0.3 with LocalExecutor. Closed because the original reporter couldn't repro on their helm deployment ("edge case for docker macOS"), but the comment thread has follow-up reports on Linux / GKE / 3.0.4. This appears to be the same family of race, still present in 3.2.x, and worth treating as a distinct, ongoing bug rather than a Docker-for-Mac quirk.
A deferrable operator that: Calls self.defer(trigger=..., method_name="execute_complete").
Observe the audit/event log. A subset of TIs may hit the [queued] mismatch instead of scheduled.
What you think should happen instead?
The ti_requeued branch in process_executor_events should also handle the queued-state variant, e.g.:
if (
state == TaskInstanceState.SUCCESS
and ti.next_method is not None
and ti.state in (TaskInstanceState.SCHEDULED, TaskInstanceState.QUEUED)
):
# stale defer-exit success, treat as requeue
...
Operating System
Airflow runs in the official apache/airflow:3.2.1 Docker image on Linux. Host: Ubuntu / Linux.
Under which category would you file this issue?
Airflow Core
Apache Airflow version
3.2.1
What happened and how to reproduce it?
Same race condition as #66374, but the post-defer TI lands in queued state rather than scheduled, so the fix in #66431 / backport #67089 (merged for 3.2.2) may not cover it. The scheduler still treats the stale executor success event from the worker's defer-exit as a state mismatch and kills the task externally.
Timeline from one occurrence:
[2026-05-20 13:37:14] worker TI started (try_number=1)
[2026-05-20 13:37:18] worker Pausing task as DEFERRED (defer() called, worker exits cleanly)
[2026-05-20 13:37:45] scheduler Executor LocalExecutor reported that the task
instance <TaskInstance: .... [queued]> finished with
state success, but the task instance's state attribute is queued.
Task marked as up_for_retry
[2026-05-20 13:38:05] worker try_number=2 starts, success
Notice the TI state attribute is queued (and the TI bracket label is also [queued]), not scheduled. The fix added by #66431 in process_executor_events only marks the event as ti_requeued when state == SCHEDULED and next_method is not None. The same logical condition (resume-after-defer with a stale executor success) can also leave the TI in QUEUED under load — that path falls through the new branch and the task is still failed/retried.
We see this race fire on a subset of deferred tasks per DAG run. It is not reproducible deterministically — same code, same workload, sometimes races, sometimes doesn't.
Differences and linkages with current tickets
Reproducer (probabilistic)
What you think should happen instead?
The ti_requeued branch in process_executor_events should also handle the queued-state variant, e.g.:
if (
state == TaskInstanceState.SUCCESS
and ti.next_method is not None
and ti.state in (TaskInstanceState.SCHEDULED, TaskInstanceState.QUEUED)
):
# stale defer-exit success, treat as requeue
...
Operating System
Airflow runs in the official apache/airflow:3.2.1 Docker image on Linux. Host: Ubuntu / Linux.
Deployment
Docker-Compose
Apache Airflow Provider(s)
No response
Versions of Apache Airflow Providers
No response
Official Helm Chart version
Not Applicable
Kubernetes Version
No response
Helm Chart configuration
No response
Docker Image customizations
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct