Scheduler queued-timeout can reschedule a deferred TimeDeltaSensorAsync and cause premature success #65248

@hkc-8010

Description

Apache Airflow version

2.11.2

What happened?

We observed a TimeDeltaSensorAsync task succeed long before its target time after the scheduler's stuck-in-queued reschedule event fired just after the task had already started and deferred.

This was on KubernetesExecutor in a production environment, but the interesting part appears to be the scheduler/deferral interaction rather than the surrounding platform.

The task was:

  • DAG: generate_eval_imp_tfexample_install_to_revenue
  • Task: d28_eval_imp_generation.wait_for_d28_window
  • Run ID: scheduled__2026-03-18T00:00:00+00:00
  • Operator: TimeDeltaSensorAsync

The trigger log showed the sensor still had about 29 days remaining:

[2026-03-19T00:08:06.515+0000] {temporal.py:74} INFO - 2526713 seconds remaining; sleeping 3600 seconds

But the task still resumed and succeeded about 100 seconds later.

What you think should happen instead?

Once a task has transitioned from queued to running and then deferred, the scheduler's queued-timeout watchdog should no longer be able to reschedule that task instance based on stale queued state.

A deferred TimeDeltaSensorAsync should also not be able to resume into execute_complete() and succeed unless the trigger has legitimately fired because the target datetime was reached.
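To illustrate why any premature resume turns into success: a minimal standalone sketch of the sensor's shape (class and exception names here are stand-ins, not Airflow's real classes; the key behavior mirrored from the report is that `execute_complete()` returns None unconditionally).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


class TaskDeferred(Exception):
    """Stand-in for Airflow's TaskDeferred control-flow exception."""
    def __init__(self, moment: datetime, method_name: str):
        self.moment = moment
        self.method_name = method_name


@dataclass
class TimeDeltaSensorSketch:
    """Simplified model of a deferrable time-delta sensor (illustrative only)."""
    delta: timedelta

    def execute(self, now: datetime) -> None:
        # First run: defer until now + delta, resuming in execute_complete().
        raise TaskDeferred(moment=now + self.delta, method_name="execute_complete")

    def execute_complete(self, event=None) -> None:
        # Mirrors the reported behavior: returns None unconditionally, so the
        # task succeeds on *any* resume, whether or not the moment was reached.
        return None


now = datetime(2026, 3, 19, 0, 8, 6, tzinfo=timezone.utc)
sensor = TimeDeltaSensorSketch(delta=timedelta(seconds=2526713))
try:
    sensor.execute(now)
except TaskDeferred:
    # A premature reschedule resumes straight into execute_complete():
    result = sensor.execute_complete()
assert result is None  # counts as success, ~29 days early
```

In other words, the resumed path has no guard of its own; correctness depends entirely on the trigger not firing early and the scheduler not relaunching a deferred TI.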

How to reproduce

I do not yet have a minimal standalone reproducer, but the sequence below looks sufficient to explain the bug:

  1. A task instance remains in queued longer than [scheduler] task_queued_timeout.
  2. The task is finally picked up by a worker and starts running.
  3. It defers successfully via TimeDeltaSensorAsync / DateTimeTrigger.
  4. The scheduler's stuck_in_queued_reschedule path still processes the TI and reschedules it.
  5. The task is launched again, resumes via execute_complete(), and succeeds even though the original trigger had not reached the target time.
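The race in steps 1-4 can be sketched as follows (a toy model, not Airflow code: the watchdog filters on a stale `queued` snapshot and then reschedules without re-checking the TI's live state):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

QUEUED, RUNNING, DEFERRED, SCHEDULED = "queued", "running", "deferred", "scheduled"


@dataclass
class TI:
    state: str
    queued_dttm: Optional[datetime]


def stuck_in_queued_snapshot(tis, now, timeout):
    """Snapshot the watchdog takes: state == QUEUED and queued too long."""
    return [ti for ti in tis
            if ti.state == QUEUED
            and ti.queued_dttm is not None
            and ti.queued_dttm < now - timeout]


def reschedule_without_recheck(ti):
    # Models the suspected buggy path: acts on the stale snapshot even
    # though the TI has since moved queued -> running -> deferred.
    ti.state = SCHEDULED
    ti.queued_dttm = None


now = datetime(2026, 3, 19, 0, 8, 8, tzinfo=timezone.utc)
ti = TI(state=QUEUED, queued_dttm=now - timedelta(seconds=430))
stale = stuck_in_queued_snapshot([ti], now, timedelta(seconds=300))

# Worker catches up between snapshot and action: task runs and defers.
ti.state = DEFERRED

for victim in stale:
    reschedule_without_recheck(victim)

assert ti.state == SCHEDULED  # deferred TI clobbered back to scheduled
```

The timestamps above (430 s queued against a 300 s timeout) match the incident's ~7-minute queue delay against the 300 s `task_queued_timeout` in the event log.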

Operating System

Linux

Versions of Apache Airflow Providers

No response

Deployment

Other

Deployment details

KubernetesExecutor on Astro Hosted Runtime 13.6.0

Anything else?

Here is the timeline reconstructed from the task log, triggerer log, scheduler log, and Airflow metadata DB:

2026-03-19 00:00:46.827 scheduler: sent TI to KubernetesExecutor
2026-03-19 00:00:54.020 scheduler: received executor event with state queued
2026-03-19 00:00:54.054 scheduler: setting external_id=4387116

2026-03-19 00:08:03.860 task log: task starts running on worker 100.64.21.164
2026-03-19 00:08:06.158 task log: Pausing task as DEFERRED
2026-03-19 00:08:06.280 task log: exited with return code 100 (task deferral)
2026-03-19 00:08:06.515 trigger log: trigger starting
2026-03-19 00:08:06.515 trigger log: 2526713 seconds remaining; sleeping 3600 seconds

2026-03-19 00:08:08.031 Airflow event log: stuck in queued reschedule
  "Task was in queued state for longer than 300.0 seconds; task state will be set back to scheduled."

2026-03-19 00:08:09.601 triggerer log: Trigger 796267 completed
2026-03-19 00:08:10.140 scheduler: sending TI to KubernetesExecutor again
2026-03-19 00:08:17.098 scheduler: received executor event with state queued
2026-03-19 00:08:17.162 scheduler: setting external_id=4387227

2026-03-19 00:09:42.773 task log: second execution starts on worker 100.64.22.189
2026-03-19 00:09:42.926 task log: Resuming after deferral
2026-03-19 00:09:48.163 task log: task marks SUCCESS

The key inconsistency is that the stuck-in-queued reschedule event happens after the task has already run and deferred within the same try number.

The trigger log also never shows the normal path where the target time is actually reached. It only shows:

[2026-03-19T00:08:06.515+0000] {temporal.py:70} INFO - trigger starting
[2026-03-19T00:08:06.515+0000] {temporal.py:74} INFO - 2526713 seconds remaining; sleeping 3600 seconds

There is no later "time condition reached" message before the task resumes.
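If, as the "sleeping 3600 seconds" line suggests, the trigger sleeps in bounded hourly chunks, the remaining time can only decrease monotonically and the event should fire only when it reaches zero (this is a sketch of the assumed chunked-sleep schedule, not the actual temporal.py code):

```python
def sleep_chunks(total_remaining: int, cap: int = 3600):
    """Yield the sleep intervals a chunked time trigger would take.

    The 3600 s cap is inferred from the 'sleeping 3600 seconds' log line.
    """
    remaining = total_remaining
    while remaining > 0:
        chunk = min(remaining, cap)
        yield chunk
        remaining -= chunk


chunks = list(sleep_chunks(2526713))
assert sum(chunks) == 2526713   # total sleep covers the full delay
assert max(chunks) == 3600      # never sleeps more than an hour at a time
assert len(chunks) == 702       # ~29 days of hourly sleeps plus a remainder
```

With ~702 hourly sleeps still pending, "Trigger 796267 completed" three seconds after the trigger started is consistent with the trigger being cancelled or superseded, not with the time condition being reached.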

I also checked the live 2.11.2 code in the running scheduler image:

  • airflow/jobs/scheduler_job_runner.py _handle_tasks_stuck_in_queued() selects TIs by state == QUEUED, queued_dttm < now - task_queued_timeout, and queued_by_job_id == self.job.id, then calls _maybe_requeue_stuck_ti().
  • _reschedule_stuck_task() updates the TI back to SCHEDULED and clears queued_dttm.
  • airflow/models/taskinstance.py logs Resuming after deferral when next_kwargs is not None.
  • airflow/sensors/time_delta.py has TimeDeltaSensorAsync.execute_complete() simply return None.

So the behavior looks consistent with a race where a TI chosen as "stuck queued" is rescheduled after it has already moved through running -> deferred, and the resumed execution path then succeeds without re-checking the time condition.
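One possible shape of a fix, sketched as a standalone function (hypothetical, not a patch: in Airflow this would presumably be a row-locked UPDATE guarded by `state == QUEUED` inside the same transaction, rather than an in-memory check):

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TI:
    state: str
    queued_dttm: Optional[datetime]


def reschedule_if_still_queued(ti: TI) -> bool:
    """Guarded variant of the stuck-in-queued reschedule.

    Re-checks the TI's live state before clobbering it, so a TI that has
    already moved through running -> deferred is left untouched.
    """
    if ti.state != "queued":
        return False  # already running or deferred; leave it alone
    ti.state = "scheduled"
    ti.queued_dttm = None
    return True


deferred_ti = TI(state="deferred",
                 queued_dttm=datetime(2026, 3, 19, tzinfo=timezone.utc))
assert reschedule_if_still_queued(deferred_ti) is False
assert deferred_ti.state == "deferred"  # deferral survives the watchdog
```

A compare-and-set like this would close the window between the watchdog's query and its write, which is exactly the window the timeline above shows being hit.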

I also ruled out obvious infrastructure causes during the incident window:

  • no scheduler pod restarts
  • no triggerer pod restarts
  • no triggerer tracebacks / OOMs / process death
  • no relevant Kubernetes warning events

A possibly related issue is #51301, but that report is about queued-timeout task failures and alerting. This issue seems to be about queued-timeout interacting incorrectly with deferred tasks and leading to premature success.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels

  • area:Scheduler (including HA scheduler)
  • area:Triggerer
  • area:async-operators (AIP-40: Deferrable ("Async") Operators)
  • kind:bug
  • priority:high
