Scheduler MySQL: misplaced USE INDEX (ti_state) hint on outer rejoin causes task_instance full scan in _executable_task_instances_to_queued (Airflow 3.2 regression from PR #54103) #66763
On a production Airflow 3.2.1 + MySQL deployment with ~3.2M rows in task_instance, the scheduler stopped queueing tasks. TaskInstances remained in scheduled and did not transition to queued. Worker pods that had already completed their work could not report their final state back through the execution API, because metadata DB lock waits piled up behind the scheduler's critical-section query.
Worker-side symptom (Airflow 3 execution API client):
ReadTimeout: timed out
airflow.sdk.api.client.Client.request
PATCH /execution/task-instances/<...>/state
EOFError: Request socket closed before length
BrokenPipeError: [Errno 32] Broken pipe
Scheduler-side symptom:
sqlalchemy.exc.OperationalError: Query execution was interrupted
SELECT task_instance ... LIMIT 16 FOR UPDATE OF task_instance SKIP LOCKED
Root cause
The slow query is generated by airflow.jobs.scheduler_job_runner.SchedulerJobRunner._executable_task_instances_to_queued.
PR #54103 (shipped in 3.2.0) reshaped this method into a two-step pattern: a ranked_query that picks candidate SCHEDULED task instances, followed by an outer query that rejoins the candidates back to task_instance by the composite identity (dag_id, task_id, run_id, map_index).
Both the inner ranked query and the outer rejoin currently apply the same MySQL index hint:
.with_hint(TI, "USE INDEX (ti_state)", dialect_name="mysql")
The hint is appropriate on the inner query — that query filters by TI.state == SCHEDULED and the ti_state(state) index is the right one.
The hint is harmful on the outer query. The outer query is not a state lookup; it is an exact lookup keyed by (dag_id, task_id, run_id, map_index). For that pattern, MySQL should use the task_instance_composite_key(dag_id, task_id, run_id, map_index) index. But the hint restricts the optimizer to ti_state, so the composite key is unreachable and the optimizer falls back to a full scan of task_instance.
So the same scheduler loop reads either ~3.2M rows or 1 row per candidate, depending solely on the outer hint.
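The two-step shape and the duplicated hint can be sketched with SQLAlchemy Core, assuming SQLAlchemy 1.4 or newer. This is a simplified illustration, not Airflow's actual query: the table definition is a cut-down stand-in for task_instance, build_query and its hint_outer flag are hypothetical names, and the real ranked query filters and ranks on far more than state.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, select
from sqlalchemy.dialects import mysql, postgresql

metadata = MetaData()

# Simplified stand-in for Airflow's task_instance table (assumption: only
# the columns needed to show the query shape are declared here).
ti = Table(
    "task_instance",
    metadata,
    Column("dag_id", String(250), primary_key=True),
    Column("task_id", String(250), primary_key=True),
    Column("run_id", String(250), primary_key=True),
    Column("map_index", Integer, primary_key=True),
    Column("state", String(20)),
)

HINT = "USE INDEX (ti_state)"


def build_query(hint_outer: bool):
    """Build a ranked-subquery + rejoin statement (hypothetical helper)."""
    # Inner step: a state lookup, where the ti_state hint is appropriate.
    ranked_query = (
        select(ti.c.dag_id, ti.c.task_id, ti.c.run_id, ti.c.map_index)
        .with_hint(ti, HINT, dialect_name="mysql")
        .where(ti.c.state == "scheduled")
        .subquery("ranked_query")
    )
    # Outer step: an exact lookup by composite identity. Applying the same
    # hint here is the misplacement described in this issue.
    outer = select(ti)
    if hint_outer:
        outer = outer.with_hint(ti, HINT, dialect_name="mysql")
    return outer.select_from(ranked_query).join(
        ti,
        (ti.c.dag_id == ranked_query.c.dag_id)
        & (ti.c.task_id == ranked_query.c.task_id)
        & (ti.c.run_id == ranked_query.c.run_id)
        & (ti.c.map_index == ranked_query.c.map_index),
    )


# Render both variants for MySQL, and the hinted variant for PostgreSQL.
mysql_with_outer_hint = str(build_query(True).compile(dialect=mysql.dialect()))
mysql_without_outer_hint = str(build_query(False).compile(dialect=mysql.dialect()))
postgres_sql = str(build_query(True).compile(dialect=postgresql.dialect()))

if __name__ == "__main__":
    print(mysql_with_outer_hint)
    print(mysql_without_outer_hint)
```

Because the hint is registered with dialect_name="mysql", it is rendered only when the statement is compiled for MySQL, which is consistent with PostgreSQL deployments being unaffected.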
Minimal reproduction
1. Stand up Airflow 3.2.0 or 3.2.1 against MySQL 8.0.
2. Populate task_instance so that it holds on the order of millions of rows in total but only a handful in SCHEDULED (this matches a long-lived production deployment).
3. Run the scheduler under load.
4. EXPLAIN the query emitted from _executable_task_instances_to_queued; the outer join to task_instance will be type=ALL.
5. Remove only the second .with_hint(TI, "USE INDEX (ti_state)", dialect_name="mysql") (the one immediately before .select_from(ranked_query)). Re-run and EXPLAIN: the outer join becomes eq_ref on task_instance_composite_key.
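The EXPLAIN comparison in the steps above can be checked mechanically. The following is a minimal sketch, assuming the EXPLAIN rows have already been fetched as dictionaries keyed by MySQL's tabular EXPLAIN column names (table, type, key, rows). find_full_scans is a hypothetical helper, and the sample rows are illustrative, built only from the access types named in this issue rather than captured from a server.

```python
from typing import Any, Iterable, Mapping


def find_full_scans(
    explain_rows: Iterable[Mapping[str, Any]], table: str = "task_instance"
) -> list[Mapping[str, Any]]:
    """Return the EXPLAIN rows that read `table` with access type ALL (a full table scan)."""
    return [
        row
        for row in explain_rows
        if row.get("table") == table and str(row.get("type") or "").upper() == "ALL"
    ]


# Illustrative rows shaped like MySQL EXPLAIN output (not real captures).
with_outer_hint = [
    {"table": "task_instance", "type": "ALL", "key": None},
]
without_outer_hint = [
    {
        "table": "task_instance",
        "type": "eq_ref",
        "key": "task_instance_composite_key",
        "rows": 1,
    },
]

if __name__ == "__main__":
    print("full scans with outer hint:", find_full_scans(with_outer_hint))
    print("full scans without outer hint:", find_full_scans(without_outer_hint))
```

A check like this could run in a regression test against a seeded MySQL instance, so that a reintroduced full scan is caught even when the table is too small to show a latency difference.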
What you think should happen instead?
The outer query (the rejoin to task_instance keyed by the composite identity) should use task_instance_composite_key, not ti_state. The fix is to remove the second .with_hint(TI, "USE INDEX (ti_state)", dialect_name="mysql") only. The inner hint should stay.
After the fix, on the same workload:
outer task_instance: task_instance_composite_key, eq_ref, rows=1
inner scheduled-TI scan: ti_state, ref/range over scheduled states
Operating System
Linux (kernel 6.x), Amazon EKS nodes on Bottlerocket / AL2023.
Deployment
Other 3rd-party Helm chart
Apache Airflow Provider(s)
No response
Versions of Apache Airflow Providers
No response
Official Helm Chart version
1.20.0
Kubernetes Version
1.33
Helm Chart configuration
No response
Docker Image customizations
No response
Anything else?
The original USE INDEX (ti_state) hint was added in #25627, "MySQL Not Using Correct Index for Scheduler Critical Section Query" (≈2022, Airflow 2.2.5), to fix a separate MySQL optimizer issue in which the optimizer ignored ti_state on a single-step query. At the time, the hint placement was correct.
The bug is dialect-scoped (dialect_name="mysql"), so PostgreSQL deployments are unaffected. This may be why the regression has gone unreported by managed-service users (MWAA/Composer typically default to Postgres).
The bug surfaces only when task_instance is large enough that a full scan is materially slower than an eq_ref lookup. Small test environments will not reproduce visible symptoms even though the EXPLAIN is wrong.
We've deployed an internal sitecustomize-based hotpatch that strips only the outer hint at process start, with no other behavior changes.
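For readers who need a stopgap, one possible shape for such a workaround is a pure function that rewrites the emitted SQL text. This is not the reporter's hotpatch, which the issue does not show. It is a sketch under two assumptions: that the outer rejoin renders as JOIN task_instance USE INDEX (ti_state), and that the inner ranked query reads task_instance directly after FROM. strip_outer_hint is a hypothetical name. A function like this could in principle be called from a SQLAlchemy before_cursor_execute event listener registered with retval=True, but that wiring is left out here.

```python
import re

# Matches the hint only when it directly follows "JOIN task_instance",
# which is assumed to be how the outer rejoin is rendered.
_OUTER_HINT = re.compile(
    r"(\bJOIN\s+task_instance)\s+USE\s+INDEX\s*\(\s*ti_state\s*\)",
    re.IGNORECASE,
)


def strip_outer_hint(statement: str) -> str:
    """Drop USE INDEX (ti_state) from the joined task_instance, keep it elsewhere."""
    return _OUTER_HINT.sub(r"\1", statement)


# Illustrative statement with the assumed shape (not Airflow's real SQL).
example_sql = (
    "SELECT task_instance.dag_id FROM "
    "(SELECT task_instance.dag_id FROM task_instance USE INDEX (ti_state) "
    "WHERE task_instance.state = %s) AS ranked_query "
    "INNER JOIN task_instance USE INDEX (ti_state) "
    "ON task_instance.dag_id = ranked_query.dag_id"
)
patched_sql = strip_outer_hint(example_sql)

if __name__ == "__main__":
    print(patched_sql)
```

Rewriting SQL text is brittle compared with removing the second .with_hint call in the source, so a sketch like this only makes sense as a temporary measure until the upstream fix is available.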
Under which category would you file this issue?
Airflow Core
Apache Airflow version
3.2.1
Source check (current main, file airflow-core/src/airflow/jobs/scheduler_job_runner.py)
The function _executable_task_instances_to_queued contains two occurrences of .with_hint(TI, "USE INDEX (ti_state)", dialect_name="mysql"):
Inner ranked query, which filters on DR.state == RUNNING and TI.state == SCHEDULED: appropriate.
Outer rejoin, select(TI).with_hint(...).select_from(ranked_query).join(TI, (TI.dag_id == ranked_query.c.dag_id) & ...): misplaced.
Evidence (production EXPLAIN on Aurora MySQL 8.0)
Metadata table state at the time of incident (2026-05-11 KST):
Relevant existing indexes on task_instance:
EXPLAIN of the unmodified scheduler query (outer hint present):
EXPLAIN after removing only the outer USE INDEX (ti_state) hint:
Inner ranked query in both cases (kept as-is):
PR #54103 reshaped _executable_task_instances_to_queued into the ranked + outer-rejoin shape and replicated the hint onto the outer rejoin. This is the regression introduction point.