fix(api): Improve historical_metrics_data endpoint performance (#62021) #62152

Open
aksdevs wants to merge 1 commit into apache:main from aksdevs:fix/62021-improve-historical-metrics-endpoint

@aksdevs commented Feb 19, 2026

Two critical optimizations that reduce the number of database queries and let the query optimizer use an existing index:

  1. Combine DagRun queries from 2 to 1

    • Before: 2 separate queries (one for run_type, one for state) with identical WHERE clauses
    • After: 1 query grouped by (run_type, state), pivoted in Python to separate counts
    • Impact: Eliminates 1 DB round trip, reducing the endpoint from 4 to 3 total queries
  2. Add explicit TaskInstance.dag_id filter before JOIN

    • Before: TI query only filtered dag_id indirectly via JOIN with DagRun
    • After: Added explicit TaskInstance.dag_id.in_(permitted_dag_ids) filter before JOIN
    • Impact: Enables database optimizer to use ti_dag_run index (dag_id, run_id) before the join
    • Critical: With millions of TaskInstances, this dramatically reduces query execution time
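
The first change (one grouped query, pivoted in Python) can be sketched roughly as follows. This is a minimal illustration using the standard-library `sqlite3` module rather than the endpoint's actual SQLAlchemy code; the simplified `dag_run` table, the sample rows, and the helper name `pivot_counts` are assumptions made for the example.

```python
import sqlite3
from collections import Counter

# Simplified stand-in for the DagRun table (assumption: only the two
# columns needed for the counts are modelled here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag_run (run_type TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO dag_run VALUES (?, ?)",
    [
        ("scheduled", "success"),
        ("scheduled", "success"),
        ("scheduled", "failed"),
        ("manual", "success"),
        ("manual", "running"),
    ],
)

# One round trip: count per (run_type, state) pair.
grouped_rows = conn.execute(
    "SELECT run_type, state, COUNT(*) FROM dag_run GROUP BY run_type, state"
).fetchall()


def pivot_counts(rows):
    """Split (run_type, state, count) rows into two separate count mappings."""
    by_run_type = Counter()
    by_state = Counter()
    for run_type, state, count in rows:
        by_run_type[run_type] += count
        by_state[state] += count
    return dict(by_run_type), dict(by_state)


run_type_counts, state_counts = pivot_counts(grouped_rows)
```

Both mappings come from the same result set, so the second query with the identical WHERE clause is no longer needed.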

These optimizations specifically address the slow retrieval of historical metrics on large installations where TaskInstance counts are in the millions.
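
The second change (the explicit `TaskInstance.dag_id` filter) could look roughly like this in plain SQL. It is a sketch against a simplified `sqlite3` schema, not the real Airflow tables; the column set, the sample data, and the exact shape of the "before" query are assumptions. It only shows that the extra predicate is redundant for correctness. Whether a given planner actually uses the `ti_dag_run` index should be confirmed with EXPLAIN on the target database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, state TEXT);
    CREATE TABLE task_instance (dag_id TEXT, run_id TEXT, task_id TEXT, state TEXT);
    CREATE INDEX ti_dag_run ON task_instance (dag_id, run_id);
    """
)
conn.executemany(
    "INSERT INTO dag_run VALUES (?, ?, ?)",
    [("a", "r1", "success"), ("b", "r1", "success"), ("c", "r1", "failed")],
)
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
    [
        ("a", "r1", "t1", "success"),
        ("a", "r1", "t2", "failed"),
        ("b", "r1", "t1", "success"),
        ("c", "r1", "t1", "success"),  # DAG "c" is not permitted below
    ],
)

permitted_dag_ids = ["a", "b"]
placeholders = ", ".join("?" for _ in permitted_dag_ids)

# Before: dag_id is restricted only through the joined dag_run table.
indirect_sql = f"""
    SELECT ti.state, COUNT(*)
    FROM task_instance ti
    JOIN dag_run dr ON dr.dag_id = ti.dag_id AND dr.run_id = ti.run_id
    WHERE dr.dag_id IN ({placeholders})
    GROUP BY ti.state
    ORDER BY ti.state
"""

# After: the same restriction is also stated directly on task_instance,
# so the planner can narrow task_instance by dag_id on its own.
explicit_sql = f"""
    SELECT ti.state, COUNT(*)
    FROM task_instance ti
    JOIN dag_run dr ON dr.dag_id = ti.dag_id AND dr.run_id = ti.run_id
    WHERE dr.dag_id IN ({placeholders})
      AND ti.dag_id IN ({placeholders})
    GROUP BY ti.state
    ORDER BY ti.state
"""

indirect = conn.execute(indirect_sql, permitted_dag_ids).fetchall()
explicit = conn.execute(explicit_sql, permitted_dag_ids * 2).fetchall()
```

The two queries return identical rows; the added predicate changes only what the optimizer is allowed to assume about `task_instance`.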

Updated test assertion from 4 to 3 queries to reflect the optimization.

@viiccwen (Contributor) left a comment

Left comments!
Could you also provide benchmark results to demonstrate the optimization?
Edit: please also reference the original issue so others don't duplicate the work. :)

Comment on lines +62 to +65
dag_run_date_filter = (
    func.coalesce(DagRun.start_date, current_time) >= start_date,
    func.coalesce(DagRun.end_date, current_time) <= func.coalesce(end_date, current_time),
)
IMO, this is a bottleneck. Wrapping the columns in coalesce makes the predicates non-SARGable and may cause a full table scan.
Maybe we should replace coalesce with boolean logic (or_ and IS NULL checks).
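
For illustration, the boolean rewrite suggested here might look like the following in plain SQL, run through the standard-library `sqlite3` module. This is a sketch under assumptions: the simplified `dag_run` table, the sample dates, and the variable names are invented for the example, and the optional `end_date` parameter is resolved in Python before the query is built. In SQLAlchemy the same shape would presumably be expressed with `or_` and `.is_(None)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag_run (id INTEGER PRIMARY KEY, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO dag_run VALUES (?, ?, ?)",
    [
        (1, "2026-01-01", "2026-01-02"),  # finished inside the window
        (2, "2026-02-01", None),          # still running
        (3, None, None),                  # not started yet
        (4, "2025-12-01", "2025-12-02"),  # started before the window
        (5, "2026-02-10", "2026-03-01"),  # ended after the window
    ],
)

current_time = "2026-02-19"
start_date = "2026-01-01"
end_date = "2026-02-20"  # may be None in the endpoint
upper_bound = end_date or current_time  # replaces coalesce(end_date, current_time)
params = {"now": current_time, "start": start_date, "upper": upper_bound}

# Original shape: the columns are wrapped in COALESCE.
coalesce_sql = """
    SELECT id FROM dag_run
    WHERE COALESCE(start_date, :now) >= :start
      AND COALESCE(end_date, :now) <= :upper
    ORDER BY id
"""

# Boolean shape: bare column comparisons plus explicit IS NULL branches.
boolean_sql = """
    SELECT id FROM dag_run
    WHERE (start_date >= :start OR (start_date IS NULL AND :now >= :start))
      AND (end_date <= :upper OR (end_date IS NULL AND :now <= :upper))
    ORDER BY id
"""

coalesce_ids = [row[0] for row in conn.execute(coalesce_sql, params)]
boolean_ids = [row[0] for row in conn.execute(boolean_sql, params)]
```

Both forms select the same rows here. The second leaves `start_date` and `end_date` unwrapped, which is what allows an index range scan on those columns when a suitable index exists.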

Labels

area:API Airflow's REST/HTTP API
