Description
Airflow 3.0 moved the public API and UI-backing endpoints from Flask to FastAPI (airflow-core/src/airflow/api_fastapi/). Operators monitoring production deployments need per-route request count, latency, and status distributions to diagnose slow endpoints (Grid TI-summary fan-out, log download, DAG list paging). Today the API server emits process-level metrics but no per-route HTTP timing.
The existing OTel metrics infrastructure (airflow/metrics/otel_logger.py) already supports histograms. Adding a small FastAPI middleware that records request duration with route, method, and status attributes would close the gap without bringing in a third-party dependency such as prometheus-fastapi-instrumentator.
Use case / motivation
- Operators need to alert on p99 latency for /api/v2/dags/{dag_id}/dagRuns before users notice; today the only signal is gunicorn worker timeouts.
- UI-perf regression detection on the Grid view depends on knowing which back-end endpoints regress, not just the front-end page-load time.
- Capacity planning for the API server (worker count, replica count) needs request-rate histograms grouped by route.
Proposal
Add an opt-in FastAPI middleware that records:
- airflow.api.requests_total (counter, tagged route/method/status)
- airflow.api.request_duration_seconds (histogram, tagged route/method/status)
The middleware would be wired into create_app() in airflow/api_fastapi/app.py and gated on the existing metrics.statsd_on / OTel-enabled flags, so operators without a metrics backend pay no cost.
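To make the proposal concrete, here is a minimal sketch of what such a middleware could look like, written as a pure ASGI middleware with no third-party imports. It is not the Airflow implementation. `InMemoryRecorder` and `RequestMetricsMiddleware` are hypothetical names, and the recorder is only a stand-in for the real StatsD/OTel backend. The sketch also assumes the router stores the matched route object under `scope["route"]` with a `path` attribute holding the template (FastAPI's `APIRoute` appears to do this, but mounted sub-apps may behave differently and would need checking).

```python
import time
from collections import defaultdict


class InMemoryRecorder:
    """Hypothetical stand-in for the real metrics backend (StatsD or OTel)."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.durations = defaultdict(list)

    def record(self, route, method, status, seconds):
        key = (route, method, status)
        self.counts[key] += 1                # would feed airflow.api.requests_total
        self.durations[key].append(seconds)  # would feed airflow.api.request_duration_seconds


class RequestMetricsMiddleware:
    """Pure ASGI middleware that counts and times each HTTP request."""

    def __init__(self, app, recorder):
        self.app = app
        self.recorder = recorder

    async def __call__(self, scope, receive, send):
        # Pass lifespan and websocket traffic through untouched.
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        status = 500  # reported if the app raises before a response starts
        start = time.perf_counter()

        async def send_wrapper(message):
            nonlocal status
            if message["type"] == "http.response.start":
                status = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            # Assumption: the router left the matched route in scope["route"].
            # Use the template, never the raw path, to keep label cardinality
            # bounded; unmatched requests share a single bucket.
            route = scope.get("route")
            template = getattr(route, "path", None) or "unmatched"
            elapsed = time.perf_counter() - start
            self.recorder.record(template, scope["method"], status, elapsed)
```

In a FastAPI app this could be registered conditionally, for example with `app.add_middleware(RequestMetricsMiddleware, recorder=recorder)` only when a metrics backend is enabled, so deployments without metrics never install the wrapper. Tagging by route template rather than raw path is the key design choice: a raw path such as /api/v2/dags/my_dag/dagRuns would create one time series per DAG.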
Related issues
No response
Are you willing to submit a PR?
Code of Conduct