Description
The REST API on api-server is unbounded by default. A single user / service account making rapid calls (a CLI in a tight loop, misconfigured automation, a busy DAG generator) can exhaust request capacity and degrade access for the rest of a multi-tenant deployment. There's no built-in mechanism to cap per-user request rate.
For login throttling there's PR #58293 (closed) and the FAB rate-limiter for the legacy webserver — but those are about brute-force auth-page protection, not general API quota.
Use case / motivation
Multi-tenant Airflow deployments. One noisy tenant should not be able to DoS the api-server for everyone else. Today the workarounds are:
- (a) Put a rate-limiter (envoy/nginx) in front of api-server and tag by user — but the username isn't always available at the proxy layer for token-auth flows.
- (b) Cap concurrent gunicorn workers — but that caps total throughput, not per-user.
Neither is satisfying.
Proposal
Per-user rate-limiting middleware on api-server. Sketch:
- Keyed by authenticated principal (username from JWT / session / API token).
- Separate buckets for high-traffic endpoints (`/api/v2/monitor/health`, `/api/v2/version`) vs general API.
- Service-account principals get a higher (configurable) limit.
- Backend: in-memory by default; redis if configured (matches how the FAB rate-limiter is set up).
- Config: per-endpoint or per-route-prefix limits in `airflow.cfg`.
This is substantive enough that I'd prefer to discuss the shape on the mailing list before PR. Filing the issue so the discussion has a permanent home.
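To make the sketch above concrete for discussion, here is a minimal in-memory token-bucket limiter keyed by principal and endpoint group. It is only an illustration of the shape, not a proposed implementation: `PerUserRateLimiter`, `TokenBucket`, `LIMITS`, `SERVICE_ACCOUNT_MULTIPLIER`, and all the numbers are hypothetical and do not exist in Airflow today. Token bucket is just one possible algorithm, and there is no redis backend or `airflow.cfg` wiring here.

```python
import time
from dataclasses import dataclass

# Hypothetical limits as (refill rate per second, burst size) per endpoint group.
# In the proposal these would come from airflow.cfg; the values are placeholders.
LIMITS = {
    "probe": (50.0, 100.0),   # high-traffic endpoints listed in PROBE_PATHS
    "general": (10.0, 20.0),  # everything else
}
PROBE_PATHS = ("/api/v2/monitor/health", "/api/v2/version")
SERVICE_ACCOUNT_MULTIPLIER = 5.0  # hypothetical higher limit for service accounts


@dataclass
class TokenBucket:
    rate: float      # tokens refilled per second
    capacity: float  # maximum burst size
    tokens: float    # tokens currently available
    updated: float   # clock reading at the last refill

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerUserRateLimiter:
    """In-memory limiter with one bucket per (principal, endpoint group)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._buckets = {}

    def allow(self, principal: str, path: str, is_service_account: bool = False) -> bool:
        group = "probe" if path.startswith(PROBE_PATHS) else "general"
        key = (principal, group)
        now = self._clock()
        bucket = self._buckets.get(key)
        if bucket is None:
            rate, burst = LIMITS[group]
            if is_service_account:
                rate *= SERVICE_ACCOUNT_MULTIPLIER
                burst *= SERVICE_ACCOUNT_MULTIPLIER
            # A new principal starts with a full bucket.
            bucket = TokenBucket(rate, burst, burst, now)
            self._buckets[key] = bucket
        return bucket.allow(now)
```

A middleware built on this would resolve the principal after authentication, call `allow(principal, request_path)`, and presumably answer with HTTP 429 when it returns `False`. Note that in-memory state is per process, so with several api-server workers the effective limit would be multiplied by the worker count, which is one reason for the shared redis backend option.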
Are you willing to submit a PR?
Code of Conduct