Cache Celery apps when publishing workloads #67127
Drive-by from #67123, confirming this matches the fix I proposed there. Both shapes (

Sharing some empirical numbers I gathered while reproducing #67123, in case they're useful for the reviewer:

Per-publish overhead, measured as

Reference: a real production deployment with the regression (~682 distributions, ~640 active Dags, periodic scheduler OOMs) shows

The takeaway for sizing the fix's expected impact: with this cache in place, the per-subprocess first-publish cost is paid once per team_name and then amortized over the subprocess's lifetime, restoring the pre-3.16.0 behavior.

A couple of small things you might want to consider rolling in (entirely optional, happy if you ignore):

Comparison point in case it's useful: PR #61798 implemented the same AIP-67 multi-team feature for

Either way, the fix is correct and I'd love to see it merged. Thanks for picking this up so quickly.

Drafted-by: Claude Code (Opus 4.7); reviewed by @seanmuth before posting
| return celery_app | ||
| @lru_cache(maxsize=8) |
Changed. Replaced the arbitrary maxsize=8 with @cache, so configured teams are cached without eviction in the publishing process.
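For readers sizing the difference between the two decorators, here is a minimal sketch of how a bounded LRU cache can thrash once the number of distinct keys exceeds maxsize, while functools.cache never evicts. FakeApp, bounded_app, unbounded_app, and the team names are hypothetical stand-ins, not the provider's code, and maxsize=2 is chosen only to keep the example small.

```python
from functools import cache, lru_cache


class FakeApp:
    """Hypothetical stand-in for a Celery app; counts how often it is built."""

    created = 0

    def __init__(self, team_name):
        FakeApp.created += 1
        self.team_name = team_name


@lru_cache(maxsize=2)
def bounded_app(team_name):
    # Bounded cache: the least recently used entry is evicted when full.
    return FakeApp(team_name)


@cache
def unbounded_app(team_name):
    # Unbounded cache: one entry per distinct key, never evicted.
    return FakeApp(team_name)


def count_constructions(factory, teams):
    """Return how many FakeApp instances the factory built for this sequence."""
    before = FakeApp.created
    for team in teams:
        factory(team)
    return FakeApp.created - before


# Three teams published round-robin against a cache that holds only two.
teams = ["team-a", "team-b", "team-c"] * 2
bounded_builds = count_constructions(bounded_app, teams)
unbounded_builds = count_constructions(unbounded_app, teams)
```

In this sketch the bounded cache rebuilds the app on every call because each lookup evicts the entry needed next, whereas the unbounded cache builds one app per team. The trade-off is that an unbounded cache keeps every app alive for the lifetime of the process.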
| @lru_cache(maxsize=8) | ||
| def _get_celery_app_for_workload(team_name: str | None) -> Celery: | ||
| """Return a subprocess-local Celery app cached by team name for task publishing.""" |
The "subprocess-local" claim only holds when _send_workloads_to_celery actually uses the ProcessPoolExecutor branch. In celery_executor.py:243-245, the single-workload (or sync_parallelism=1) path runs send_workload_to_executor inline via map(...) in the scheduler process. In that case this cache lives in the scheduler itself and keeps the cached Celery apps' broker connections open there too. Worth either updating the docstring to reflect "scheduler process or publisher subprocess, depending on path", or being explicit that this is intentional.
Fixed. Updated the docstring to describe both scheduler-inline and publisher-subprocess execution paths, and made the cache process-local rather than subprocess-specific.
| def _get_celery_app_for_workload(team_name: str | None) -> Celery: | ||
| """Return a subprocess-local Celery app cached by team name for task publishing.""" | ||
| if TYPE_CHECKING: | ||
| _conf: ExecutorConf | AirflowConfigParser |
This if TYPE_CHECKING: _conf: ExecutorConf | AirflowConfigParser is a copy-paste leftover from the old send_workload_to_executor. The annotation never escapes this function and _conf already gets a concrete type from the if/else assignment below, so this whole two-line block can be dropped.
Fixed. Removed the leftover TYPE_CHECKING annotation block from _get_celery_app_for_workload.
| def clear_cached_workload_celery_apps(): | ||
| celery_executor_utils._get_celery_app_for_workload.cache_clear() | ||
| yield | ||
| celery_executor_utils._get_celery_app_for_workload.cache_clear() |
The two new tests cover team-a reuse and team-a vs team-b separation, but not team_name=None, which is the path the vast majority of deployments hit (no multi-team config). A third parametrized case asserting None also hits the cache would catch a regression where lru_cache started treating None specially.
Fixed. Added direct cache tests for team_name=None, same-team reuse, and distinct team names.
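As a rough illustration of the three cases discussed in this thread (team_name=None, same-team reuse, distinct teams), the checks could look like the sketch below. It runs against a hypothetical stand-in, get_app_for_team, that returns a plain object rather than a Celery app; the actual tests in the PR are not shown here and may be structured differently.

```python
from functools import cache
from typing import Optional


@cache
def get_app_for_team(team_name: Optional[str]) -> object:
    """Hypothetical stand-in for the cached factory: one sentinel per key."""
    return object()


def check_cache_behaviour():
    # Start from an empty cache so hit and miss counts are deterministic.
    get_app_for_team.cache_clear()

    default_first = get_app_for_team(None)
    default_second = get_app_for_team(None)
    team_a = get_app_for_team("team-a")
    team_a_again = get_app_for_team("team-a")
    team_b = get_app_for_team("team-b")

    info = get_app_for_team.cache_info()
    return {
        # None is an ordinary hashable key, so it is cached like any other.
        "none_reused": default_first is default_second,
        "same_team_reused": team_a is team_a_again,
        "teams_isolated": team_a is not team_b and team_a is not default_first,
        "misses": info.misses,
        "hits": info.hits,
    }


result = check_cache_behaviour()
```

Clearing the cache before the checks mirrors what the fixture above does with cache_clear(), so state from one test cannot leak into the next.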
Rolled in the optional follow-ups: documented why the cache exists and where it lives, added direct cache coverage including |
What
- Cache the Celery app used by send_workload_to_executor inside each publisher subprocess
- Key the cache by team_name so multi-team configurations still get isolated app instances

Why
send_workload_to_executor currently creates a fresh Celery() app for every publish. Each fresh app loses Celery's lazy backend cache, so apply_async() repeatedly performs backend resolution and entry point scanning. On large deployments this can push task publishing past [celery] operation_timeout.

Caching the app per subprocess preserves the post-AIP-67 behavior of constructing apps inside publisher subprocesses while restoring per-process amortization.
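To make the amortization argument concrete, here is a small simulation, assuming a stand-in class whose backend is resolved lazily once per instance. FakeCeleryApp, publish_with_fresh_app, and publish_with_cached_app are hypothetical names; nothing here calls Celery itself, and the counter only models the repeated backend resolution described above.

```python
from functools import cache


class FakeCeleryApp:
    """Hypothetical stand-in for Celery(): backend resolved lazily per instance."""

    backend_resolutions = 0  # class-wide counter of simulated expensive lookups

    def __init__(self, team_name):
        self.team_name = team_name
        self._backend = None

    @property
    def backend(self):
        # The first access on each instance pays the simulated resolution cost.
        if self._backend is None:
            FakeCeleryApp.backend_resolutions += 1
            self._backend = f"backend-for-{self.team_name}"
        return self._backend


def publish_with_fresh_app(team_name):
    # A new app per publish, so the lazy backend is resolved every time.
    return FakeCeleryApp(team_name).backend


@cache
def _cached_app(team_name):
    return FakeCeleryApp(team_name)


def publish_with_cached_app(team_name):
    # One app per team_name per process; the backend is resolved once.
    return _cached_app(team_name).backend


def resolutions_for(publish, count):
    """Return how many backend resolutions `count` publishes triggered."""
    before = FakeCeleryApp.backend_resolutions
    for _ in range(count):
        publish(None)
    return FakeCeleryApp.backend_resolutions - before


fresh_cost = resolutions_for(publish_with_fresh_app, 100)
cached_cost = resolutions_for(publish_with_cached_app, 100)
```

Under these assumptions the fresh-app path pays the resolution cost on every publish, while the cached path pays it once per team_name in each process.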
fixes #67123
Tests
uv run --project providers/celery pytest providers/celery/tests/unit/celery/executors/test_celery_executor.py -q
uv run ruff check providers/celery/src/airflow/providers/celery/executors/celery_executor_utils.py providers/celery/tests/unit/celery/executors/test_celery_executor.py
uv run ruff format --check providers/celery/src/airflow/providers/celery/executors/celery_executor_utils.py providers/celery/tests/unit/celery/executors/test_celery_executor.py