Fix N+1 query pattern in bulk pool delete endpoint#66222
Conversation
SameerMesiah97
left a comment
There was a problem hiding this comment.
Solid performance improvement.
|
I’ve noted that Once this logic is confirmed and merged, I will follow up with another PR to address the remaining functions. Thanks again to all the reviewers! |
|
Thanks for the suggestion @jason810496 — I've added a test that uses Additionally, while checking this unit test, I realized that using round-trip times provides a more accurate evaluation of the impact. I have updated the table below to reflect this:
If you have any feedback or suggestions for this PR, feel free to let me know! Thanks to all the reviewers for your support! |
henry3260
left a comment
There was a problem hiding this comment.
Thanks for your effort!
f7055e7 to
1a6ae94
Compare
henry3260
left a comment
There was a problem hiding this comment.
Thanks! I think we are almost there
1a6ae94 to
6ded2bf
Compare
jason810496
left a comment
There was a problem hiding this comment.
LGTM overall, one final nit before merge. Thanks.
jason810496
left a comment
There was a problem hiding this comment.
Would it be better to simplify as accepting single pool_count? Since the expected count should be the same.
|
Thanks for the feedback @jason810496! Refactored the code based on your suggestion. I initially over-engineered it to avoid the hardcoded query count. Feel free to let me know if there’s anything else or if I misunderstood anything. |
7a4cc2d to
69fc441
Compare
|
Fixed ruff formatting to pass CI (force-pushed to the latest commit). |
|
@ColtenOuO — There are 4 unresolved review threads on this PR from @jason810496. Could you either push a fix or reply in each thread explaining why the feedback doesn't apply? Once you believe the feedback is addressed, mark the thread as resolved so the reviewer isn't re-pinged needlessly. Thanks! Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you. |
|
Thanks for the fix. |
Backport successfully created: v3-2-testNote: As of Merging PRs targeted for Airflow 3.X In matter of doubt please ask in #release-management Slack channel.
|
Summary
Eliminates an N+1 query in the bulk pool delete endpoint by reusing the ORM objects already loaded by
BulkPoolService.categorize_pools()instead of re-fetching each pool one at a time inside the delete loop.Why
categorize_pools()already loads every requested pool with a single batchedSELECT ... WHERE pool IN (...)and returns them asexisting_pools_dict: dict[str, Pool].handle_bulk_delete()was discarding that dict (_, matched, not_found = ...) and then re-querying each pool individually inside the loop:For a request that deletes N pools, this produced 2N + 1 round-trips to the database (1 batched SELECT in categorize_pools + N redundant per-pool SELECTs + N DELETEs). The redundant per-pool SELECTs return objects we already have in memory.
Change 1
Behavior is preserved exactly:
existing_pools_dict.get(pool_name)returns the samePoolinstance the per-pool SELECT would have returned (it's the same object loaded bycategorize_poolsin the same session), and isNonewhen the pool doesn't exist — matching the originalsession.scalar(...).limit(1)semantics.Change 2
Added a unit test that asserts the expected query count.
Impact
Unit Test Results
To ensure this optimization doesn't introduce any regressions, I've verified the changes with the existing test suite. All tests passed.
Command:
uv run --project airflow-core pytest airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_pools.py::TestBulkPools -xWas generative AI tooling used to co-author this PR?
Yes — (Claude Sonnet 4.6, for
test_bulk_delete_does_not_re_query_each_poolin airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_pools.py)