feat(graph_curation): task #61 P2-S1+S2 — batch alias resolution#1950
Merged
Conversation
…seed PG connection saturation fix Closes task #88 per PM @不穷 dispatch (msg=8f130f25). Implements task #61 spec v1 § 2.4 P2-S1 + Planetegg P2-HIGH (msg=db7fb085 + msg=1314ac59) + Singapore SRE diagnostic (Planetegg msg=4043adf4) batch alias resolution. Background ---------- ``LineageGraphStoreWithAliasRedirect.expand_neighbors_n_hops`` is on the ``GET /api/v2/collections/{id}/graphs`` and ``/graphs/hybrid`` read paths. Pre-fix, it called ``AliasMapRepository.resolve_canonical`` once per anchor name via ``asyncio.gather``, which checks out one PG connection per name. Spec § 2.4 P2-S1 quantification: * ``GET /graphs?max_nodes=1000`` → up to **2 × max_nodes = 2000** seeds. * ``GET /graphs/hybrid``: default 1000 / max 5000 seeds. At those cardinalities the PG connection pool saturates, observed in Singapore production (Planetegg msg=4043adf4 SRE diagnostic). Changes ------- * ``aperag/graph_curation/alias_map.py``: new :meth:`AliasMapRepository.resolve_canonical_many` batch primitive. Single SQL ``SELECT alias_name, canonical_name FROM aperag_lineage_entity_alias WHERE collection_id=? AND alias_name IN (...)`` reads all matching rows in one shot. Names absent from the result set fall back to themselves (mirrors single-name ``resolve_canonical`` semantics). Empty / falsy names short- circuit without an SQL lookup. Total connections checked out: **1** per call regardless of seed count. Caller order is preserved on the dict iteration order (insertion order semantics). * ``aperag/indexing/alias_redirect_store.py``: rewrite ``LineageGraphStoreWithAliasRedirect.expand_neighbors_n_hops`` to use the batch primitive. ``asyncio.gather`` per-name fan-out gone; ``import asyncio`` no longer needed at module top-level. * Test stub ``_FakeAliasRepo`` in ``tests/unit_test/indexing/test_alias_redirect_store.py``: now implements both ``resolve_canonical`` (single, used by upsert/get/delete redirect paths) and ``resolve_canonical_many`` (batch, used by expand) + tracks call counts so tests can pin the call-graph (i.e. expand path goes through batch primitive exactly once). Tests ----- * ``tests/unit_test/graph_curation/test_alias_map.py`` (7 new): - ``test_resolve_canonical_many_returns_self_for_unmapped_names`` - ``test_resolve_canonical_many_mixed_alias_and_canonical`` - ``test_resolve_canonical_many_dedupes_input`` - ``test_resolve_canonical_many_empty_input`` - ``test_resolve_canonical_many_handles_empty_string`` - ``test_resolve_canonical_many_per_collection_isolation`` - ``test_resolve_canonical_many_large_seed_cap`` (2000-name spec quantification — pinned correctness at the spec-cap so a future regression that re-introduces per-name fan-out either times out or breaks the result shape). * ``tests/unit_test/indexing/test_alias_redirect_store.py`` (2 new): - ``test_expand_neighbors_uses_batch_alias_resolution`` — call-graph gate: exactly 1 ``resolve_canonical_many`` call, zero ``resolve_canonical`` calls, regardless of seed count. A regression that re-introduces the gather pattern is caught immediately. - ``test_expand_neighbors_large_seed_cap_uses_single_batch_call`` — 2000-seed spec-cap pinned at the call-graph level. Local: ``uv run pytest tests/unit_test/graph_curation/ tests/unit_test/indexing/test_alias_redirect_store.py`` → **56 passed, 1 warning**. Spec / scope alignment ---------------------- * task #61 spec v1 § 2.4 P2-S1 — batch resolve primitive ✅ * task #61 spec v1 § 2.4 P2-S2 — ``expand_neighbors_n_hops`` seed cap test ✅ * Lesson #17 backend 收敛 contract — single primitive replaces N- parallel fan-out at the same caller, no FE / API changes required ✅ * Lesson #18 mechanical gate codification — call-graph assertion in the redirect-store test is the mechanical gate (caught by CI on any future regression that bypasses the batch primitive) ✅ Follow-ups (NOT in this PR) --------------------------- * P3 cross-cutting concern: every ``LineageGraphStore`` consumer that currently invokes the alias path per-name (e.g. some GraphCurationService internals) should migrate to the batch primitive — independent task gated on production data showing the residual N-fan-out is a real bottleneck. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Collaborator
Author
|
Testing CR LGTM ✅ I reviewed the batch alias-resolution path against task #61 P2-S1/S2 and the Singapore PG connection fan-out failure mode. What I checked:
Local validation: uv run --extra test pytest tests/unit_test/graph_curation/test_alias_map.py tests/unit_test/indexing/test_alias_redirect_store.py -q
# 35 passed
uv run ruff check aperag/graph_curation/alias_map.py aperag/indexing/alias_redirect_store.py tests/unit_test/graph_curation/test_alias_map.py tests/unit_test/indexing/test_alias_redirect_store.py
# All checks passed
git diff --check origin/main...HEAD
# passedNo testing blocker from my lane. CI green / accepted flake waiver + remaining architecture/SRE checks are enough to merge. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes task #88 per PM @不穷 dispatch (msg=8f130f25). Implements task #61 spec v1 § 2.4 P2-S1 + P2-S2: batch alias resolution to fix the Singapore PG connection saturation observed by Planetegg (msg=4043adf4 SRE diagnostic).
Changes
AliasMapRepository.resolve_canonical_many(new) — single SQLSELECT ... WHERE alias_name IN (...)roundtrip; total PG connections per call: 1 regardless of seed count. Caller order preserved (dict insertion-order semantics).LineageGraphStoreWithAliasRedirect.expand_neighbors_n_hops— rewritten to use the batch primitive.asyncio.gatherper-name fan-out removed;import asynciono longer needed._FakeAliasRepoextended withresolve_canonical_many+ call-count tracking so tests can pin the call-graph.Background
Pre-fix,
expand_neighbors_n_hopschecked out one PG connection per anchor name viaasyncio.gather. At spec-cap seed cardinalities (GET /graphs?max_nodes=1000→ 2000 seeds;/graphs/hybrid→ up to 5000 seeds) the PG connection pool saturates and unrelated API requests stall.Tests
tests/unit_test/graph_curation/test_alias_map.py(7 new): unmapped names self-resolve / mixed alias+canonical / input dedupe / empty input / empty string / per-collection isolation / 2000-seed spec-cap correctness.tests/unit_test/indexing/test_alias_redirect_store.py(2 new): call-graph gate (exactly 1resolve_canonical_manycall + 0resolve_canonicalcalls) at 5-seed and 2000-seed spec-cap.Local:
uv run pytest tests/unit_test/graph_curation/ tests/unit_test/indexing/test_alias_redirect_store.py→ 56 passed.Test plan
ruff check+ruff format --checkcleanSpec / scope alignment
Follow-up (NOT in this PR)
P3 cross-cutting: other
LineageGraphStoreconsumers (e.g. someGraphCurationServiceinternals) that still invoke per-name alias resolution should migrate to the batch primitive — gated on production data showing residual fan-out is a real bottleneck.🤖 Generated with Claude Code