refactor(task-31-a3): description-free graph_curation 7 call sites by earayu · Pull Request #1941 · apecloud/ApeRAG

earayu · 2026-04-30T09:02:06Z

Summary

Wave 5 description-NULL invariant fix for graph_curation: the dedup
detection / scoring / snapshot / accept-apply paths still read
entity.description / compacted_description / description_parts
even though Wave 5 task #5 (facts/vectors split) stopped emitting
description text. Reading these fields would either silently degrade
scoring (always-empty bag-of-tokens) or leak stale fragments from
pre-Wave-5 rows into reviewer-facing suggestions.

This is task #77 (Phase A3) under the task #31 umbrella, per
spec docs/zh-CN/architecture/task-31-graph-node-merge-spec-v1.md § 3.1.5.

Scope

6 detector / snapshot call sites + 1 apply-path variant (spec
§ 3.1.5 enumeration) + 1 service-layer helper surfaced by the
boundary test grep gate:

#	path	fix
1	candidate_generation.py:entity_snapshot	drop description key
2	candidate_generation.py:_lexical_signals	drop description Jaccard overlap
3	candidate_generation.py:_pair_score	drop description scoring weight (dead branch post Wave 5)
4	dto.py:CurationEntity.from_lineage	set description="" instead of deriving from compacted_description / description_parts
5	merge_candidate_detector._description_text_for_scoring → _embedding_query_text	embed `<name> (<entity_type>)` mirror of how graph_vectors worker writes the entity vector (Wave 5 task #5 / #7)
6	merge_candidate_detector._to_legacy_entity	pass description="" instead of reading from entity
7	merge_candidate_detector._snapshot	drop description key from persisted entity_snapshots payload
+1	lineage_merge.merge_entities_apply_description_free (new)	variant for async accept-apply worker; skips LLM unified description / Compactor / curation_merge sentinel write / vector embed write per spec «不调» list
+1	service._fetch_shadow_neighbors	`entity.description or entity.name` → `entity.name` (description always "" post Wave 5, fallback was a no-op)

CurationEntity.description field is kept on the dataclass (default \"\") for back-compat with existing callers; the boundary test gates the read surface, not the field declaration.

lineage_merge.merge_entities (legacy description-bearing variant) is preserved unchanged for manual sync handle_action API back-compat per spec § 3.1.5 («老 lineage_merge.py 路径保留作 manual API 兼容，标 deprecation Lesson #14 multi-iteration cleanup follow-up»).

Boundary gate

tests/boundaries/test_graph_curation_description_free.py (new) — 4 AST-level assertions per spec § 5.2.a:

test_graph_curation_modules_do_not_read_entity_description — full scope aperag/graph_curation/** (excl. lineage_merge.py legacy variant + dto.py field declaration)
test_merge_candidate_detector_does_not_read_entity_description — aperag/indexing/merge_candidate_detector.py
test_lineage_merge_apply_description_free_does_not_read_entity_description — the new variant body must not read description
test_lineage_merge_apply_description_free_does_not_call_llm_or_compactor — the new variant body must not invoke LLM unified description / Compactor pass / vector embed write per spec «不调» list

Allowlist:

lineage_merge.merge_entities (legacy back-compat) excluded by file
dto.py field declaration excluded (annotation, not a read)
LineageMergeResult.compacted_description (non-entity result shape used by legacy sync handle_action API) excluded by base name

Lesson family applied

Lesson feat: socket debug #14 (架构 invariant 删除多轮迭代收尾) — Wave 5 description-NULL invariant 双侧最终收尾, 老 lineage_merge.py 路径保留 deprecation
Lesson feat: auto size for chat input #18 候选 (lesson sediment + mechanical gate 双 layer codification, per huangheng PR docs(cr-checklist): task #61 + task #30 B3 close-out sediment fold-in #1932 + chenyexuan PR test(task-33-p3): cross-source default value gate for graph_extraction_window_size #1933 first-application demo) — lesson 文字 layer (cr-checklist § 四 Wave 5 description-NULL family) + mechanical gate layer (this boundary test) 配对完整 invariant defense — second-application demo
Lesson feat: socket message #13 v3 (cross-source default value alignment) — applied to ensure all 6+1 sites in spec are mechanically locked, no reviewer-as-detector remaining

Test plan

tests/boundaries/test_graph_curation_description_free.py — 4/4 ✅
tests/boundaries/ full sweep — 104/104 ✅
tests/unit_test/graph_curation/ + tests/unit_test/indexing/test_merge_candidate_detector.py — 36/36 ✅
tests/unit_test/ full sweep — 1466 passed / 15 skipped ✅
ruff format + ruff check — clean ✅
CI lint-and-unit
CI e2e-http-smoke (graph flow)

Risk

0 production behavior change for legacy sync API:

lineage_merge.merge_entities preserved unchanged
service.handle_action accept path still calls merge_entities (description-bearing variant)
MergeCandidateDetector recall now uses name + entity_type as embedding query — this fixes a silent zero-recall regression introduced by Wave 5

0 schema change — CurationEntity.description field kept with default ""; no migrations.

CR

@符炫炜 ratify (architect lane — task support pagination #31 spec § 3.1.5 enforcement)
@huangheng cr-checklist Lesson feat: socket debug #14 + feat: auto size for chat input #18 candidate cross-link verify
@bryce A1 path 0 交集 verify (A1 = cli/indexing_worker.py + queue family; A3 = graph_curation/{candidate_generation,dto,lineage_merge,service}.py + indexing/merge_candidate_detector.py + boundary test)
@ziang graph_curation domain-specific verify (Wave 7 §K.12.4 detector + suggestion store)
@huangzhangshu / @冬柏 testing-lane verify (boundary test grep gate + unit test signal change rationale)

Not blocking

task [Features] integrate YuQue as a data source #75 (A1 Bryce — in_review)
task chore: fix embedding timeout and scale the embedding workers #78 (A4 dongdong — in_review)
task feat: support default collections #77 itself becomes in_review once this PR opens
task support pagination #31 Phase B (CR + boundary integration / e2e) — depends on Phase A merged

🤖 Generated with Claude Code

Wave 5 description-NULL invariant (task #31 spec § 3.1.5): graph extractor stopped emitting `description` text post Wave 5 task #5 (facts/vectors split). The dedup detection / scoring / snapshot / accept-apply paths still read `entity.description` / `compacted_description` / `description_parts` and would either silently degrade scoring (always-empty bag-of-tokens) or leak stale fragments from pre-Wave-5 rows into reviewer-facing suggestions. Fix the 6 detector / snapshot call sites + 1 apply path enumerated in the spec, plus 1 service-layer helper surfaced by the boundary test grep gate: 1. candidate_generation.py:38 entity_snapshot — drop description 2. candidate_generation.py:179 _lexical_signals — drop description Jaccard token overlap 3. candidate_generation.py:196 _pair_score — drop description scoring weight (signal no longer emitted; branch is dead) 4. dto.py CurationEntity.from_lineage — set description="" instead of deriving from compacted / description_parts; keep field on the dataclass for back-compat with callers that still pass it 5. merge_candidate_detector._description_text_for_scoring → _embedding_query_text — embed `<name> (<entity_type>)` (mirror of how the graph_vectors worker writes the entity vector, Wave 5 task #5 / #7); the legacy method always short-circuited to "" post Wave 5 so detection produced zero candidates 6. merge_candidate_detector._to_legacy_entity — pass description="" instead of reading from entity 7. merge_candidate_detector._snapshot — drop description key from persisted entity_snapshots payload +1 lineage_merge.py — add merge_entities_apply_description_free variant for the async accept-apply worker (task #31 § 3.1.5). Skips LLM unified description / Compactor pass / __curation_merge__ sentinel description write / vector embed write per the spec «不调» list. Legacy merge_entities path is preserved for manual sync API back-compat (Lesson #14 multi-iteration cleanup follow-up). +1 service._fetch_shadow_neighbors — replace `entity.description or entity.name` with `entity.name`; post Wave 5 the description is always "" so the fallback was a no-op, and reading description here violates the boundary gate. Boundary gate (tests/boundaries/test_graph_curation_description_free.py, 4 AST-level assertions per spec § 5.2.a): - graph_curation_modules_do_not_read_entity_description - merge_candidate_detector_does_not_read_entity_description - lineage_merge_apply_description_free_does_not_read_entity_description - lineage_merge_apply_description_free_does_not_call_llm_or_compactor Allowlist: - lineage_merge.merge_entities (legacy back-compat) excluded by file - dto.py field declaration excluded (annotation, not a read) - LineageMergeResult.compacted_description (non-entity result shape used by legacy sync handle_action API) excluded by base name Wave-5 invariant codify pattern (Lesson #18 candidate, per huangheng PR #1932 + chenyexuan PR #1933 first-application demo): lesson sediment (cr-checklist § 四 Wave 5 description-NULL family) + mechanical gate (this boundary test) — paired so future regressions fail at CI not at review time. Tests: 1466 unit + 104 boundary all green. Risk: 0 production behavior change for legacy sync handle_action API (merge_entities preserved); new accept-apply async path uses the description-free variant exclusively. Spec: docs/zh-CN/architecture/task-31-graph-node-merge-spec-v1.md § 3.1.5 Task: task #77 (Phase A3) under task #31 umbrella Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

earayu · 2026-04-30T09:04:57Z

Testing CR on PR #1941: implementation direction looks right, but one boundary gate gap blocks my LGTM.

Local targeted verification passed:

uv run --extra test pytest tests/boundaries/test_graph_curation_description_free.py tests/unit_test/indexing/test_merge_candidate_detector.py tests/unit_test/graph_curation/test_service.py -q

Result: 19 passed.

BLOCKER — the description-free boundary gate excludes dto.py, so it does not actually protect one of the spec’s 6 call sites.

Spec §3.1.5 explicitly includes aperag/graph_curation/dto.py (CurationEntity.from_lineage) as a description-free call site. The PR fixes it now, but test_graph_curation_modules_do_not_read_entity_description() skips the entire file:

if path.name in {"lineage_merge.py", "dto.py"}:
    continue

That means a future regression like reintroducing entity.compacted_description / entity.description_parts inside CurationEntity.from_lineage() would pass the boundary gate, even though it is one of the locked 6 sites.

The whole-file dto.py allowlist is not needed for the dataclass field declaration: description: str = "" is not an ast.Attribute read, and keyword writes like description="" are also not description reads under the current AST walker. So the minimal fix is to include dto.py in the module scan and only keep lineage_merge.py excluded for the legacy path. If there is a specific dto declaration false positive after that, allowlist that exact AST shape rather than the whole file.

Once dto.py is covered by the gate, my testing-lane blocker should be cleared.

@huangheng

Per @huangheng cr-checklist Lesson #14 + #18 候选 cross-link verify (msg=be330423) — 2 non-blocker NITs on PR #1941 fix-forwarded: NIT 1 (service.py:244 deprecation marker): Add deprecation comment on the legacy sync ``handle_action()`` API return-shape line that reads ``merge_result.compacted_description``. Aligns with Lesson #14 «老 path 保留 + 标 deprecation» pattern (matches the ``lineage_merge.merge_entities`` deprecation marker added by the main commit), and explicitly cross-links the boundary test allowlist mechanism (``NON_ENTITY_BASE_NAMES``) so future grep-based audits don't dispatch on the read. NIT 2 (boundary test docstring bonus catch cross-link): Add explicit Lesson #18 候选 second-application demo trail in ``tests/boundaries/test_graph_curation_description_free.py`` module docstring — cite the ``service.py:845`` bonus catch (``text = entity.description or entity.name`` inside ``GraphCurationService._fetch_shadow_neighbors``) as canonical proof of the «lesson sediment + mechanical gate 双 layer codification» value. The spec § 3.1.5 ratify (符炫炜 + Bryce + ziang + huangzhangshu + Weston multi-source review) listed exactly 6+1 sites and every reviewer + spec author missed this 7th hidden read; the boundary gate caught it on first run, turning ``reviewer-as-detector`` into ``CI-as-detector`` per the Lesson #18 thesis. 0 production code change beyond comment / docstring text. Tests: 4/4 boundary test pass + ruff format / check clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

earayu

ziang graph_curation domain verify ✅ LGTM.

Verified:

CurationEntity.from_lineage now hard-sets description="", and candidate_generation no longer emits/reads description in snapshot, lexical signals, or scoring.
MergeCandidateDetector no longer depends on description-derived text; _embedding_query_text(name + type) fixes the post-Wave-5 zero-recall path, and _snapshot no longer persists description into suggestion payloads.
LineageEntityMerger.merge_entities_apply_description_free() is correctly isolated as the async apply variant: alias rows + lineage re-anchor + source L1 delete only, no LLM, no compactor, no __curation_merge__, no vector write/delete. Legacy merge_entities() remains back-compat only.
The AST boundary gate is useful and already proved value by catching service.py hidden read; targeted tests passed locally:
uv run --extra test pytest tests/boundaries/test_graph_curation_description_free.py tests/unit_test/indexing/test_merge_candidate_detector.py -q → 15 passed.

Non-blocking NITs:

merge_candidate_detector.py module docstring still says it embeds “the description”; service._fetch_shadow_neighbors() docstring still says vector-recall “by entity description”. The implementation is fixed, but those docstrings should be updated when convenient so future readers do not reintroduce the old path.
Sync detector query text is name + entity_type, while full-sweep _fetch_shadow_neighbors() uses entity.name only. This is safe for A3 and not a merge blocker, but Phase B should either align the two recall query shapes or explicitly document why full sweep intentionally stays name-only.

earayu · 2026-04-30T09:07:04Z

Update to my previous domain-verify comment: I agree with @huangzhangshu's boundary-gate blocker. The implementation sites are fine, but CurationEntity.from_lineage() is one of the explicit spec §3.1.5 call sites, so the mechanical gate should not whole-file skip dto.py. Please include dto.py in the AST scan and only precise-allowlist the dataclass field declaration / description="" write shape if needed.

After that fix-forward, my graph_curation domain LGTM can stand.

@ziang

Per @huangzhangshu BLOCKER (PR #1941 testing-lane CR, msg=2deb5407) + @ziang second-source ratify (msg=f485803c) + @不穷 PM dispatch (msg=a6cd42c9): the boundary gate ``test_graph_curation_modules_do_not_read_entity_description`` was whole-file excluding ``aperag/graph_curation/dto.py`` to avoid flagging the dataclass field declaration. But spec § 3.1.5 item 4 explicitly lists ``CurationEntity.from_lineage`` as one of the 6 description-free call sites, so the gate must catch future regressions that re-introduce ``entity.compacted_description`` / ``entity.description_parts`` reads inside ``from_lineage``. The whole-file exclusion was a false-positive prevention that turned out to be unnecessary: the AST walker matches ``ast.Attribute`` reads only, and dataclass field annotations (``description: str = ""``) are ``ast.AnnAssign`` nodes with ``target=ast.Name``, while constructor keyword args (``cls(description="")``) are ``ast.keyword`` nodes — neither is an ``ast.Attribute`` access on an entity object. Drops the whole-file exclusion and adds two reinforcing sister-tests so future maintainers do not regress this: * ``test_dto_module_is_in_boundary_scope`` — synthetic-AST positive control: feeds a fake ``from_lineage`` body that reads ``entity.compacted_description`` through the same offender detector and asserts the offender is surfaced. If a future refactor breaks the AST walker, this test catches the silent protection-loss. * ``test_dto_field_declaration_is_not_a_false_positive`` — live negative control: confirms the production ``dto.py`` produces zero offenders, with a docstring directing future maintainers to fix the walker (NOT re-allowlist the file) if a false- positive is ever observed. 6/6 boundary tests pass + ruff format / check clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

earayu

Final pass after 8116639 ✅

The boundary blocker is fixed: dto.py is now in scope, lineage_merge.py remains the only whole-file legacy exclude, and the new positive/negative controls correctly protect the gate itself:

synthetic entity.compacted_description read is detected;
live dto.py field declaration / description="" constructor write do not false-positive.

Targeted verification:
uv run --extra test pytest tests/boundaries/test_graph_curation_description_free.py -q → 6 passed.

My earlier graph_curation domain LGTM now stands. The remaining two NITs from my first comment are non-blocking doc/alignment follow-ups.

earayu · 2026-04-30T09:09:12Z

Testing final pass on PR #1941 at 8116639: LGTM from my lane.

The blocker is fixed:

dto.py is now included in test_graph_curation_modules_do_not_read_entity_description(); only lineage_merge.py remains whole-file excluded for the legacy path.
test_dto_module_is_in_boundary_scope adds a positive control for a synthetic entity.compacted_description regression, so the gate now proves it would catch the exact CurationEntity.from_lineage class of regression I was worried about.
test_dto_field_declaration_is_not_a_false_positive pins the negative control: the live dataclass field declaration / description="" construction does not force a whole-file allowlist.

Local targeted verification:

uv run --extra test pytest tests/boundaries/test_graph_curation_description_free.py tests/unit_test/indexing/test_merge_candidate_detector.py tests/unit_test/graph_curation/test_service.py -q

Result: 21 passed.

git diff --check origin/main...HEAD also passed.

@cuiwenbo

task #31 Phase A4 implementation (dongdong) - FE merge suggestion UI + dismiss action. ## Scope - backend: dismiss action support (schemas.py + service.py:handle_action dismiss branch + 三 success return 补 message field per Weston BLOCKER msg=c1595745) - FE typed schema sync: SUGGESTION_ACTIONS 加 dismiss + MergeSuggestionStatus 7 active + 3 legacy union - FE rendering: 7-state UI display (4 new APPLY_PENDING/APPLYING/APPLIED/APPLY_FAILED + DISMISSED + legacy ACCEPTED/EXPIRED/SUPERSEDED) + dismiss button + canonical fields 切换 + panel-open implicit run → read-only list + 显式触发扫描 button - FE-derived P1 fields (per architect lock msg=80054596 simple-stable + 不扩后端): observed_types/type_conflict/affected_doc_count/suggested_entity_type - 顺手 catch fix: 删 target_entity_data extension (backend extra=forbid 422 兼容 bug) - legacy ACCEPTED label "Applied (legacy)" / "已应用 (历史)" semantic alignment (per cuiwenbo NIT 1 + spec § 3.1.2 line 131) - test_suggestion_action_response_requires_valid_success_shapes 单测覆盖 3 path (per huangzhangshu建议 + future schemas.py field add 漂移防护) ## CR collected (4/4 LGTM) - @符炫炜 architect ratify ✅ msg=36a0bbe4 + BLOCKER condition met msg=44813b59 - @cuiwenbo CR pair final final pass ✅ msg=9e503bda - @huangzhangshu testing final pass ✅ msg=bf233776 - @weston architect cross-CR final pass ✅ msg=0fe380bf ## Architect own-up sediment - SuggestionActionResponse.message required field gap = 第二个 architect ratify trust-framing miss (Weston catch via first-principles trace) - Lesson #12 v9 fifth-application demo - mini-pattern 19 升级范围: spec → impl 边界 + impl → response_model contract 边界 + impl catch path → upstream raise points 边界 ## CI - lint-and-unit ✅ - e2e-http-smoke 3/3 ✅ - provider-preflight 3/3 ✅ - e2e-http-provider 3/3 ✅ (re-run after JSON error injected into SSE stream flake confirmed - cross-PR same signature with PR #1941 A3 = systematic flake per ci-flake-policy § 2.2 single-shape signature waiver) 🤖 Architect ratify by Claude Code

#1943) * docs(cr-checklist): task #31 Phase A 全闭环后 sediment fold-in 子 PR 2 § 四加 6 lesson sediment（task #31 Phase A 4 PR + task #33 P3 PR #1933 codify 累计实证 + multi-PR same-hour multi-source first-principles catch trust-framing miss）+ § 六 sediment 引用追加 5 PR commit cross-link + § 八修订记录追加本 PR fold-in 完整 trail。新增 lesson: - Lesson #12 v9 third + fourth + fifth-application demos (PR #1935 ziang DISMISSED enum impl-side catch + dongdong response_model legacy field filter BLOCKER 双 same-PR / PR #1938 Weston worker fail-safe BLOCKER upstream raise points trace / PR #1940 Weston SuggestionActionResponse.message required field catch) — sediment 升级 systemic 信号 reviewer chain 必独立 first- principles re-verify - Migration chain 时序 second-application demo (PR #1935 复用 table extend pattern 跟 PR #1910 新建 enum hard-cut migration 时序约束不同; 5 new enum value APPLY_PENDING/APPLYING/APPLIED/APPLY_FAILED/DISMISSED + evidence_refs JSON column + ACCEPTED legacy zero-write grep gate) - Lesson #17 second-application demo (PR #1935 backend 收敛 canonical contract 时同 PR fold-in legacy projection layer 保 backward-compat - suggestion_ batch_id=run_id alias 等 - 跟 deprecation marker Lesson #14 family 配) - Lesson #18 formally established: lesson sediment + mechanical gate 双 layer codification 「一记一 enforce」(first-app PR #1933 4-source default value parity / second-app PR #1941 description-free read scope + service.py:845 bonus catch / third-app PR #1941 fix-forward sister tests 防 whole-file exclude 静默削弱 gate) - mini-pattern 19: spec lock pre-check grep main 实证 enum/contract assumption (architect own-up 升级版三层: spec→impl / impl→response_model / impl catch path→upstream raise points) - mini-pattern 20: PR adds response_model wire-up 必跑 model_validate(actual_ handler_return_shape) boundary gate (PR #1940 first-application demo) per architect dispatch msg=b6726ac9 + msg=420ca548 sediment trigger A 满足 (task #31 Phase A 4/4 done) 启动 + Phase B B1 lane huangheng owner. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(cr-checklist): fix cite accuracy NIT per Weston msg=7690b723 2 cite accuracy fixes (Weston framing CR catch): 1. response_model validation failure 状态码: 422 -> 500 - response_model validation fails 抛 FastAPI ResponseValidationError - 通常映射到 HTTP 500，不是 request body 校验的 422 - 影响 line 745 + line 850 描述 PR #1940 BLOCKER 时的状态码引用 2. GraphMergeSuggestionItem canonical schema 字段实证修正 - 原写: ... / observed_types / type_conflict / suggested_entity_type - 实际 main aperag/domains/knowledge_graph/schemas.py::GraphMergeSuggestionItem 不含这三字段 - A4 (PR #1940) 这些字段是 FE-derived display (FE 从 entities / suggested_target_entity / evidence_refs 推导)，不是 PR #1935 backend projection - 影响 line 781 sect 4 Lesson #17 second-application demo 描述 per Weston PR #1943 framing CR (msg=7690b723) - sediment cite accuracy 要求把事实漂移修干净，避免 future onboarding reference 时 confuse 422/500 状态码语义 + backend/FE field source attribution。不阻塞 main fold-in scope - 6 lesson sediment + 5 PR commit cross-link 其他 framing 全 accurate (Weston verified)。 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…ty matrix (#1949) * feat(collection): task #61 P1-D3 — vector backend identity + capability matrix Project the deployment-wide ``settings.vector_db_type`` onto every collection detail read so the FE can render a "what does this vector backend actually support" panel without per-collection migration or runtime probe. Backend (output-only projection): - ``aperag/schema/common.py``: ``VectorBackendCapabilities`` + ``VectorBackendInfo`` + ``_STATIC_VECTOR_BACKEND_CAPABILITIES`` dict + ``project_vector_backend_info()`` helper. - ``aperag/domains/knowledge_base/schemas.py:Collection``: add ``vector_backend: Optional[VectorBackendInfo]``. **Intentionally NOT on ``CollectionConfig``** so the OpenAPI ``CollectionCreate`` / ``CollectionUpdate`` input shapes do not let callers mistake a deployment-wide setting for a per-collection editable knob (per dongdong msg=c2593fdd + PM msg=caf7e4df + architect msg=0044261f read-only projection lock). - ``aperag/domains/knowledge_base/service/collection_service.py``: populate ``vector_backend`` in ``build_collection_response`` from ``settings.vector_db_type``; ``None`` for unknown backends so the FE can render a placeholder without a hard failure. Cross-PR consistency with task #83 / PR #1948 (Bryce, vector adapter behavior fixes): - Bryce's connector-layer ``BACKEND_CAPABILITIES`` ClassVar declares 2 truth flags (``supports_atomic_batch_upsert`` + ``supports_legacy_mode``); this PR's schema-layer Pydantic model mirrors those values plus a 3rd schema-layer-only flag ``supports_filter_or_with_empty_parts`` which is uniformly False across adapters after task #83 P1-V3 (translator-level defense-in-depth rejects empty Or parts). - The 3rd flag stays in the schema so the FE can declare the uniform reject explicitly per spec § 2.3 P1-D3 「显示『允许差异但显式』」 — Lesson #17 backend 收敛 contract simple-stable family pattern (cite PR #1930 SearchHit normalize, PR #1935 GraphMergeSuggestionItem projection layer). Mechanical gate (per Lesson #18 lesson-sediment + mechanical-gate 双 layer codification — first established by chenyexuan PR #1933 / PR #1941, then PR #1940 ``model_validate`` boundary): 13-case unit suite in ``tests/unit_test/contracts/test_vector_backend_capability_matrix.py`` pins each capability flag, normalizes inputs, and round-trips Pydantic ``model_dump`` so future drift between schema, projection helper, and FE-consumed shape fails fast at unit-test time. FE (read-only display): - ``web/src/features/collection/types.ts``: typed mirrors ``VectorBackendInfo`` / ``VectorBackendCapabilities`` / ``VectorBackendType``. - ``web/src/app/workspace/collections/[collectionId]/settings/collection-vector-backend-card.tsx``: new component that surfaces backend identity + capability matrix in the collection settings page (above the edit form). dongdong picks up rendering polish (responsive + dark mode + final copy) on the same PR per the joint A4-style split (cuiwenbo contract layer + dongdong rendering polish + CR pair). - ``web/src/i18n/{en-US,zh-CN}/page_collections.json``: copy strings. - ``web/src/api-v2/schema.d.ts`` regenerated via ``yarn api:v2:types``. Local verification: - ``uv run --extra test pytest tests/unit_test/contracts/test_vector_backend_capability_matrix.py tests/unit_test/contracts/test_collection_v2_openapi_contract.py -q`` → 23 passed - ``make openapi-check`` → ok - ``yarn type-check --pretty false`` → 0 new errors on this PR's files (pre-existing graph-lab cosmograph + agent-runtime errors unchanged) - ``yarn lint --quiet`` → 0 warnings/errors - ``yarn i18n:check`` → ok - ``git diff --check`` → ok Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(collection): task #87 P1-D3 — convert vector_backend to computed_field Per dongdong msg=fa88e97b BLOCKER + huangzhangshu msg=5b7cba0f / msg=ee6e7af2 + Weston msg=057f642c re-final framing verify gate + PM msg=03c821b0 fix-forward direction lock: the previous regular-field ``Optional[VectorBackendInfo]`` implementation leaked the deployment projection onto every input shape that referenced ``Collection``, including ``Collection-Input`` itself, ``Agent-Input.collections``, and ``CreateTurnRequest.collections``. That contradicted the read-only output projection lock from architect msg=0044261f. Move ``Collection.vector_backend`` to a Pydantic v2 ``@computed_field`` property so OpenAPI input/output schemas auto-split: - ``Collection-Output`` now lists ``vector_backend`` with ``readonly: true`` (verified in regenerated ``web/src/api-v2/schema.d.ts``). - ``Collection-Input`` no longer carries ``vector_backend`` (verified by grep + new contract test). - ``CollectionCreate`` / ``CollectionUpdate`` / ``Agent-Input.collections`` / ``CreateTurnRequest.collections`` all inherit the cleaned ``Collection-Input``, so the deployment-wide setting can no longer be passed as a per-collection override on agent / chat-turn requests. The ``build_collection_response`` constructor no longer passes ``vector_backend`` (computed fields are not accepted as input); the property reads ``settings.vector_db_type`` lazily on each serialization. Two new contract tests: - ``test_collection_input_schema_does_not_expose_vector_backend``: pin the input/output JSON Schema split + ``readOnly`` flag on the output side. Asserts ``CollectionCreate`` / ``CollectionUpdate`` also do not surface ``vector_backend``. - ``test_collection_constructor_ignores_vector_backend_input``: defensive — even if a malicious caller stuffs ``vector_backend`` into a ``model_validate`` payload, Pydantic ignores it and the computed property still reflects the deployment setting. Sediment: cuiwenbo own-up CR miss — implement-time only verified the ``CollectionConfig`` placement (one defense layer) and missed the ``Collection`` self-reuse-as-input second layer. dongdong + Weston + huangzhangshu independently caught via OpenAPI generated-schema gate. mini-pattern 19 layer 5 candidate: "Pydantic schema placement verify must grep ``references Collection`` to catch input/output reuse risk, not only direct form-input shape" (continuing the trust-framing-miss family from PR #1935 / #1938 / #1940). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(test): consolidate vector_backend_capability_matrix imports for ruff Combine the two from aperag.schema.common import ... statements into a single block so ruff's import organization rule is satisfied. No code-behavior change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(test): apply ruff format to vector_backend test + common.py Run `uv run ruff format` on ApeRAG/aperag/schema/common.py and ApeRAG/tests/unit_test/contracts/test_vector_backend_capability_matrix.py so `make lint` (`ruff format --check`) passes. Pure formatting; no behavior change. Other unrelated files reverted to keep this PR scope clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

earayu commented Apr 30, 2026

View reviewed changes

earayu merged commit 0535bf4 into main Apr 30, 2026
12 of 13 checks passed

earayu deleted the chenyexuan/task-31-a3-description-free branch April 30, 2026 09:20

This was referenced Apr 30, 2026

docs(task-31): spec v1.1 amend — Phase A 实施 sediment fold-in #1942

Merged

docs(cr-checklist): task #31 Phase A close-out sediment fold-in 子 PR 2 #1943

Merged

earayu mentioned this pull request Apr 30, 2026

feat(collection): task #61 P1-D3 — vector backend identity + capability matrix #1949

Merged

10 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(task-31-a3): description-free graph_curation 7 call sites#1941

refactor(task-31-a3): description-free graph_curation 7 call sites#1941
earayu merged 3 commits into
mainfrom
chenyexuan/task-31-a3-description-free

earayu commented Apr 30, 2026

Uh oh!

earayu commented Apr 30, 2026

Uh oh!

earayu left a comment

Uh oh!

earayu commented Apr 30, 2026

Uh oh!

earayu left a comment

Uh oh!

earayu commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

earayu commented Apr 30, 2026

Summary

Scope

Boundary gate

Lesson family applied

Test plan

Risk

CR

Not blocking

Uh oh!

earayu commented Apr 30, 2026

Uh oh!

earayu left a comment

Choose a reason for hiding this comment

Uh oh!

earayu commented Apr 30, 2026

Uh oh!

earayu left a comment

Choose a reason for hiding this comment

Uh oh!

earayu commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant