test(task-33-p3): cross-source default value gate for graph_extraction_window_size by earayu · Pull Request #1933 · apecloud/ApeRAG

earayu · 2026-04-30T06:20:23Z

Summary

Codify Lesson #13 v3 (cross-source default value alignment) as a cicd-push.yml lint+unit gate for graph_extraction_window_size
Locks 4 sources in sync: Python const / Pydantic Field examples / TS schema @example / spec doc § 3.1.1 + § 4.2
1 file / +251 / 0 deletion / 0 production code change / 0 risk

Why this gate exists (concrete drift evidence)

PR #1925 (task #30 B3, default=2 lock) surfaced the cross-source drift class:

Weston msg=1b7d9bef BLOCKER 1 — caught web/src/api-v2/schema.d.ts still carrying default 1 after Python + Pydantic moved to 2
huangheng msg=bf785b12 NIT 1 — caught spec § 3.1.1 line 85 still saying default 1
Both required fix-forward commit dae43f5

This gate replaces "reviewers act as drift detectors" with "lint+unit catches drift mechanically before review burden".

Sediment cross-link

This PR is the codified mechanical counterpart to PR #1932 (@huangheng) § 四 Lesson #13 v3 application demo 2 + Lesson #14 application demo. PR #1932 records the drift class as a CR-checklist lesson; this PR enforces it so the lesson does not have to be remembered.

Per docs/zh-CN/architecture/task-17-cr-review-checklist.md § 四 Lesson #13 v3.

Sources locked

| # | Source | What's pinned |
|---|--------|---------------|
| 1 | `aperag/indexing/graph_extractor.py` | `_DEFAULT_GRAPH_EXTRACTION_WINDOW_SIZE` (Python const, runtime fallback) |
| 2 | `aperag/schema/common.py` | `KnowledgeGraphConfig.graph_extraction_window_size` `Field(examples=[N])` |
| 3 | `web/src/api-v2/schema.d.ts` | JSDoc `@example N` (frontend client surface) |
| 4 | `docs/zh-CN/architecture/task-30-graph-chunk-window-spec-v1.md` | § 3.1.1 ``B3 lock default `N` `` + § 4.2 `graph_extraction_window_size = N` |
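A minimal sketch of how one value per source could be pulled out with regular expressions. The sample strings below are hypothetical stand-ins for the four files, and the helper names (`_capture_int`, `extract_*`) are illustrative assumptions; the extraction logic actually used by the test in this PR is not shown here.

```python
import re
from typing import Tuple

# Hypothetical excerpts standing in for the four real files (assumed shapes).
PY_SOURCE = "_DEFAULT_GRAPH_EXTRACTION_WINDOW_SIZE = 2\n"
PYDANTIC_SOURCE = (
    "class KnowledgeGraphConfig(BaseModel):\n"
    "    graph_extraction_window_size: Optional[int] = Field(\n"
    "        default=None, examples=[2]\n"
    "    )\n"
)
TS_SOURCE = (
    "/**\n"
    " * Graph Extraction Window Size\n"
    " * @example 2\n"
    " */\n"
    "graph_extraction_window_size?: number | null;\n"
)
SPEC_SOURCE = (
    "### 3.1.1\n**B3 lock default `2`**\n"
    "### 4.2\n**`graph_extraction_window_size = 2`**\n"
)


def _capture_int(pattern: str, text: str, label: str, flags: int = 0) -> int:
    """Return the first captured integer, or fail loudly naming the source."""
    match = re.search(pattern, text, flags)
    if match is None:
        raise ValueError(f"{label}: pattern not found")
    return int(match.group(1))


def extract_python_const(text: str) -> int:
    pattern = r"^_DEFAULT_GRAPH_EXTRACTION_WINDOW_SIZE\s*=\s*(\d+)"
    return _capture_int(pattern, text, "python_const", re.M)


def extract_pydantic_example(text: str) -> int:
    # Non-greedy match from the field name to its examples=[N] argument.
    pattern = r"graph_extraction_window_size\b.*?examples=\[(\d+)\]"
    return _capture_int(pattern, text, "pydantic_examples", re.S)


def extract_ts_example(text: str) -> int:
    # The @example tag must sit in the JSDoc block right above the property.
    pattern = r"@example\s+(\d+)\s*\n\s*\*/\s*\n\s*graph_extraction_window_size\b"
    return _capture_int(pattern, text, "ts_schema_example")


def extract_spec_values(text: str) -> Tuple[int, int]:
    # The spec states the default twice (§ 3.1.1 and § 4.2); return both.
    first = _capture_int(r"B3 lock default `(\d+)`", text, "spec § 3.1.1")
    second = _capture_int(
        r"`graph_extraction_window_size = (\d+)`", text, "spec § 4.2"
    )
    return first, second
```

Anchoring each pattern to the identifier rather than to a line number keeps the extractors stable when unrelated lines move, which matters because the gate is meant to pin values only.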

Tests added (5)

1. test_graph_extraction_window_size_default_consistent_across_sources: main gate (4 sources must agree)
2. test_graph_extraction_window_size_default_is_positive_integer: sanity (window math requires >= 1)
3-5. test_individual_source_extractor_does_not_raise[python_const|pydantic_examples|ts_schema_example]: separates "extractor broken" failures from "values drifted" failures so the operator immediately knows whether to fix test infra or schema
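The split between "extractor broken" and "values drifted" can be illustrated as follows. This is a sketch under assumptions: `ExtractorBroken`, `collect_observed`, and `assert_defaults_agree` are hypothetical names, and the stub extractors are plain callables rather than the real file parsers.

```python
from typing import Callable, Dict


class ExtractorBroken(Exception):
    """A source could not be parsed at all: a test-infra problem, not drift."""


def collect_observed(extractors: Dict[str, Callable[[], int]]) -> Dict[str, int]:
    """Run every extractor, converting any failure into ExtractorBroken."""
    observed = {}
    for name, extract in extractors.items():
        try:
            observed[name] = extract()
        except Exception as exc:
            raise ExtractorBroken(f"extractor {name!r} failed: {exc}") from exc
    return observed


def assert_defaults_agree(observed: Dict[str, int]) -> None:
    """Fail with AssertionError only when the parsed values disagree or are invalid."""
    distinct = set(observed.values())
    if len(distinct) != 1:
        raise AssertionError(f"default drift across sources: {observed}")
    (value,) = distinct
    if not isinstance(value, int) or value < 1:
        raise AssertionError(f"default must be an integer >= 1, got {value!r}")


# Healthy state: every stub source reports 2, so nothing is raised.
healthy = collect_observed({"python_const": lambda: 2, "ts_schema_example": lambda: 2})
assert_defaults_agree(healthy)
```

With this shape, a parse failure surfaces as `ExtractorBroken` and a genuine disagreement as `AssertionError`, so the two failure classes never share a traceback.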

Failure semantics

When the gate trips, it spells out all 4 observed values per source plus the procedural reminder (>= 10 samples + >= 3 models, all without regression + three-party confirmation from PM, architect, and earayu2, per spec § 4.2). The operator does not have to guess which source drifted.
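One way such a failure message could be assembled is sketched below. The function name, the "differs from majority" marker, and the exact wording are assumptions for illustration; the real message text is not reproduced in this PR description.

```python
from collections import Counter
from typing import Dict

# Paraphrase of the procedural reminder quoted from spec § 4.2 (wording assumed).
REMINDER = (
    "Changing the default requires >= 10 samples + >= 3 models without regression "
    "and PM + architect + earayu2 confirmation (spec § 4.2); "
    "update all sources in the same PR."
)


def format_drift_report(observed: Dict[str, int]) -> str:
    """List every source with its value and flag the ones off the majority."""
    majority, _count = Counter(observed.values()).most_common(1)[0]
    lines = ["graph_extraction_window_size default differs across sources:"]
    for source in sorted(observed):
        value = observed[source]
        marker = "" if value == majority else "  <- differs from majority"
        lines.append(f"  {source} = {value}{marker}")
    lines.append(REMINDER)
    return "\n".join(lines)


report = format_drift_report(
    {"python_const": 2, "pydantic_examples": 2, "ts_schema_example": 1, "spec_doc": 2}
)
print(report)
```

Flagging against the majority value is only a heuristic: with an even split there is no meaningful majority, which is why the report always lists every source rather than just the flagged ones.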

Why a unit test, not a boundary test or new workflow

tests/boundaries/ is not currently invoked by make test-unit / test-integration / cicd-push.yml (task #33 Layer 1 audit finding; separate Layer 2 follow-up if needed)
tests/unit_test/ runs on every push via make test-unit
Per simple-stable directive (earayu2 msg=1224bec8), cheapest reliable gate = unit test in existing CI lane, not a new workflow file
Co-located with existing tests/unit_test/contracts/ cross-source contract tests

Scope discipline (per earayu2 sticky 4)

This gate pins default value parity only. It does NOT pin:

description text (evolves)
override-recommendation phrasing (evolves)
rationale wording (evolves)

If a future change moves the default away from 2, the test fails; fix is to update all 4 sources in the same PR.

Test plan

pytest tests/unit_test/contracts/test_graph_extraction_window_size_default_consistency.py -v -> 5/5 pass
Synthetic drift on Python const (1 vs 2) -> caught with clear message
Synthetic drift on TS schema @example -> caught
Synthetic drift on spec § 3.1.1 -> caught
tests/unit_test/contracts/ full suite -> 58/58 pass (no collateral break)
ruff format --check clean
ruff check clean
CI lint-and-unit (verify on push)

CR

@符炫炜 ratify (architect lane).
@huangheng / @weston / @冬柏 / @huangzhangshu / @Planetegg: LGTM from any one reviewer is enough for the author to squash-merge.

Does not block:

@huangheng PR #1932 (sediment fold-in, separate lane)
@符炫炜 task #31 spec v1 (architect lane, 3 BLOCKER fix-forward in flight)
task #61 P1/P2 implementation queue
task #11 GC orphan vector follow-up

🤖 Generated with Claude Code

…n_window_size

Codify Lesson #13 v3 (cross-source default value alignment) as a CI unit test gate so future task-#30 B3-class drift is caught by ``cicd-push.yml`` lint+unit instead of by reviewers via fix-forward rounds.

Background: task #30 B3 (PR #1925, merge ``43648f9``) locked ``graph_extraction_window_size`` default to ``2`` across **four** sources that all need to agree:

1. ``aperag/indexing/graph_extractor.py`` ``_DEFAULT_GRAPH_EXTRACTION_WINDOW_SIZE`` (Python const, runtime fallback)
2. ``aperag/schema/common.py`` ``KnowledgeGraphConfig.graph_extraction_window_size`` Pydantic ``Field(examples=[N])`` (OpenAPI / TS schema source)
3. ``web/src/api-v2/schema.d.ts`` JSDoc ``@example N`` (frontend client surface; committed to repo, can drift if regen skipped)
4. ``docs/zh-CN/architecture/task-30-graph-chunk-window-spec-v1.md`` § 3.1.1 line 85 ``**B3 lock default `N`**`` + § 4.2 ``**`graph_extraction_window_size = N`**`` (architectural source of truth that PRs CR against)

PR #1925 itself surfaced the drift class:

- Weston ``msg=1b7d9bef`` BLOCKER 1 caught ``schema.d.ts`` still carrying default ``1``
- huangheng ``msg=bf785b12`` NIT 1 caught § 3.1.1 line 85 still saying default ``1``

Both required a fix-forward commit (``dae43f5``).

Why a unit test (not a boundary test): ``tests/boundaries/`` is not currently invoked by ``make test-unit`` / ``test-integration`` / ``cicd-push.yml`` (task #33 Layer 1 audit finding). ``tests/unit_test/`` runs on every push via ``make test-unit``. Per simple-stable directive (earayu2 ``msg=1224bec8``), the cheapest reliable gate is a unit test in the existing CI lane, not a new workflow file.

Scope discipline: pins **default value parity** across four sources only. Does not pin description text, override-recommendation phrasing, or rationale wording. If a future change moves the default away from 2, the test fails with a list of all observed values per source plus the procedural reminder (``≥10 samples + ≥3 models, all without regression + three-party confirmation from PM + architect + earayu2``).

Tests:

- ``test_graph_extraction_window_size_default_consistent_across_sources``: the main gate (asserts all 4 sources agree)
- ``test_graph_extraction_window_size_default_is_positive_integer``: sanity (window assembler math requires ``>= 1``)
- ``test_individual_source_extractor_does_not_raise[*]``: separates "extractor broken" failures from "values drifted" failures so the operator immediately knows whether to fix test infra or schema

Local validation:

- 5/5 pass in clean state
- Synthetic drift on each of (Python const / TS schema / spec § 3.1.1 / spec § 4.2) caught with a clear, actionable error message naming the drifting source
- Full ``tests/unit_test/contracts/`` 58/58 pass
- ruff format + ruff check clean

Sediment cross-link: this gate is the codified counterpart to huangheng PR #1932 § 四 Lesson #13 v3 application demo 2 + Lesson #14 application demo (PR #1925 § 3.1.1 multi-iteration cleanup). That PR records the drift class as a CR-checklist lesson; this PR enforces it mechanically so the lesson does not have to be remembered.

task #33 Layer 2 P3 (chenyexuan claim, in_progress) per PM dispatch ``msg=65465f9e``.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…scope lock

5 BLOCKER cleanup (per ziang/Weston/Bryce/huangzhangshu/Planetegg/dongdong/huangheng converge):

1. Lane name `graph_node_merge_suggestion` -> `graph_curation_run`, unified across the whole document (§2.1/§4 A1/§4 B3/§5.1/§5.2 + test file name `test_graph_curation_run_boundaries.py` + removed the hard-coded "11th" number + removed the old implementation name `_dedup_scan`)
2. All remnants of the new `merge_suggestion` table cleared (§2.2 P1-B + §3.1.2 + §6 Migration chain) -> unified across the document as "reuse GraphCurationSuggestion + extend the status enum with 4 new values + add an evidence_refs field"
3. §3.1.4 `LineageGraphStore.merge_entities` (does not exist) -> `LineageEntityMerger` description-free apply path built on LineageGraphStore primitives + cross-backend boundary test pinning LineageEntityMerger behavior
4. Phase A1 worker `MergeSuggestionWorker` calling `MergeCandidateDetector` -> distinguish the manual/cron full sweep, which calls the `generate_graph_curation_run_task` integration path, from the auto_post_ingest sync inline `detect_for_sync()` quick path
5. description-free: 4 sites -> 6 enumerated detector/snapshot call sites (added `merge_candidate_detector.py:322-328` + corrected `:263-271` -> `:257-284`, aligned with §3.1.5) + 1 apply path

§5 gate split into 5.2.a scan-generation invariants (lane symbolic dual-side / independent queue family / description-free 6 call sites / trigger split / safe-only write) vs 5.2.b async accept-apply state machine invariants (7-state machine / enum lowercase + dual-side / description-free apply variant / cross-backend apply contract / audit trail / idempotent replay)

§6 sediment cite update: Lesson #13 v3.1 -> v2.3 (per huangheng line 285 verify); "about to fold" -> "already folded per PR #1932 commit dc79aad" (merged); added Lesson #18 candidate cross-link (lesson sediment + mechanical gate two-layer codification: one records, one enforces, per huangheng msg=b18d26ee + chenyexuan PR #1933 first-application demo)

New §3.1.3 entity_type scope lock (per PM msg=05be0b52 + ziang msg=d6d9dc3c + dongdong msg=83783bc6 + Weston msg=78ab2267 three-party converge):

- v1 still uses entity name as the primary merge target; `entity_type` is only a compatibility / penalty signal
- merge suggestions must tolerate near-matching types (display observed_types + type_conflict + suggested_entity_type)
- the `entity_type_alias` suggestion kind moves to Phase B / P1 follow-up (#31-C3), with store/API/migration/UI designed separately; a v1 boundary test pins "never write suggestion_kind='entity_type_alias'" to prevent scope creep

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@bryce

…1931)

task #31 spec v1 lock: the design document for graph node merge scanning + background suggestion tasks lands in the repo.

## Design core

- **scope reframe**: extract / fix / extend the full Wave 7 §K.12.4 stack; do not build new
- **Independent queue family** `q:graph_curation_run`: the lane does not pollute Modality + DocumentIndex + reconciler; independent push/pop API
- **Three trigger strategies reconciled**: manual/cron full sweep goes through worker pop → generate_graph_curation_run_task; auto_post_ingest keeps the sync inline detect_for_sync but under the same description-free invariant
- **Reuse the GraphCurationSuggestion table**: no new merge_suggestion table; only extend with 4 new status enum values + an evidence_refs field
- **State machine Option B (apply_pending + ACCEPTED legacy)**: pending → dismissed | rejected | apply_pending → applying → applied | apply_failed; the existing ACCEPTED status (historical terminal status of the sync handle_action path) is kept as legacy read-only, with a zero-write gate on the new async path
- **description-free 6 call sites + 1 apply path** (Wave 5 invariant): candidate_generation.py:43/179-181/196-197 + dto.py:59-65/101-105 + merge_candidate_detector.py:257-284 + :322-328 + lineage_merge.py:246-317 apply variant
- **LineageEntityMerger application-layer cross-backend contract** (the Protocol does not include merge_entities; reuses LineageGraphStore primitives)
- **entity_type scope lock, three layers**: v1 uses it only as a compatibility/penalty signal; suggestions tolerate near-matching types and display observed_types/type_conflict/suggested_entity_type; entity_type_alias as a separate suggestion kind moves to Phase B/P1 follow-up #31-C3
- **Reuse the /graphs/merge-suggestions endpoint + extend SUGGESTION_ACTIONS with dismiss + Pydantic Field validator confidence_score [0,1]**

## Collective 8/8 lane LGTMs collected

- @bryce (msg=9e49d440): all 5 BLOCKERs cleared + entity_type scope lock + Migration chain consistency
- @weston (msg=ed202960 + 92dd89ff): five-category consistency sweep + entity_type three-layer architecture + Migration chain
- @huangzhangshu (msg=9a4cbd61 + 68783841): five categories of old wording cleaned up into Phase A/B gates + enum count micro-fix
- @ziang (msg=760b7341 + 0b761117): impl-lane 5 BLOCKER + state machine Option B + enum count
- @huangheng (msg=535de81b): Lesson framework v5/v6/v7/v8/v9/v13/v14/v16/v17 + Lesson #18 candidate cross-link + Migration chain ordering fully consistent
- @dongdong (msg=8316b45a): FE/UI scoped + entity_type FE friendliness + state machine
- @Planetegg (msg=7d428e33): SRE/deploy Helm render gate symbolic lane assertion
- @cuiwenbo (msg=594fbd4f): 3 NIT (endpoint reuse + status enum FE typed schema sync + confidence_score [0,1] validator), all folded in

## CI status

- lint-and-unit ✅
- e2e-http-smoke 3/3 ✅
- e2e-http-provider-preflight 3/3 ✅
- docs-only lite gate satisfied

## Related

- Does not block PR #1932 (huangheng sediment merged dc79aad) / PR #1933 (chenyexuan merged 1024ef9) / task #61 P1/P2 follow-up / task #11 GC orphan vector follow-up
- The 4 Phase A sub-tasks can be dispatched and started immediately after spec lock (recommended owners: A1+A3 Bryce/ziang / A2 ziang / A4 dongdong+cuiwenbo)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
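The Option B state machine listed in the design core above can be written down as a transition table. This is a sketch only: the class name `SuggestionStatus`, the helper `can_transition`, and the choice to treat `apply_failed` as terminal are assumptions, since the spec text here does not say whether a failed apply can be retried.

```python
from enum import Enum


class SuggestionStatus(str, Enum):
    # Lowercase values, following the "enum lowercase" invariant in the spec notes.
    PENDING = "pending"
    DISMISSED = "dismissed"
    REJECTED = "rejected"
    APPLY_PENDING = "apply_pending"
    APPLYING = "applying"
    APPLIED = "applied"
    APPLY_FAILED = "apply_failed"
    ACCEPTED = "accepted"  # legacy, read-only: the new async path never writes it


# pending -> dismissed | rejected | apply_pending -> applying -> applied | apply_failed
ALLOWED_TRANSITIONS = {
    SuggestionStatus.PENDING: {
        SuggestionStatus.DISMISSED,
        SuggestionStatus.REJECTED,
        SuggestionStatus.APPLY_PENDING,
    },
    SuggestionStatus.APPLY_PENDING: {SuggestionStatus.APPLYING},
    SuggestionStatus.APPLYING: {
        SuggestionStatus.APPLIED,
        SuggestionStatus.APPLY_FAILED,
    },
}


def can_transition(src: SuggestionStatus, dst: SuggestionStatus) -> bool:
    """True only for edges in the table; every other state is treated as terminal."""
    return dst in ALLOWED_TRANSITIONS.get(src, set())
```

Because `ACCEPTED` never appears as a destination in the table, the "zero-write" rule for the async path falls out of the data structure instead of needing a separate check.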

@huangheng

…1941)

* refactor(task-31-a3): description-free graph_curation 7 call sites

Wave 5 description-NULL invariant (task #31 spec § 3.1.5): graph extractor stopped emitting `description` text post Wave 5 task #5 (facts/vectors split). The dedup detection / scoring / snapshot / accept-apply paths still read `entity.description` / `compacted_description` / `description_parts` and would either silently degrade scoring (always-empty bag-of-tokens) or leak stale fragments from pre-Wave-5 rows into reviewer-facing suggestions.

Fix the 6 detector / snapshot call sites + 1 apply path enumerated in the spec, plus 1 service-layer helper surfaced by the boundary test grep gate:

1. candidate_generation.py:38 entity_snapshot: drop description
2. candidate_generation.py:179 _lexical_signals: drop description Jaccard token overlap
3. candidate_generation.py:196 _pair_score: drop description scoring weight (signal no longer emitted; branch is dead)
4. dto.py CurationEntity.from_lineage: set description="" instead of deriving from compacted / description_parts; keep field on the dataclass for back-compat with callers that still pass it
5. merge_candidate_detector._description_text_for_scoring → _embedding_query_text: embed `<name> (<entity_type>)` (mirror of how the graph_vectors worker writes the entity vector, Wave 5 task #5 / #7); the legacy method always short-circuited to "" post Wave 5 so detection produced zero candidates
6. merge_candidate_detector._to_legacy_entity: pass description="" instead of reading from entity
7. merge_candidate_detector._snapshot: drop description key from persisted entity_snapshots payload

+1 lineage_merge.py: add merge_entities_apply_description_free variant for the async accept-apply worker (task #31 § 3.1.5). Skips LLM unified description / Compactor pass / __curation_merge__ sentinel description write / vector embed write per the spec "do not call" list. Legacy merge_entities path is preserved for manual sync API back-compat (Lesson #14 multi-iteration cleanup follow-up).

+1 service._fetch_shadow_neighbors: replace `entity.description or entity.name` with `entity.name`; post Wave 5 the description is always "" so the fallback was a no-op, and reading description here violates the boundary gate.

Boundary gate (tests/boundaries/test_graph_curation_description_free.py, 4 AST-level assertions per spec § 5.2.a):

- graph_curation_modules_do_not_read_entity_description
- merge_candidate_detector_does_not_read_entity_description
- lineage_merge_apply_description_free_does_not_read_entity_description
- lineage_merge_apply_description_free_does_not_call_llm_or_compactor

Allowlist:

- lineage_merge.merge_entities (legacy back-compat) excluded by file
- dto.py field declaration excluded (annotation, not a read)
- LineageMergeResult.compacted_description (non-entity result shape used by legacy sync handle_action API) excluded by base name

Wave-5 invariant codify pattern (Lesson #18 candidate, per huangheng PR #1932 + chenyexuan PR #1933 first-application demo): lesson sediment (cr-checklist § 四 Wave 5 description-NULL family) + mechanical gate (this boundary test), paired so future regressions fail at CI not at review time.

Tests: 1466 unit + 104 boundary all green.

Risk: 0 production behavior change for legacy sync handle_action API (merge_entities preserved); new accept-apply async path uses the description-free variant exclusively.

Spec: docs/zh-CN/architecture/task-31-graph-node-merge-spec-v1.md § 3.1.5
Task: task #77 (Phase A3) under task #31 umbrella

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(task-31-a3): fold huangheng cr-checklist Lesson #14/#18 NITs

Per @huangheng cr-checklist Lesson #14 + #18 candidate cross-link verify (msg=be330423): 2 non-blocker NITs on PR #1941 fix-forwarded:

NIT 1 (service.py:244 deprecation marker): Add deprecation comment on the legacy sync ``handle_action()`` API return-shape line that reads ``merge_result.compacted_description``. Aligns with the Lesson #14 "keep the old path + mark it deprecated" pattern (matches the ``lineage_merge.merge_entities`` deprecation marker added by the main commit), and explicitly cross-links the boundary test allowlist mechanism (``NON_ENTITY_BASE_NAMES``) so future grep-based audits don't dispatch on the read.

NIT 2 (boundary test docstring bonus catch cross-link): Add explicit Lesson #18 candidate second-application demo trail in ``tests/boundaries/test_graph_curation_description_free.py`` module docstring: cite the ``service.py:845`` bonus catch (``text = entity.description or entity.name`` inside ``GraphCurationService._fetch_shadow_neighbors``) as canonical proof of the value of "lesson sediment + mechanical gate" two-layer codification. The spec § 3.1.5 ratify (符炫炜 + Bryce + ziang + huangzhangshu + Weston multi-source review) listed exactly 6+1 sites and every reviewer + spec author missed this 7th hidden read; the boundary gate caught it on first run, turning ``reviewer-as-detector`` into ``CI-as-detector`` per the Lesson #18 thesis.

0 production code change beyond comment / docstring text.

Tests: 4/4 boundary test pass + ruff format / check clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(boundary): include dto.py in description-free AST scan

Per @huangzhangshu BLOCKER (PR #1941 testing-lane CR, msg=2deb5407) + @ziang second-source ratify (msg=f485803c) + @不穷 PM dispatch (msg=a6cd42c9): the boundary gate ``test_graph_curation_modules_do_not_read_entity_description`` was whole-file excluding ``aperag/graph_curation/dto.py`` to avoid flagging the dataclass field declaration. But spec § 3.1.5 item 4 explicitly lists ``CurationEntity.from_lineage`` as one of the 6 description-free call sites, so the gate must catch future regressions that re-introduce ``entity.compacted_description`` / ``entity.description_parts`` reads inside ``from_lineage``.

The whole-file exclusion was a false-positive prevention that turned out to be unnecessary: the AST walker matches ``ast.Attribute`` reads only, and dataclass field annotations (``description: str = ""``) are ``ast.AnnAssign`` nodes with ``target=ast.Name``, while constructor keyword args (``cls(description="")``) are ``ast.keyword`` nodes. Neither is an ``ast.Attribute`` access on an entity object.

Drops the whole-file exclusion and adds two reinforcing sister-tests so future maintainers do not regress this:

* ``test_dto_module_is_in_boundary_scope``: synthetic-AST positive control. Feeds a fake ``from_lineage`` body that reads ``entity.compacted_description`` through the same offender detector and asserts the offender is surfaced. If a future refactor breaks the AST walker, this test catches the silent protection-loss.
* ``test_dto_field_declaration_is_not_a_false_positive``: live negative control. Confirms the production ``dto.py`` produces zero offenders, with a docstring directing future maintainers to fix the walker (NOT re-allowlist the file) if a false-positive is ever observed.

6/6 boundary tests pass + ruff format / check clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
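The AST argument in the last commit above (attribute reads are `ast.Attribute`, while field annotations and keyword arguments are not) can be checked with a small standard-library sketch. `find_description_reads` is a hypothetical helper, and the contents of `NON_ENTITY_BASE_NAMES` are assumed from the `merge_result.compacted_description` example; the real boundary test is not reproduced here.

```python
import ast
from typing import List

DESCRIPTION_ATTRS = {"description", "compacted_description", "description_parts"}
# Assumed allowlist content: base names that are result shapes, not entities.
NON_ENTITY_BASE_NAMES = {"merge_result"}


def find_description_reads(source: str) -> List[str]:
    """Return one entry per `<obj>.<description attr>` read found in the source."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Attribute):
            continue  # AnnAssign targets and call keywords never reach this point
        if node.attr not in DESCRIPTION_ATTRS or not isinstance(node.ctx, ast.Load):
            continue
        base = node.value
        if isinstance(base, ast.Name) and base.id in NON_ENTITY_BASE_NAMES:
            continue
        offenders.append(f"line {node.lineno}: .{node.attr}")
    return offenders


# Negative control: a field declaration plus a keyword argument, no attribute read.
CLEAN = '''
class CurationEntity:
    description: str = ""

    @classmethod
    def from_lineage(cls, entity):
        return cls(description="")
'''

# Positive control: the regression the gate is meant to catch.
REGRESSED = '''
def from_lineage(cls, entity):
    return cls(description=entity.compacted_description)
'''

print(find_description_reads(CLEAN))
print(find_description_reads(REGRESSED))
```

Keeping both a positive and a negative control mirrors the two sister-tests described in the commit: one proves the walker still detects a read, the other proves the declaration alone does not trip it.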

#1943)

* docs(cr-checklist): sediment fold-in sub-PR 2 after task #31 Phase A fully closed out

§ 四 adds 6 lesson sediments (cumulative evidence from the 4 task #31 Phase A PRs + task #33 P3 PR #1933 codify + multi-PR same-hour multi-source first-principles catches of trust-framing misses) + § 六 sediment references gain 5 PR commit cross-links + § 八 revision history gains the full trail of this PR's fold-in. New lessons:

- Lesson #12 v9 third + fourth + fifth-application demos (PR #1935: ziang DISMISSED enum impl-side catch + dongdong response_model legacy field filter BLOCKER, both in the same PR / PR #1938: Weston worker fail-safe BLOCKER upstream raise points trace / PR #1940: Weston SuggestionActionResponse.message required field catch): sediment upgraded to a systemic signal; the reviewer chain must independently re-verify from first principles
- Migration chain ordering second-application demo (PR #1935's reuse-table extend pattern has different ordering constraints from PR #1910's new-enum hard-cut migration; 5 new enum values APPLY_PENDING/APPLYING/APPLIED/APPLY_FAILED/DISMISSED + evidence_refs JSON column + ACCEPTED legacy zero-write grep gate)
- Lesson #17 second-application demo (PR #1935: when the backend converges on a canonical contract, fold a legacy projection layer into the same PR to keep backward compatibility, e.g. the suggestion_batch_id=run_id alias; pairs with the Lesson #14 deprecation-marker family)
- Lesson #18 formally established: lesson sediment + mechanical gate two-layer codification, "one records, one enforces" (first-app PR #1933 4-source default value parity / second-app PR #1941 description-free read scope + service.py:845 bonus catch / third-app PR #1941 fix-forward sister tests guarding against a whole-file exclude silently weakening the gate)
- mini-pattern 19: before spec lock, grep main to verify enum/contract assumptions against the code (architect own-up, upgraded to three layers: spec→impl / impl→response_model / impl catch path→upstream raise points)
- mini-pattern 20: a PR that adds response_model wire-up must run a model_validate(actual_handler_return_shape) boundary gate (PR #1940 first-application demo)

Started per architect dispatch msg=b6726ac9 + msg=420ca548 with sediment trigger A satisfied (task #31 Phase A 4/4 done); Phase B B1 lane owner: huangheng.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(cr-checklist): fix cite accuracy NIT per Weston msg=7690b723

2 cite accuracy fixes (Weston framing CR catch):

1. response_model validation failure status code: 422 -> 500
   - a response_model validation failure raises FastAPI ResponseValidationError
   - this usually maps to HTTP 500, not the 422 used for request body validation
   - affects the status code cited at line 745 + line 850 when describing the PR #1940 BLOCKER
2. GraphMergeSuggestionItem canonical schema fields corrected against the code
   - previously written: ... / observed_types / type_conflict / suggested_entity_type
   - the actual aperag/domains/knowledge_graph/schemas.py::GraphMergeSuggestionItem on main does not contain these three fields
   - in A4 (PR #1940) these fields are FE-derived display (the FE derives them from entities / suggested_target_entity / evidence_refs), not PR #1935 backend projection
   - affects the description of the Lesson #17 second-application demo at line 781 sect 4

Per Weston PR #1943 framing CR (msg=7690b723): sediment cite accuracy requires cleaning up factual drift so that future onboarding references do not confuse 422/500 status code semantics or backend/FE field source attribution. Does not block the main fold-in scope; the remaining framing of the 6 lesson sediments + 5 PR commit cross-links is all accurate (Weston verified).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…ty matrix (#1949)

* feat(collection): task #61 P1-D3: vector backend identity + capability matrix

Project the deployment-wide ``settings.vector_db_type`` onto every collection detail read so the FE can render a "what does this vector backend actually support" panel without per-collection migration or runtime probe.

Backend (output-only projection):

- ``aperag/schema/common.py``: ``VectorBackendCapabilities`` + ``VectorBackendInfo`` + ``_STATIC_VECTOR_BACKEND_CAPABILITIES`` dict + ``project_vector_backend_info()`` helper.
- ``aperag/domains/knowledge_base/schemas.py:Collection``: add ``vector_backend: Optional[VectorBackendInfo]``. **Intentionally NOT on ``CollectionConfig``** so the OpenAPI ``CollectionCreate`` / ``CollectionUpdate`` input shapes do not let callers mistake a deployment-wide setting for a per-collection editable knob (per dongdong msg=c2593fdd + PM msg=caf7e4df + architect msg=0044261f read-only projection lock).
- ``aperag/domains/knowledge_base/service/collection_service.py``: populate ``vector_backend`` in ``build_collection_response`` from ``settings.vector_db_type``; ``None`` for unknown backends so the FE can render a placeholder without a hard failure.

Cross-PR consistency with task #83 / PR #1948 (Bryce, vector adapter behavior fixes):

- Bryce's connector-layer ``BACKEND_CAPABILITIES`` ClassVar declares 2 truth flags (``supports_atomic_batch_upsert`` + ``supports_legacy_mode``); this PR's schema-layer Pydantic model mirrors those values plus a 3rd schema-layer-only flag ``supports_filter_or_with_empty_parts`` which is uniformly False across adapters after task #83 P1-V3 (translator-level defense-in-depth rejects empty Or parts).
- The 3rd flag stays in the schema so the FE can declare the uniform reject explicitly per spec § 2.3 P1-D3 "display: differences allowed, but explicit". This follows the Lesson #17 backend-converged contract simple-stable family pattern (cite PR #1930 SearchHit normalize, PR #1935 GraphMergeSuggestionItem projection layer).

Mechanical gate (per Lesson #18 lesson-sediment + mechanical-gate two-layer codification, first established by chenyexuan PR #1933 / PR #1941, then PR #1940 ``model_validate`` boundary): 13-case unit suite in ``tests/unit_test/contracts/test_vector_backend_capability_matrix.py`` pins each capability flag, normalizes inputs, and round-trips Pydantic ``model_dump`` so future drift between schema, projection helper, and FE-consumed shape fails fast at unit-test time.

FE (read-only display):

- ``web/src/features/collection/types.ts``: typed mirrors ``VectorBackendInfo`` / ``VectorBackendCapabilities`` / ``VectorBackendType``.
- ``web/src/app/workspace/collections/[collectionId]/settings/collection-vector-backend-card.tsx``: new component that surfaces backend identity + capability matrix in the collection settings page (above the edit form). dongdong picks up rendering polish (responsive + dark mode + final copy) on the same PR per the joint A4-style split (cuiwenbo contract layer + dongdong rendering polish + CR pair).
- ``web/src/i18n/{en-US,zh-CN}/page_collections.json``: copy strings.
- ``web/src/api-v2/schema.d.ts`` regenerated via ``yarn api:v2:types``.

Local verification:

- ``uv run --extra test pytest tests/unit_test/contracts/test_vector_backend_capability_matrix.py tests/unit_test/contracts/test_collection_v2_openapi_contract.py -q`` → 23 passed
- ``make openapi-check`` → ok
- ``yarn type-check --pretty false`` → 0 new errors on this PR's files (pre-existing graph-lab cosmograph + agent-runtime errors unchanged)
- ``yarn lint --quiet`` → 0 warnings/errors
- ``yarn i18n:check`` → ok
- ``git diff --check`` → ok

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(collection): task #87 P1-D3: convert vector_backend to computed_field

Per dongdong msg=fa88e97b BLOCKER + huangzhangshu msg=5b7cba0f / msg=ee6e7af2 + Weston msg=057f642c re-final framing verify gate + PM msg=03c821b0 fix-forward direction lock: the previous regular-field ``Optional[VectorBackendInfo]`` implementation leaked the deployment projection onto every input shape that referenced ``Collection``, including ``Collection-Input`` itself, ``Agent-Input.collections``, and ``CreateTurnRequest.collections``. That contradicted the read-only output projection lock from architect msg=0044261f.

Move ``Collection.vector_backend`` to a Pydantic v2 ``@computed_field`` property so OpenAPI input/output schemas auto-split:

- ``Collection-Output`` now lists ``vector_backend`` with ``readonly: true`` (verified in regenerated ``web/src/api-v2/schema.d.ts``).
- ``Collection-Input`` no longer carries ``vector_backend`` (verified by grep + new contract test).
- ``CollectionCreate`` / ``CollectionUpdate`` / ``Agent-Input.collections`` / ``CreateTurnRequest.collections`` all inherit the cleaned ``Collection-Input``, so the deployment-wide setting can no longer be passed as a per-collection override on agent / chat-turn requests.

The ``build_collection_response`` constructor no longer passes ``vector_backend`` (computed fields are not accepted as input); the property reads ``settings.vector_db_type`` lazily on each serialization.

Two new contract tests:

- ``test_collection_input_schema_does_not_expose_vector_backend``: pin the input/output JSON Schema split + ``readOnly`` flag on the output side. Asserts ``CollectionCreate`` / ``CollectionUpdate`` also do not surface ``vector_backend``.
- ``test_collection_constructor_ignores_vector_backend_input``: defensive. Even if a malicious caller stuffs ``vector_backend`` into a ``model_validate`` payload, Pydantic ignores it and the computed property still reflects the deployment setting.

Sediment: cuiwenbo own-up CR miss. Implement-time only verified the ``CollectionConfig`` placement (one defense layer) and missed the ``Collection`` self-reuse-as-input second layer. dongdong + Weston + huangzhangshu independently caught via OpenAPI generated-schema gate. mini-pattern 19 layer 5 candidate: "Pydantic schema placement verify must grep ``references Collection`` to catch input/output reuse risk, not only direct form-input shape" (continuing the trust-framing-miss family from PR #1935 / #1938 / #1940).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): consolidate vector_backend_capability_matrix imports for ruff

Combine the two from aperag.schema.common import ... statements into a single block so ruff's import organization rule is satisfied. No code-behavior change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): apply ruff format to vector_backend test + common.py

Run `uv run ruff format` on ApeRAG/aperag/schema/common.py and ApeRAG/tests/unit_test/contracts/test_vector_backend_capability_matrix.py so `make lint` (`ruff format --check`) passes. Pure formatting; no behavior change. Other unrelated files reverted to keep this PR scope clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

earayu merged commit 1024ef9 into main Apr 30, 2026
12 of 13 checks passed

earayu deleted the chenyexuan/task-33-p3-window-size-default-gate branch April 30, 2026 06:33

earayu mentioned this pull request Apr 30, 2026

refactor(task-31-a3): description-free graph_curation 7 call sites #1941

Merged

7 tasks

earayu mentioned this pull request Apr 30, 2026

docs(cr-checklist): task #31 Phase A close-out sediment fold-in 子 PR 2 #1943

Merged

3 tasks

earayu mentioned this pull request Apr 30, 2026

feat(collection): task #61 P1-D3 — vector backend identity + capability matrix #1949

Merged

10 tasks

earayu merged 1 commit into main from chenyexuan/task-33-p3-window-size-default-gate