test(task-33-p3): cross-source default value gate for graph_extraction_window_size #1933
Merged
…n_window_size

Codify Lesson #13 v3 (cross-source default value alignment) as a CI unit test gate so future task-#30 B3-class drift is caught by ``cicd-push.yml`` lint+unit instead of by reviewers via fix-forward rounds.

Background: task #30 B3 (PR #1925, merge ``43648f9``) locked the ``graph_extraction_window_size`` default to ``2`` across **four** sources that all need to agree:

1. ``aperag/indexing/graph_extractor.py`` ``_DEFAULT_GRAPH_EXTRACTION_WINDOW_SIZE`` (Python const, runtime fallback)
2. ``aperag/schema/common.py`` ``KnowledgeGraphConfig.graph_extraction_window_size`` Pydantic ``Field(examples=[N])`` (OpenAPI / TS schema source)
3. ``web/src/api-v2/schema.d.ts`` JSDoc ``@example N`` (frontend client surface; committed to the repo, can drift if regen is skipped)
4. ``docs/zh-CN/architecture/task-30-graph-chunk-window-spec-v1.md`` § 3.1.1 line 85 ``**B3 lock default `N`**`` + § 4.2 ``**`graph_extraction_window_size = N`**`` (architectural source of truth that PRs CR against)

PR #1925 itself surfaced the drift class:

- Weston ``msg=1b7d9bef`` BLOCKER 1 caught ``schema.d.ts`` still carrying default ``1``
- huangheng ``msg=bf785b12`` NIT 1 caught § 3.1.1 line 85 still saying default ``1``

Both required a fix-forward commit (``dae43f5``).

Why a unit test (not a boundary test): ``tests/boundaries/`` is not currently invoked by ``make test-unit`` / ``test-integration`` / ``cicd-push.yml`` (task #33 Layer 1 audit finding). ``tests/unit_test/`` runs on every push via ``make test-unit``. Per the simple-stable directive (earayu2 ``msg=1224bec8``), the cheapest reliable gate is a unit test in the existing CI lane, not a new workflow file.

Scope discipline: pins **default value parity** across the four sources only. Does not pin description text, override-recommendation phrasing, or rationale wording.

If a future change moves the default away from 2, the test fails with a list of all observed values per source plus the procedural reminder (≥10 samples + ≥3 models with no regression on any of them + three-way confirmation from PM, architect, and earayu2).

Tests:

- ``test_graph_extraction_window_size_default_consistent_across_sources``: the main gate (asserts all 4 sources agree)
- ``test_graph_extraction_window_size_default_is_positive_integer``: sanity (window assembler math requires ``>= 1``)
- ``test_individual_source_extractor_does_not_raise[*]``: separates "extractor broken" failures from "values drifted" failures so the operator immediately knows whether to fix test infra or schema

Local validation:

- 5/5 pass in clean state
- Synthetic drift on each of (Python const / TS schema / spec § 3.1.1 / spec § 4.2) caught with a clear, actionable error message naming the drifting source
- Full ``tests/unit_test/contracts/`` 58/58 pass
- ruff format + ruff check clean

Sediment cross-link: this gate is the codified counterpart to huangheng PR #1932 § 四 Lesson #13 v3 application demo 2 + Lesson #14 application demo (PR #1925 § 3.1.1 multi-iteration cleanup). That PR records the drift class as a CR-checklist lesson; this PR enforces it mechanically so the lesson does not have to be remembered.

task #33 Layer 2 P3 (chenyexuan claim, in_progress) per PM dispatch ``msg=65465f9e``.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
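The parity check described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the test that was merged: the in-memory snippets stand in for the four real files, and the regexes, the `SOURCES` table, and the helper names `extract_default` / `check_parity` are all assumptions made for the example.

```python
import re

# Hypothetical stand-ins for the four sources; a real gate would read the files.
# Each entry maps a source label to (text snippet, regex capturing the default).
SOURCES = {
    "python_const": (
        "_DEFAULT_GRAPH_EXTRACTION_WINDOW_SIZE = 2",
        r"_DEFAULT_GRAPH_EXTRACTION_WINDOW_SIZE\s*=\s*(\d+)",
    ),
    "pydantic_examples": (
        "graph_extraction_window_size: int = Field(examples=[2])",
        r"examples=\[(\d+)\]",
    ),
    "ts_schema_example": (
        "/** @example 2 */ graph_extraction_window_size?: number;",
        r"@example\s+(\d+)",
    ),
    "spec_3_1_1": ("**B3 lock default `2`**", r"B3 lock default `(\d+)`"),
    "spec_4_2": (
        "**`graph_extraction_window_size = 2`**",
        r"graph_extraction_window_size = (\d+)",
    ),
}


def extract_default(name, text, pattern):
    """Pull the integer default out of one source, or fail loudly."""
    match = re.search(pattern, text)
    if match is None:
        # A broken extractor is a test-infra problem, not a value drift.
        raise LookupError(f"{name}: pattern not found; fix the extractor")
    return int(match.group(1))


def check_parity(sources):
    """Return the observed defaults, raising if the sources disagree."""
    observed = {
        name: extract_default(name, text, pattern)
        for name, (text, pattern) in sources.items()
    }
    if len(set(observed.values())) != 1:
        # Report every source so the operator can see which one drifted.
        raise AssertionError(f"default drift across sources: {observed}")
    return observed
```

Raising `LookupError` for a missing pattern and `AssertionError` for disagreeing values keeps the "extractor broken" and "values drifted" failure modes distinguishable, which is the separation the third test in the list above is meant to provide.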
earayu added a commit that referenced this pull request on Apr 30, 2026
…scope lock

5 BLOCKER cleanup (per ziang/Weston/Bryce/huangzhangshu/Planetegg/dongdong/huangheng converge):

1. Lane name `graph_node_merge_suggestion` -> `graph_curation_run`, unified throughout the document (§2.1/§4 A1/§4 B3/§5.1/§5.2 + test file name `test_graph_curation_run_boundaries.py` + removed the hard-coded "11th" number + removed the old implementation name `_dedup_scan`)
2. All remnants of a new `merge_suggestion` table removed (§2.2 P1-B + §3.1.2 + §6 Migration chain) -> unified throughout as "reuse GraphCurationSuggestion + extend the status enum with 4 new values + add an evidence_refs field"
3. §3.1.4 `LineageGraphStore.merge_entities` (does not exist) -> `LineageEntityMerger` description-free apply path built on LineageGraphStore primitives + a cross-backend boundary test pinning LineageEntityMerger behavior
4. Phase A1 worker `MergeSuggestionWorker` calling `MergeCandidateDetector` -> distinguish the manual/cron full sweep, which calls the `generate_graph_curation_run_task` integration path, from the auto_post_ingest sync inline `detect_for_sync()` quick path
5. description-free 4 sites -> 6 enumerated detector/snapshot call sites (added `merge_candidate_detector.py:322-328` + corrected `:263-271` -> `:257-284`, aligned with §3.1.5) + 1 apply path

§5 gate split into 5.2.a scan-generation invariants (lane symbolic dual-side / independent queue family / description-free 6 call sites / trigger split / safe-only write) vs 5.2.b async accept-apply state machine invariants (7-state machine / enum lowercase + dual-side / description-free apply variant / cross-backend apply contract / audit trail / idempotent replay)

§6 sediment cite update: Lesson #13 v3.1 -> v2.3 (per huangheng line 285 verify); "about to fold" -> "already folded per PR #1932 commit dc79aad" (already merged); added a Lesson #18 candidate cross-link (lesson sediment + mechanical gate two-layer codification: one records, one enforces; per huangheng msg=b18d26ee + chenyexuan PR #1933 first-application demo)

Added §3.1.3 entity_type scope lock (per PM msg=05be0b52 + ziang msg=d6d9dc3c + dongdong msg=83783bc6 + Weston msg=78ab2267 three-way converge):

- v1 still uses the entity name as the primary merge target; `entity_type` is only a compatibility / penalty signal
- merge suggestions must tolerate approximate types (display observed_types + type_conflict + suggested_entity_type)
- the `entity_type_alias` suggestion kind moves to Phase B / P1 follow-up (#31-C3), with its store/API/migration/UI designed independently; a v1 boundary test pins "never write suggestion_kind='entity_type_alias'" to prevent scope creep

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
earayu added a commit that referenced this pull request on Apr 30, 2026
…1931)

task #31 spec v1 lock: the design document for graph node merge scanning + background suggestion tasks lands in the repo.

## Design core

- **scope reframe**: extract / fix / extend the full Wave 7 §K.12.4 stack; do not build new
- **Independent queue family** `q:graph_curation_run`: the lane does not pollute Modality + DocumentIndex + reconciler and has its own push/pop API
- **Reconcile of the three trigger strategies**: manual/cron full sweep goes through worker pop → generate_graph_curation_run_task; auto_post_ingest keeps the sync inline detect_for_sync but under the same description-free invariant
- **Reuse the GraphCurationSuggestion table**: no new merge_suggestion table; only extend with 4 new status enum values + an evidence_refs field
- **State machine Option B (apply_pending + ACCEPTED legacy)**: pending → dismissed | rejected | apply_pending → applying → applied | apply_failed; the existing ACCEPTED terminal status written by the historical sync handle_action is kept as legacy read-only, with a zero-write gate on the new async path
- **description-free 6 call sites + 1 apply path** (Wave 5 invariant): candidate_generation.py:43/179-181/196-197 + dto.py:59-65/101-105 + merge_candidate_detector.py:257-284 + :322-328 + lineage_merge.py:246-317 apply variant
- **LineageEntityMerger application-layer cross-backend contract** (the Protocol does not include merge_entities; reuses LineageGraphStore primitives)
- **entity_type scope lock, three layers**: v1 uses it only as a compatibility/penalty signal; suggestions tolerate approximate types by displaying observed_types/type_conflict/suggested_entity_type; entity_type_alias becomes a separate suggestion kind moved to Phase B/P1 follow-up #31-C3
- **Reuse the /graphs/merge-suggestions endpoint + extend SUGGESTION_ACTIONS with dismiss + Pydantic Field validator for confidence_score [0,1]**

## All 8/8 lane LGTMs collected

- @bryce (msg=9e49d440): all 5 BLOCKERs cleared + entity_type scope lock + Migration chain consistency
- @weston (msg=ed202960 + 92dd89ff): five-category consistency sweep + entity_type three-layer architecture + Migration chain
- @huangzhangshu (msg=9a4cbd61 + 68783841): five categories of old wording cleaned up into Phase A/B gates + enum count micro-fix
- @ziang (msg=760b7341 + 0b761117): impl-lane 5 BLOCKER + state machine Option B + enum count
- @huangheng (msg=535de81b): Lesson framework v5/v6/v7/v8/v9/v13/v14/v16/v17 + Lesson #18 candidate cross-link + Migration chain ordering, all consistent
- @dongdong (msg=8316b45a): FE/UI scoped + entity_type FE friendliness + state machine
- @Planetegg (msg=7d428e33): SRE/deploy Helm render gate symbolic lane assertion
- @cuiwenbo (msg=594fbd4f): 3 NITs (endpoint reuse + status enum FE typed schema sync + confidence_score [0,1] validator), all folded

## CI status

- lint-and-unit ✅
- e2e-http-smoke 3/3 ✅
- e2e-http-provider-preflight 3/3 ✅
- docs-only lite gate satisfied

## Related

- Does not block PR #1932 (huangheng sediment merged dc79aad) / PR #1933 (chenyexuan merged 1024ef9) / task #61 P1/P2 follow-up / task #11 GC orphan vector follow-up
- The 4 Phase A sub-tasks can be dispatched immediately after spec lock (recommended owners: A1+A3 Bryce/ziang / A2 ziang / A4 dongdong+cuiwenbo)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
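The Option B state machine listed under the design core can be pictured as a transition table. The sketch below is an illustration only: the lowercase state names follow the spec summary, but the `TRANSITIONS` table, the `transition` helper, and treating `apply_failed` as terminal are assumptions rather than the spec's implementation.

```python
# Seven states from the spec summary. The legacy "accepted" status is left out
# on purpose, so this sketch can never move a suggestion into it (the
# "zero-write" idea for the new async path).
TRANSITIONS = {
    "pending": frozenset({"dismissed", "rejected", "apply_pending"}),
    "apply_pending": frozenset({"applying"}),
    "applying": frozenset({"applied", "apply_failed"}),
    # Treated as terminal here; whether apply_failed allows a retry is not
    # stated in the commit message, so that is an assumption of this sketch.
    "dismissed": frozenset(),
    "rejected": frozenset(),
    "applied": frozenset(),
    "apply_failed": frozenset(),
}


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the move is not in the table."""
    if target not in TRANSITIONS.get(current, frozenset()):
        raise ValueError(f"illegal transition {current!r} -> {target!r}")
    return target


# Usage: walk the happy path of an accepted suggestion.
status = "pending"
for step in ("apply_pending", "applying", "applied"):
    status = transition(status, step)
```

Keeping the table as data rather than branching logic makes it straightforward for a boundary test to assert the exact set of states and edges.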
earayu added a commit that referenced this pull request on Apr 30, 2026
…1941)

* refactor(task-31-a3): description-free graph_curation 7 call sites

Wave 5 description-NULL invariant (task #31 spec § 3.1.5): the graph extractor stopped emitting `description` text post Wave 5 task #5 (facts/vectors split). The dedup detection / scoring / snapshot / accept-apply paths still read `entity.description` / `compacted_description` / `description_parts` and would either silently degrade scoring (always-empty bag-of-tokens) or leak stale fragments from pre-Wave-5 rows into reviewer-facing suggestions.

Fix the 6 detector / snapshot call sites + 1 apply path enumerated in the spec, plus 1 service-layer helper surfaced by the boundary test grep gate:

1. candidate_generation.py:38 entity_snapshot: drop description
2. candidate_generation.py:179 _lexical_signals: drop description Jaccard token overlap
3. candidate_generation.py:196 _pair_score: drop description scoring weight (signal no longer emitted; branch is dead)
4. dto.py CurationEntity.from_lineage: set description="" instead of deriving from compacted / description_parts; keep the field on the dataclass for back-compat with callers that still pass it
5. merge_candidate_detector._description_text_for_scoring → _embedding_query_text: embed `<name> (<entity_type>)` (mirror of how the graph_vectors worker writes the entity vector, Wave 5 task #5 / #7); the legacy method always short-circuited to "" post Wave 5 so detection produced zero candidates
6. merge_candidate_detector._to_legacy_entity: pass description="" instead of reading from entity
7. merge_candidate_detector._snapshot: drop description key from persisted entity_snapshots payload

+1 lineage_merge.py: add merge_entities_apply_description_free variant for the async accept-apply worker (task #31 § 3.1.5). Skips LLM unified description / Compactor pass / __curation_merge__ sentinel description write / vector embed write per the spec "do not call" list. The legacy merge_entities path is preserved for manual sync API back-compat (Lesson #14 multi-iteration cleanup follow-up).

+1 service._fetch_shadow_neighbors: replace `entity.description or entity.name` with `entity.name`; post Wave 5 the description is always "" so the fallback was a no-op, and reading description here violates the boundary gate.

Boundary gate (tests/boundaries/test_graph_curation_description_free.py, 4 AST-level assertions per spec § 5.2.a):

- graph_curation_modules_do_not_read_entity_description
- merge_candidate_detector_does_not_read_entity_description
- lineage_merge_apply_description_free_does_not_read_entity_description
- lineage_merge_apply_description_free_does_not_call_llm_or_compactor

Allowlist:

- lineage_merge.merge_entities (legacy back-compat) excluded by file
- dto.py field declaration excluded (annotation, not a read)
- LineageMergeResult.compacted_description (non-entity result shape used by legacy sync handle_action API) excluded by base name

Wave-5 invariant codify pattern (Lesson #18 candidate, per huangheng PR #1932 + chenyexuan PR #1933 first-application demo): lesson sediment (cr-checklist § 四 Wave 5 description-NULL family) + mechanical gate (this boundary test), paired so future regressions fail at CI, not at review time.

Tests: 1466 unit + 104 boundary all green.

Risk: 0 production behavior change for the legacy sync handle_action API (merge_entities preserved); the new accept-apply async path uses the description-free variant exclusively.

Spec: docs/zh-CN/architecture/task-31-graph-node-merge-spec-v1.md § 3.1.5
Task: task #77 (Phase A3) under task #31 umbrella

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(task-31-a3): fold huangheng cr-checklist Lesson #14/#18 NITs

Per @huangheng cr-checklist Lesson #14 + #18 candidate cross-link verify (msg=be330423): 2 non-blocker NITs on PR #1941 fix-forwarded.

NIT 1 (service.py:244 deprecation marker): Add a deprecation comment on the legacy sync ``handle_action()`` API return-shape line that reads ``merge_result.compacted_description``. Aligns with the Lesson #14 "keep the old path + mark it deprecated" pattern (matches the ``lineage_merge.merge_entities`` deprecation marker added by the main commit), and explicitly cross-links the boundary test allowlist mechanism (``NON_ENTITY_BASE_NAMES``) so future grep-based audits don't dispatch on the read.

NIT 2 (boundary test docstring bonus catch cross-link): Add an explicit Lesson #18 candidate second-application demo trail in the ``tests/boundaries/test_graph_curation_description_free.py`` module docstring. Cite the ``service.py:845`` bonus catch (``text = entity.description or entity.name`` inside ``GraphCurationService._fetch_shadow_neighbors``) as canonical proof of the value of "lesson sediment + mechanical gate two-layer codification". The spec § 3.1.5 ratify (符炫炜 + Bryce + ziang + huangzhangshu + Weston multi-source review) listed exactly 6+1 sites, and every reviewer + the spec author missed this 7th hidden read; the boundary gate caught it on first run, turning ``reviewer-as-detector`` into ``CI-as-detector`` per the Lesson #18 thesis.

0 production code change beyond comment / docstring text.

Tests: 4/4 boundary tests pass + ruff format / check clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(boundary): include dto.py in description-free AST scan

Per @huangzhangshu BLOCKER (PR #1941 testing-lane CR, msg=2deb5407) + @ziang second-source ratify (msg=f485803c) + @不穷 PM dispatch (msg=a6cd42c9): the boundary gate ``test_graph_curation_modules_do_not_read_entity_description`` was whole-file excluding ``aperag/graph_curation/dto.py`` to avoid flagging the dataclass field declaration. But spec § 3.1.5 item 4 explicitly lists ``CurationEntity.from_lineage`` as one of the 6 description-free call sites, so the gate must catch future regressions that re-introduce ``entity.compacted_description`` / ``entity.description_parts`` reads inside ``from_lineage``.

The whole-file exclusion was a false-positive prevention that turned out to be unnecessary: the AST walker matches ``ast.Attribute`` reads only, and dataclass field annotations (``description: str = ""``) are ``ast.AnnAssign`` nodes with ``target=ast.Name``, while constructor keyword args (``cls(description="")``) are ``ast.keyword`` nodes. Neither is an ``ast.Attribute`` access on an entity object.

Drops the whole-file exclusion and adds two reinforcing sister-tests so future maintainers do not regress this:

* ``test_dto_module_is_in_boundary_scope``: synthetic-AST positive control. Feeds a fake ``from_lineage`` body that reads ``entity.compacted_description`` through the same offender detector and asserts the offender is surfaced. If a future refactor breaks the AST walker, this test catches the silent protection-loss.
* ``test_dto_field_declaration_is_not_a_false_positive``: live negative control. Confirms the production ``dto.py`` produces zero offenders, with a docstring directing future maintainers to fix the walker (NOT re-allowlist the file) if a false-positive is ever observed.

6/6 boundary tests pass + ruff format / check clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
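The AST argument in the last commit (attribute reads are `ast.Attribute` nodes, while field annotations and keyword arguments are not) can be checked with a small stand-alone walker. This is a sketch of the idea, not the repository's boundary test: `find_description_reads`, the sample sources, and the contents of `NON_ENTITY_BASE_NAMES` are assumptions for illustration (only the allowlist's name appears in the commit message).

```python
import ast

FORBIDDEN_ATTRS = {"description", "compacted_description", "description_parts"}
# Assumed allowlist content: base names whose attributes are not entity reads.
NON_ENTITY_BASE_NAMES = {"merge_result"}


def find_description_reads(source: str) -> list:
    """Return sorted (lineno, attr) pairs for forbidden attribute reads."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Attribute):
            continue  # annotations and keyword args are never Attribute nodes
        if not isinstance(node.ctx, ast.Load) or node.attr not in FORBIDDEN_ATTRS:
            continue
        base = node.value
        if isinstance(base, ast.Name) and base.id in NON_ENTITY_BASE_NAMES:
            continue  # allowlisted by base name, e.g. merge_result.<attr>
        offenders.append((node.lineno, node.attr))
    return sorted(offenders)


# Negative control: a field declaration plus a keyword argument, no reads.
DECLARATION_ONLY = '''
class CurationEntity:
    description: str = ""

    @classmethod
    def from_lineage(cls, entity):
        return cls(description="")
'''

# Positive control: a regression that reads the attribute off the entity.
REGRESSION = '''
def from_lineage(cls, entity):
    return cls(description=entity.compacted_description)
'''

# Allowlisted read on a non-entity result object.
LEGACY_RESULT = "text = merge_result.compacted_description\n"
```

Running the walker over `DECLARATION_ONLY` yields no offenders, while `REGRESSION` surfaces the `compacted_description` read, which mirrors the negative and positive control tests described above.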
earayu added a commit that referenced this pull request on Apr 30, 2026
#1943)

* docs(cr-checklist): sediment fold-in sub-PR 2 after task #31 Phase A fully closed out

§ 四 adds 6 lesson sediments (cumulative evidence from the 4 task #31 Phase A PRs + the task #33 P3 PR #1933 codification + multi-PR same-hour multi-source first-principles catches of trust-framing misses) + § 六 sediment references gain 5 PR commit cross-links + § 八 revision history gains the complete trail of this PR's fold-in.

New lessons:

- Lesson #12 v9 third + fourth + fifth-application demos (PR #1935 ziang DISMISSED enum impl-side catch + dongdong response_model legacy field filter BLOCKER, both in the same PR / PR #1938 Weston worker fail-safe BLOCKER upstream raise points trace / PR #1940 Weston SuggestionActionResponse.message required field catch): the sediment is upgraded to a systemic signal that the reviewer chain must independently re-verify from first principles
- Migration chain ordering second-application demo (PR #1935's reuse-table extend pattern has different ordering constraints from PR #1910's new-enum hard-cut migration; 5 new enum values APPLY_PENDING/APPLYING/APPLIED/APPLY_FAILED/DISMISSED + evidence_refs JSON column + ACCEPTED legacy zero-write grep gate)
- Lesson #17 second-application demo (when PR #1935 converged the backend on a canonical contract, the same PR folded in a legacy projection layer to preserve backward compatibility, e.g. the suggestion_batch_id=run_id alias; pairs with the deprecation-marker Lesson #14 family)
- Lesson #18 formally established: lesson sediment + mechanical gate two-layer codification, "one records, one enforces" (first-app PR #1933 4-source default value parity / second-app PR #1941 description-free read scope + service.py:845 bonus catch / third-app PR #1941 fix-forward sister tests that prevent a whole-file exclude from silently weakening the gate)
- mini-pattern 19: before spec lock, grep main to empirically verify enum/contract assumptions (architect own-up, upgraded three-layer version: spec→impl / impl→response_model / impl catch path→upstream raise points)
- mini-pattern 20: a PR that adds response_model wire-up must run a model_validate(actual_handler_return_shape) boundary gate (PR #1940 first-application demo)

Started per architect dispatch msg=b6726ac9 + msg=420ca548 once sediment trigger A was satisfied (task #31 Phase A 4/4 done) + Phase B B1 lane, owner huangheng.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(cr-checklist): fix cite accuracy NIT per Weston msg=7690b723

2 cite accuracy fixes (Weston framing CR catch):

1. response_model validation failure status code: 422 -> 500
   - a response_model validation failure raises FastAPI ResponseValidationError
   - this usually maps to HTTP 500, not the 422 used for request body validation
   - affects the status code cited on line 745 + line 850 when describing the PR #1940 BLOCKER
2. GraphMergeSuggestionItem canonical schema fields corrected against the code
   - originally written: ... / observed_types / type_conflict / suggested_entity_type
   - in fact, main's aperag/domains/knowledge_graph/schemas.py::GraphMergeSuggestionItem does not contain these three fields
   - in A4 (PR #1940) these fields are FE-derived display (the FE derives them from entities / suggested_target_entity / evidence_refs), not a PR #1935 backend projection
   - affects the line 781 sect 4 Lesson #17 second-application demo description

per Weston PR #1943 framing CR (msg=7690b723): sediment cite accuracy requires cleaning up factual drift, so that future onboarding readers do not confuse the 422/500 status code semantics or the backend/FE field source attribution.

Does not block the main fold-in scope: the 6 lesson sediments + 5 PR commit cross-links are otherwise framed accurately (Weston verified).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
earayu added a commit that referenced this pull request on Apr 30, 2026
…ty matrix (#1949)

* feat(collection): task #61 P1-D3 - vector backend identity + capability matrix

Project the deployment-wide ``settings.vector_db_type`` onto every collection detail read so the FE can render a "what does this vector backend actually support" panel without per-collection migration or runtime probe.

Backend (output-only projection):

- ``aperag/schema/common.py``: ``VectorBackendCapabilities`` + ``VectorBackendInfo`` + ``_STATIC_VECTOR_BACKEND_CAPABILITIES`` dict + ``project_vector_backend_info()`` helper.
- ``aperag/domains/knowledge_base/schemas.py:Collection``: add ``vector_backend: Optional[VectorBackendInfo]``. **Intentionally NOT on ``CollectionConfig``** so the OpenAPI ``CollectionCreate`` / ``CollectionUpdate`` input shapes do not let callers mistake a deployment-wide setting for a per-collection editable knob (per dongdong msg=c2593fdd + PM msg=caf7e4df + architect msg=0044261f read-only projection lock).
- ``aperag/domains/knowledge_base/service/collection_service.py``: populate ``vector_backend`` in ``build_collection_response`` from ``settings.vector_db_type``; ``None`` for unknown backends so the FE can render a placeholder without a hard failure.

Cross-PR consistency with task #83 / PR #1948 (Bryce, vector adapter behavior fixes):

- Bryce's connector-layer ``BACKEND_CAPABILITIES`` ClassVar declares 2 truth flags (``supports_atomic_batch_upsert`` + ``supports_legacy_mode``); this PR's schema-layer Pydantic model mirrors those values plus a 3rd schema-layer-only flag ``supports_filter_or_with_empty_parts``, which is uniformly False across adapters after task #83 P1-V3 (translator-level defense-in-depth rejects empty Or parts).
- The 3rd flag stays in the schema so the FE can declare the uniform reject explicitly, per spec § 2.3 P1-D3 (display "differences are allowed but must be explicit"). This follows the Lesson #17 pattern of the backend converging on a simple-stable contract (cite PR #1930 SearchHit normalize, PR #1935 GraphMergeSuggestionItem projection layer).

Mechanical gate (per Lesson #18 lesson-sediment + mechanical-gate two-layer codification, first established by chenyexuan PR #1933 / PR #1941, then PR #1940 ``model_validate`` boundary): 13-case unit suite in ``tests/unit_test/contracts/test_vector_backend_capability_matrix.py`` pins each capability flag, normalizes inputs, and round-trips Pydantic ``model_dump`` so future drift between schema, projection helper, and FE-consumed shape fails fast at unit-test time.

FE (read-only display):

- ``web/src/features/collection/types.ts``: typed mirrors ``VectorBackendInfo`` / ``VectorBackendCapabilities`` / ``VectorBackendType``.
- ``web/src/app/workspace/collections/[collectionId]/settings/collection-vector-backend-card.tsx``: new component that surfaces backend identity + capability matrix in the collection settings page (above the edit form). dongdong picks up rendering polish (responsive + dark mode + final copy) on the same PR per the joint A4-style split (cuiwenbo contract layer + dongdong rendering polish + CR pair).
- ``web/src/i18n/{en-US,zh-CN}/page_collections.json``: copy strings.
- ``web/src/api-v2/schema.d.ts`` regenerated via ``yarn api:v2:types``.

Local verification:

- ``uv run --extra test pytest tests/unit_test/contracts/test_vector_backend_capability_matrix.py tests/unit_test/contracts/test_collection_v2_openapi_contract.py -q`` → 23 passed
- ``make openapi-check`` → ok
- ``yarn type-check --pretty false`` → 0 new errors on this PR's files (pre-existing graph-lab cosmograph + agent-runtime errors unchanged)
- ``yarn lint --quiet`` → 0 warnings/errors
- ``yarn i18n:check`` → ok
- ``git diff --check`` → ok

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(collection): task #87 P1-D3 - convert vector_backend to computed_field

Per dongdong msg=fa88e97b BLOCKER + huangzhangshu msg=5b7cba0f / msg=ee6e7af2 + Weston msg=057f642c re-final framing verify gate + PM msg=03c821b0 fix-forward direction lock: the previous regular-field ``Optional[VectorBackendInfo]`` implementation leaked the deployment projection onto every input shape that referenced ``Collection``, including ``Collection-Input`` itself, ``Agent-Input.collections``, and ``CreateTurnRequest.collections``. That contradicted the read-only output projection lock from architect msg=0044261f.

Move ``Collection.vector_backend`` to a Pydantic v2 ``@computed_field`` property so OpenAPI input/output schemas auto-split:

- ``Collection-Output`` now lists ``vector_backend`` with ``readonly: true`` (verified in regenerated ``web/src/api-v2/schema.d.ts``).
- ``Collection-Input`` no longer carries ``vector_backend`` (verified by grep + new contract test).
- ``CollectionCreate`` / ``CollectionUpdate`` / ``Agent-Input.collections`` / ``CreateTurnRequest.collections`` all inherit the cleaned ``Collection-Input``, so the deployment-wide setting can no longer be passed as a per-collection override on agent / chat-turn requests.

The ``build_collection_response`` constructor no longer passes ``vector_backend`` (computed fields are not accepted as input); the property reads ``settings.vector_db_type`` lazily on each serialization.

Two new contract tests:

- ``test_collection_input_schema_does_not_expose_vector_backend``: pins the input/output JSON Schema split + the ``readOnly`` flag on the output side. Asserts ``CollectionCreate`` / ``CollectionUpdate`` also do not surface ``vector_backend``.
- ``test_collection_constructor_ignores_vector_backend_input``: defensive. Even if a malicious caller stuffs ``vector_backend`` into a ``model_validate`` payload, Pydantic ignores it and the computed property still reflects the deployment setting.

Sediment: cuiwenbo own-up CR miss. Implement-time verification covered only the ``CollectionConfig`` placement (one defense layer) and missed the second layer, where ``Collection`` itself is reused as an input shape. dongdong + Weston + huangzhangshu independently caught it via the OpenAPI generated-schema gate. mini-pattern 19 layer 5 candidate: "Pydantic schema placement verify must grep ``references Collection`` to catch input/output reuse risk, not only direct form-input shape" (continuing the trust-framing-miss family from PR #1935 / #1938 / #1940).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): consolidate vector_backend_capability_matrix imports for ruff

Combine the two ``from aperag.schema.common import ...`` statements into a single block so ruff's import organization rule is satisfied. No code-behavior change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): apply ruff format to vector_backend test + common.py

Run `uv run ruff format` on ApeRAG/aperag/schema/common.py and ApeRAG/tests/unit_test/contracts/test_vector_backend_capability_matrix.py so `make lint` (`ruff format --check`) passes. Pure formatting; no behavior change. Other unrelated files reverted to keep this PR scope clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary

- … `cicd-push.yml` lint+unit gate for `graph_extraction_window_size`
- … `@example` / spec doc § 3.1.1 + § 4.2

Why this gate exists (concrete drift evidence)

PR #1925 (task #30 B3, default=2 lock) surfaced the cross-source drift class:

- Weston `msg=1b7d9bef` BLOCKER 1: caught `web/src/api-v2/schema.d.ts` still carrying `default 1` after Python + Pydantic moved to 2
- huangheng `msg=bf785b12` NIT 1: caught spec § 3.1.1 line 85 still saying `default 1`
- Both required a fix-forward commit (`dae43f5`)

This gate replaces "reviewers act as drift detectors" with "lint+unit catches drift mechanically before review burden".
Sediment cross-link

This PR is the codified mechanical counterpart to PR #1932 (@huangheng) § 四 Lesson #13 v3 application demo 2 + Lesson #14 application demo. PR #1932 records the drift class as a CR-checklist lesson; this PR enforces it so the lesson does not have to be remembered.

Per `docs/zh-CN/architecture/task-17-cr-review-checklist.md` § 四 Lesson #13 v3.

Sources locked

1. `aperag/indexing/graph_extractor.py` `_DEFAULT_GRAPH_EXTRACTION_WINDOW_SIZE` (Python const, runtime fallback)
2. `aperag/schema/common.py` `KnowledgeGraphConfig.graph_extraction_window_size` `Field(examples=[N])`
3. `web/src/api-v2/schema.d.ts` `@example N` (frontend client surface)
4. `docs/zh-CN/architecture/task-30-graph-chunk-window-spec-v1.md` ``**B3 lock default `N`**`` + § 4.2 ``**`graph_extraction_window_size = N`**``

Tests added (5)
1. `test_graph_extraction_window_size_default_consistent_across_sources`: main gate (4 sources must agree)
2. `test_graph_extraction_window_size_default_is_positive_integer`: sanity (window math requires `>= 1`)

3-5. `test_individual_source_extractor_does_not_raise[python_const|pydantic_examples|ts_schema_example]`: separates "extractor broken" failures from "values drifted" failures so the operator immediately knows whether to fix test infra or schema

Failure semantics
When the gate trips, it spells out all 4 observed values per source + the procedural reminder (>= 10 samples + >= 3 models with no regression on any of them + three-way confirmation from PM, architect, and earayu2, per spec § 4.2). The operator does not have to guess which source drifted.

Why a unit test, not a boundary test or new workflow

- `tests/boundaries/` is not currently invoked by `make test-unit` / `test-integration` / `cicd-push.yml` (task #33 Layer 1 audit finding; separate Layer 2 follow-up if needed)
- `tests/unit_test/` runs on every push via `make test-unit`
- Per the simple-stable directive (earayu2 `msg=1224bec8`), cheapest reliable gate = unit test in existing CI lane, not a new workflow file
- … `tests/unit_test/contracts/` cross-source contract tests

Scope discipline (per earayu2 sticky 4)
This gate pins default value parity only. It does NOT pin:

- description text
- override-recommendation phrasing
- rationale wording

If a future change moves the default away from 2, the test fails; the fix is to update all 4 sources in the same PR.
Test plan

- `pytest tests/unit_test/contracts/test_graph_extraction_window_size_default_consistency.py -v` -> 5/5 pass
- … `@example` -> caught
- `tests/unit_test/contracts/` full suite -> 58/58 pass (no collateral break)
- `ruff format --check` clean
- `ruff check` clean

CR

@符炫炜 ratify (architect lane).
@huangheng / @weston / @冬柏 / @huangzhangshu / @Planetegg: an LGTM from any one reviewer is enough for the author to squash merge.
Does not block:
🤖 Generated with Claude Code