feat(indexing): task #31 Phase A1 — independent graph_curation_run worker lane#1938
Conversation
|
Testing CR on PR #1938: one A1 boundary test gap to fix before I can LGTM. The queue family / worker-loop tests are strong, and the source grep gate correctly prevents API-side
Result: 21 passed. BLOCKER — Spec §5.2.a requires the API path to become a thin enqueue into
Without this behavioral test, a future change could keep the grep gate green while sending the wrong payload shape ( Suggested minimal fix: add focused unit tests around |
|
Testing CR on PR #1938: one A1 boundary test gap to fix before I can LGTM. The queue family / worker-loop tests are strong, and the source grep gate correctly prevents API-side
Result: 21 passed. BLOCKER — Spec §5.2.a requires the API path to become a thin enqueue into
Without this behavioral test, a future change could keep the grep gate green while sending the wrong payload shape ( Suggested minimal fix: add focused unit tests around |
#1938 CR gap) huangzhangshu testing CR (msg=fe66bd72) caught: PR #1938 had source- level grep gates on the API enqueue path but no behaviour tests. The pre-A1 ``test_start_run_marks_failed_when_enqueue_raises`` was deleted in Wave 3 T3.1 chunk 3 because the ``asyncio.create_task`` schedule path could not raise. Phase A1 reintroduces a real failure path (the ``await runtime.queue.push_graph_curation_run(...)`` enqueue), so the behaviour gate needs to come back. Five new ``pytest.mark.asyncio`` cases covering the ``GraphCurationService.start_run`` post-transaction enqueue branch: * ``test_start_run_enqueues_canonical_payload_when_created`` — pin the ``{run_id, collection_id}`` payload shape that the worker's :class:`GraphCurationRunDispatchPayload.from_dict` reads. Both fields must be ``str`` so ``from_dict`` normalisation is a no-op. * ``test_start_run_does_not_enqueue_when_run_already_active`` — ``created=False`` (existing PENDING/RUNNING) MUST NOT re-enqueue. Without this, every duplicate API call would multiply Redis payloads and waste worker LLM quota. * ``test_start_run_marks_run_failed_and_raises_when_enqueue_raises`` — Redis push failure surfaces as ``RuntimeError`` raised, with the run row marked FAILED carrying the original exception in its reason. Silent success would leave the row in PENDING forever. * ``test_start_run_marks_run_failed_and_raises_when_runtime_not_installed`` — fail-loud guard for test environments / pre-startup boot (``runtime is None``). * ``test_start_run_marks_run_failed_when_runtime_has_no_queue`` — symmetric: runtime present but ``queue=None`` (INLINE-mode test runtime). The ``_FakeQueue`` stub captures pushed payloads and toggles ``raise_on_push`` for the failure path; the heavy collaborators (``_get_and_validate_collection``, ``execute_with_transaction``, ``_mark_run_failed``, ``_run_to_dict``) are stubbed at the instance level so the test exercises only the post-transaction enqueue branch — matches the existing test-style in this file. Local: ``uv run pytest tests/unit_test/graph_curation/test_service.py`` → **8 passed**. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Fix-forward The added
Local targeted verification passed:
Result: 29 passed. No remaining A1 testing blocker from me. CI green => mergeable from my lane. |
… (Weston PR #1938 BLOCKER) Weston PR #1938 architecture CR (msg=04c9e5ee BLOCKER) caught a real correctness bug in the post-A1 worker catch path: the comment claimed "task-level failure already persisted in PG" but at least three pre-``generate_run`` raise sites bypass the service-layer ``_mark_run_failed`` and leave the run row in ``PENDING``: * ``aperag/graph_curation/integration.py:35-37`` — collection not found. * ``aperag/graph_curation/integration.py:49-61`` — backend / vector / embedder resolution failure. * ``aperag/domains/knowledge_graph/tasks.py:17-26`` — log + re-raise without marking FAILED. Failure chain without the fix: 1. Worker pops the payload (Redis side already consumed). 2. ``generate_graph_curation_run_task`` raises before ``service.generate_run`` runs, so no ``_mark_run_failed`` runs. 3. Run row stays ``PENDING``; queue payload is gone. 4. Next ``start_run`` call sees an "active" PENDING run and returns ``created=False`` without re-enqueueing — the collection's manual full sweep is permanently wedged. Fix --- ``_mark_run_failed_best_effort(engine, run_id, error_message)``: ``UPDATE graph_curation_runs SET status='FAILED', error_message=... WHERE id=:run_id AND status IN ('PENDING', 'RUNNING')``. The WHERE clause keeps this update idempotent w.r.t. ``generate_run`` having already written FAILED inside its own try/except — only stuck-in- transit rows get rewritten, finalised rows are preserved. The fail-safe is itself wrapped in ``try/except`` so a brief PG outage does not propagate up and halt the worker loop — the loop must keep popping subsequent payloads regardless. Error message is truncated to 1024 chars to stay polite to the ``Text`` column; the full traceback is in the worker log via ``logger.exception``. The ``_process_one_run`` catch path now logs + invokes the fail-safe under ``asyncio.to_thread`` (sync DB I/O) before swallowing the exception. ``engine`` is checked for ``None`` so the pure-unit tests that don't pass an engine keep working. Tests ----- * ``test_worker_loop_swallows_task_exception_and_marks_failed`` — upgraded the existing swallow test: after a raise the loop still processes the next payload AND ``_mark_run_failed_best_effort`` is invoked exactly once (only for the raising run) carrying the exception type + message in the reason. * ``test_mark_run_failed_best_effort_only_updates_pending_or_running`` — direct unit on the fail-safe SQL, asserts the ``status IN ('PENDING', 'RUNNING')`` predicate is present and the error message gets truncated to ≤ 1024 chars. * ``test_mark_run_failed_best_effort_swallows_db_errors`` — a ``begin()`` that raises ``OSError`` MUST NOT propagate out of the fail-safe; the worker loop must keep going. Local: ``uv run pytest tests/unit_test/indexing/test_graph_curation_run_orchestrator.py tests/unit_test/test_app_lifespan_no_workers.py tests/unit_test/graph_curation/test_service.py`` → **31 passed**. Spec amend candidate -------------------- Per architect msg=7af40610: spec v1.1 amend should add an explicit "worker catch fail-safe" invariant to § 3.1.1 + § 5.2.b boundary test gate, so the obligation is documented at the spec level rather than discovered at impl-time again. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rker lane Closes task #75 per PM @不穷 dispatch (msg=4068e5e2). Implements spec ``task-31-graph-node-merge-spec-v1.md`` § 3.1.1 + § 3.1.1.b + § 5.2.a: extract the graph node merge suggestion full-sweep run from the API-process ``asyncio.create_task(asyncio.to_thread(...))`` fire-and- forget into a dedicated worker lane on the indexing-worker process, with state isolation from the existing per-:class:`Modality` queue family. Why a dedicated lane (per ziang msg=92321bcc + Bryce msg=4c23f87e BLOCKER 1) ----------------------------------------------------------------- ``WorkQueue.push/pop`` is keyed by :class:`Modality` (Redis key ``q:indexing:<modality>``) and the per-modality entrypoint machinery is bound to :class:`ModalityWorkerFactory` + :class:`DocumentIndex` payloads. ``GraphCurationRun`` is a per-collection / per-run job, **not** a per-document modality state — strapping it onto the modality keyed queue would pollute ``DocumentIndex`` / reconciler / index_state semantics. So this PR builds a parallel queue family (``q:graph_curation_run``) and a parallel run loop on top of the shared :class:`RedisWorkQueue` connection / quota / metrics infrastructure but with full state isolation. Trigger reconcile (per spec § 3.1.1.b) -------------------------------------- * **manual / cron** — ``GraphCurationService.start_run`` (API process) creates the ``GraphCurationRun`` row and enqueues the ``run_id`` onto ``q:graph_curation_run``. The worker pop calls ``generate_graph_curation_run_task`` integration path. * **auto_post_ingest** — existing ``MergeCandidateDetector.detect_for_sync`` end-of-sync inline ``GraphModalityWorker.sync`` path, intentionally NOT routed through this worker. That stays as a write-only quick path and is description-free fixed by task #77 A3 (chenyexuan). Changes ------- * ``aperag/indexing/orchestrator.py``: extend ``WorkQueue`` Protocol with ``push_graph_curation_run`` / ``pop_graph_curation_run``; implement on :class:`InMemoryWorkQueue` and :class:`RedisWorkQueue`. New ``GRAPH_CURATION_RUN_KEY = "q:graph_curation_run"`` constant — distinct from the ``q:indexing:<modality>`` template, so a Redis ``KEYS`` audit can't confuse the two families. * ``aperag/indexing/graph_curation_run_orchestrator.py`` (new): :func:`run_graph_curation_run_worker` async loop mirroring the ``run_parse_worker_loop`` shape — pop, decode via :class:`GraphCurationRunDispatchPayload`, dispatch ``generate_graph_curation_run_task`` on a worker thread (so the asyncio loop stays free), drain in-flight on shutdown. * ``aperag/indexing/__init__.py``: re-export the new helpers. * ``aperag/cli/indexing_worker.py``: add the new lane to the startup task list (independent ``asyncio.create_task``, NOT through ``_entrypoint(Modality, ...)``); update startup log to list 11 tasks. The lane is identified by symbolic name ``graph_curation_run`` — the boundary test asserts presence by name, never by count, so future lane add/remove doesn't drift the gate. * ``aperag/graph_curation/service.py``: replace ``asyncio.create_task(asyncio.to_thread(generate_graph_curation_run_task, ...))`` at ``service.py:114-123`` with a thin ``runtime.queue.push_graph_curation_run(payload)`` enqueue. Fail- loud if no runtime / queue is installed (rather than silently leaving the run PENDING forever) — matches the existing ``_mark_run_failed`` discipline. Tests ----- * ``tests/unit_test/test_app_lifespan_no_workers.py``: extend the positive contract list with ``run_graph_curation_run_worker``; add three task-#31-named tests pinning the dual-side gate per spec § 5.2.a: - positive: ``test_cli_worker_starts_graph_curation_run_lane`` - negative: ``test_graph_curation_service_does_not_execute_run_inline`` (greps ``service.py`` for any ``generate_graph_curation_run_task(`` call site on executable lines — comments / docstrings are permitted to describe the historical pattern) - positive: ``test_graph_curation_service_uses_push_graph_curation_run`` * ``tests/unit_test/indexing/test_graph_curation_run_orchestrator.py`` (new, 17 cases): payload roundtrip + key normalisation; in-memory queue independence (push to graph_curation_run does not leak into any modality queue and vice versa); Redis key constant distinctness; worker loop dispatch + malformed-payload drop + task-exception swallow + shutdown drain. Local: ``uv run pytest tests/unit_test/test_app_lifespan_no_workers.py tests/unit_test/indexing/ tests/unit_test/graph_curation/ tests/unit_test/vectorstore/`` → **544 passed, 10 skipped, 2 warnings**. Spec / scope alignment ---------------------- * task #31 spec v1 § 3.1.1 independent queue family ``q:graph_curation_run`` ✅ * task #31 spec v1 § 3.1.1.b trigger split (manual/cron worker pop vs auto_post_ingest sync inline write-only) ✅ * task #31 spec v1 § 5.2.a lane symbolic dual-side gate (positive lane name appears in indexing-worker; negative API process must not invoke ``generate_graph_curation_run_task`` directly) ✅ * Lesson #11 v5 entry-point migration cross-process parity (lane added to ``cli/indexing_worker.py`` startup; negative gate on ``app.py``) ✅ * Lesson #14 multi-iteration cleanup (the new symbolic lane assertion replaces the brittle "11th lane" count which several reviewers flagged in PR #1931 fix-forward 4-6) ✅ Follow-ups (NOT in this PR) --------------------------- * task #77 A3 description-free 6+1 call site refactor — chenyexuan * task #78 A4 FE / endpoint reuse + 7-state UI — dongdong + cuiwenbo * huangheng follow-up sub-PR queue (sediment fold-in cycle) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
#1938 CR gap) huangzhangshu testing CR (msg=fe66bd72) caught: PR #1938 had source- level grep gates on the API enqueue path but no behaviour tests. The pre-A1 ``test_start_run_marks_failed_when_enqueue_raises`` was deleted in Wave 3 T3.1 chunk 3 because the ``asyncio.create_task`` schedule path could not raise. Phase A1 reintroduces a real failure path (the ``await runtime.queue.push_graph_curation_run(...)`` enqueue), so the behaviour gate needs to come back. Five new ``pytest.mark.asyncio`` cases covering the ``GraphCurationService.start_run`` post-transaction enqueue branch: * ``test_start_run_enqueues_canonical_payload_when_created`` — pin the ``{run_id, collection_id}`` payload shape that the worker's :class:`GraphCurationRunDispatchPayload.from_dict` reads. Both fields must be ``str`` so ``from_dict`` normalisation is a no-op. * ``test_start_run_does_not_enqueue_when_run_already_active`` — ``created=False`` (existing PENDING/RUNNING) MUST NOT re-enqueue. Without this, every duplicate API call would multiply Redis payloads and waste worker LLM quota. * ``test_start_run_marks_run_failed_and_raises_when_enqueue_raises`` — Redis push failure surfaces as ``RuntimeError`` raised, with the run row marked FAILED carrying the original exception in its reason. Silent success would leave the row in PENDING forever. * ``test_start_run_marks_run_failed_and_raises_when_runtime_not_installed`` — fail-loud guard for test environments / pre-startup boot (``runtime is None``). * ``test_start_run_marks_run_failed_when_runtime_has_no_queue`` — symmetric: runtime present but ``queue=None`` (INLINE-mode test runtime). The ``_FakeQueue`` stub captures pushed payloads and toggles ``raise_on_push`` for the failure path; the heavy collaborators (``_get_and_validate_collection``, ``execute_with_transaction``, ``_mark_run_failed``, ``_run_to_dict``) are stubbed at the instance level so the test exercises only the post-transaction enqueue branch — matches the existing test-style in this file. Local: ``uv run pytest tests/unit_test/graph_curation/test_service.py`` → **8 passed**. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… (Weston PR #1938 BLOCKER) Weston PR #1938 architecture CR (msg=04c9e5ee BLOCKER) caught a real correctness bug in the post-A1 worker catch path: the comment claimed "task-level failure already persisted in PG" but at least three pre-``generate_run`` raise sites bypass the service-layer ``_mark_run_failed`` and leave the run row in ``PENDING``: * ``aperag/graph_curation/integration.py:35-37`` — collection not found. * ``aperag/graph_curation/integration.py:49-61`` — backend / vector / embedder resolution failure. * ``aperag/domains/knowledge_graph/tasks.py:17-26`` — log + re-raise without marking FAILED. Failure chain without the fix: 1. Worker pops the payload (Redis side already consumed). 2. ``generate_graph_curation_run_task`` raises before ``service.generate_run`` runs, so no ``_mark_run_failed`` runs. 3. Run row stays ``PENDING``; queue payload is gone. 4. Next ``start_run`` call sees an "active" PENDING run and returns ``created=False`` without re-enqueueing — the collection's manual full sweep is permanently wedged. Fix --- ``_mark_run_failed_best_effort(engine, run_id, error_message)``: ``UPDATE graph_curation_runs SET status='FAILED', error_message=... WHERE id=:run_id AND status IN ('PENDING', 'RUNNING')``. The WHERE clause keeps this update idempotent w.r.t. ``generate_run`` having already written FAILED inside its own try/except — only stuck-in- transit rows get rewritten, finalised rows are preserved. The fail-safe is itself wrapped in ``try/except`` so a brief PG outage does not propagate up and halt the worker loop — the loop must keep popping subsequent payloads regardless. Error message is truncated to 1024 chars to stay polite to the ``Text`` column; the full traceback is in the worker log via ``logger.exception``. The ``_process_one_run`` catch path now logs + invokes the fail-safe under ``asyncio.to_thread`` (sync DB I/O) before swallowing the exception. ``engine`` is checked for ``None`` so the pure-unit tests that don't pass an engine keep working. Tests ----- * ``test_worker_loop_swallows_task_exception_and_marks_failed`` — upgraded the existing swallow test: after a raise the loop still processes the next payload AND ``_mark_run_failed_best_effort`` is invoked exactly once (only for the raising run) carrying the exception type + message in the reason. * ``test_mark_run_failed_best_effort_only_updates_pending_or_running`` — direct unit on the fail-safe SQL, asserts the ``status IN ('PENDING', 'RUNNING')`` predicate is present and the error message gets truncated to ≤ 1024 chars. * ``test_mark_run_failed_best_effort_swallows_db_errors`` — a ``begin()`` that raises ``OSError`` MUST NOT propagate out of the fail-safe; the worker loop must keep going. Local: ``uv run pytest tests/unit_test/indexing/test_graph_curation_run_orchestrator.py tests/unit_test/test_app_lifespan_no_workers.py tests/unit_test/graph_curation/test_service.py`` → **31 passed**. Spec amend candidate -------------------- Per architect msg=7af40610: spec v1.1 amend should add an explicit "worker catch fail-safe" invariant to § 3.1.1 + § 5.2.b boundary test gate, so the obligation is documented at the spec level rather than discovered at impl-time again. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
825b55d to
a0c7293
Compare
task #31 spec v1.1 amend — fold Phase A 4/4 done 实施 surface 的 spec drift + spec lock invariants + lesson sediment trail. ## v1.1 Amend Scope - § 3.1.1 worker loop fail-safe invariant (PR #1938 Weston BLOCKER → spec lock) - § 3.1.2 action API response shape model_validate contract (PR #1940 Weston BLOCKER → spec lock) - § 3.1.6 DISMISSED enum source 修正 (PR #1935 ziang grep main 实证 v1 spec drift fix) - § 5.2.b 新增 3 boundary test invariants - § 5.2.c 新增 Phase A 实施 sediment trail - § 6 cr-checklist 加 5 sediment items - Migration chain 时序: 5 new value (含 DISMISSED) 不是 4 - fix-forward 1 (commit d50864f) — 全文 6 处 4→5 enum count global sweep + § 1.1 line 17 pre-A2 实证口径补齐 (per Weston BLOCKER msg=2ad46e97) ## CR - @符炫炜 architect (own draft) - @huangheng cr-checklist sediment cite verify ✅ msg=b276da50 - @weston framing verify ✅ msg=a111fcc3 (re-final pass post fix-forward 1) ## CI - lint-and-unit ✅ - e2e-http-smoke 3/3 ✅ (auto-merge after green) - provider-preflight 3/3 ✅ - docs-only lite gate satisfied 🤖 Architect ratify by Claude Code
#1943) * docs(cr-checklist): task #31 Phase A 全闭环后 sediment fold-in 子 PR 2 § 四 加 6 lesson sediment(task #31 Phase A 4 PR + task #33 P3 PR #1933 codify 累计实证 + multi-PR same-hour multi-source first-principles catch trust-framing miss)+ § 六 sediment 引用追加 5 PR commit cross-link + § 八 修订记录追加本 PR fold-in 完整 trail。 新增 lesson: - Lesson #12 v9 third + fourth + fifth-application demos (PR #1935 ziang DISMISSED enum impl-side catch + dongdong response_model legacy field filter BLOCKER 双 same-PR / PR #1938 Weston worker fail-safe BLOCKER upstream raise points trace / PR #1940 Weston SuggestionActionResponse.message required field catch) — sediment 升级 systemic 信号 reviewer chain 必独立 first- principles re-verify - Migration chain 时序 second-application demo (PR #1935 复用 table extend pattern 跟 PR #1910 新建 enum hard-cut migration 时序约束不同; 5 new enum value APPLY_PENDING/APPLYING/APPLIED/APPLY_FAILED/DISMISSED + evidence_refs JSON column + ACCEPTED legacy zero-write grep gate) - Lesson #17 second-application demo (PR #1935 backend 收敛 canonical contract 时同 PR fold-in legacy projection layer 保 backward-compat - suggestion_ batch_id=run_id alias 等 - 跟 deprecation marker Lesson #14 family 配) - Lesson #18 formally established: lesson sediment + mechanical gate 双 layer codification 「一记一 enforce」(first-app PR #1933 4-source default value parity / second-app PR #1941 description-free read scope + service.py:845 bonus catch / third-app PR #1941 fix-forward sister tests 防 whole-file exclude 静默削弱 gate) - mini-pattern 19: spec lock pre-check grep main 实证 enum/contract assumption (architect own-up 升级版三层: spec→impl / impl→response_model / impl catch path→upstream raise points) - mini-pattern 20: PR adds response_model wire-up 必跑 model_validate(actual_ handler_return_shape) boundary gate (PR #1940 first-application demo) per architect dispatch msg=b6726ac9 + msg=420ca548 sediment trigger A 满足 (task #31 Phase A 4/4 done) 启动 + Phase B B1 lane huangheng owner. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(cr-checklist): fix cite accuracy NIT per Weston msg=7690b723 2 cite accuracy fixes (Weston framing CR catch): 1. response_model validation failure 状态码: 422 -> 500 - response_model validation fails 抛 FastAPI ResponseValidationError - 通常映射到 HTTP 500,不是 request body 校验的 422 - 影响 line 745 + line 850 描述 PR #1940 BLOCKER 时的状态码引用 2. GraphMergeSuggestionItem canonical schema 字段实证修正 - 原写: ... / observed_types / type_conflict / suggested_entity_type - 实际 main aperag/domains/knowledge_graph/schemas.py::GraphMergeSuggestionItem 不含这三字段 - A4 (PR #1940) 这些字段是 FE-derived display (FE 从 entities / suggested_target_entity / evidence_refs 推导),不是 PR #1935 backend projection - 影响 line 781 sect 4 Lesson #17 second-application demo 描述 per Weston PR #1943 framing CR (msg=7690b723) - sediment cite accuracy 要求把事实漂移修干净,避免 future onboarding reference 时 confuse 422/500 状态码语义 + backend/FE field source attribution。 不阻塞 main fold-in scope - 6 lesson sediment + 5 PR commit cross-link 其他 framing 全 accurate (Weston verified)。 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…field Per dongdong msg=fa88e97b BLOCKER + huangzhangshu msg=5b7cba0f / msg=ee6e7af2 + Weston msg=057f642c re-final framing verify gate + PM msg=03c821b0 fix-forward direction lock: the previous regular-field ``Optional[VectorBackendInfo]`` implementation leaked the deployment projection onto every input shape that referenced ``Collection``, including ``Collection-Input`` itself, ``Agent-Input.collections``, and ``CreateTurnRequest.collections``. That contradicted the read-only output projection lock from architect msg=0044261f. Move ``Collection.vector_backend`` to a Pydantic v2 ``@computed_field`` property so OpenAPI input/output schemas auto-split: - ``Collection-Output`` now lists ``vector_backend`` with ``readonly: true`` (verified in regenerated ``web/src/api-v2/schema.d.ts``). - ``Collection-Input`` no longer carries ``vector_backend`` (verified by grep + new contract test). - ``CollectionCreate`` / ``CollectionUpdate`` / ``Agent-Input.collections`` / ``CreateTurnRequest.collections`` all inherit the cleaned ``Collection-Input``, so the deployment-wide setting can no longer be passed as a per-collection override on agent / chat-turn requests. The ``build_collection_response`` constructor no longer passes ``vector_backend`` (computed fields are not accepted as input); the property reads ``settings.vector_db_type`` lazily on each serialization. Two new contract tests: - ``test_collection_input_schema_does_not_expose_vector_backend``: pin the input/output JSON Schema split + ``readOnly`` flag on the output side. Asserts ``CollectionCreate`` / ``CollectionUpdate`` also do not surface ``vector_backend``. - ``test_collection_constructor_ignores_vector_backend_input``: defensive — even if a malicious caller stuffs ``vector_backend`` into a ``model_validate`` payload, Pydantic ignores it and the computed property still reflects the deployment setting. Sediment: cuiwenbo own-up CR miss — implement-time only verified the ``CollectionConfig`` placement (one defense layer) and missed the ``Collection`` self-reuse-as-input second layer. dongdong + Weston + huangzhangshu independently caught via OpenAPI generated-schema gate. mini-pattern 19 layer 5 candidate: "Pydantic schema placement verify must grep ``references Collection`` to catch input/output reuse risk, not only direct form-input shape" (continuing the trust-framing-miss family from PR #1935 / #1938 / #1940). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ty matrix (#1949) * feat(collection): task #61 P1-D3 — vector backend identity + capability matrix Project the deployment-wide ``settings.vector_db_type`` onto every collection detail read so the FE can render a "what does this vector backend actually support" panel without per-collection migration or runtime probe. Backend (output-only projection): - ``aperag/schema/common.py``: ``VectorBackendCapabilities`` + ``VectorBackendInfo`` + ``_STATIC_VECTOR_BACKEND_CAPABILITIES`` dict + ``project_vector_backend_info()`` helper. - ``aperag/domains/knowledge_base/schemas.py:Collection``: add ``vector_backend: Optional[VectorBackendInfo]``. **Intentionally NOT on ``CollectionConfig``** so the OpenAPI ``CollectionCreate`` / ``CollectionUpdate`` input shapes do not let callers mistake a deployment-wide setting for a per-collection editable knob (per dongdong msg=c2593fdd + PM msg=caf7e4df + architect msg=0044261f read-only projection lock). - ``aperag/domains/knowledge_base/service/collection_service.py``: populate ``vector_backend`` in ``build_collection_response`` from ``settings.vector_db_type``; ``None`` for unknown backends so the FE can render a placeholder without a hard failure. Cross-PR consistency with task #83 / PR #1948 (Bryce, vector adapter behavior fixes): - Bryce's connector-layer ``BACKEND_CAPABILITIES`` ClassVar declares 2 truth flags (``supports_atomic_batch_upsert`` + ``supports_legacy_mode``); this PR's schema-layer Pydantic model mirrors those values plus a 3rd schema-layer-only flag ``supports_filter_or_with_empty_parts`` which is uniformly False across adapters after task #83 P1-V3 (translator-level defense-in-depth rejects empty Or parts). - The 3rd flag stays in the schema so the FE can declare the uniform reject explicitly per spec § 2.3 P1-D3 「显示『允许差异但显式』」 — Lesson #17 backend 收敛 contract simple-stable family pattern (cite PR #1930 SearchHit normalize, PR #1935 GraphMergeSuggestionItem projection layer). Mechanical gate (per Lesson #18 lesson-sediment + mechanical-gate 双 layer codification — first established by chenyexuan PR #1933 / PR #1941, then PR #1940 ``model_validate`` boundary): 13-case unit suite in ``tests/unit_test/contracts/test_vector_backend_capability_matrix.py`` pins each capability flag, normalizes inputs, and round-trips Pydantic ``model_dump`` so future drift between schema, projection helper, and FE-consumed shape fails fast at unit-test time. FE (read-only display): - ``web/src/features/collection/types.ts``: typed mirrors ``VectorBackendInfo`` / ``VectorBackendCapabilities`` / ``VectorBackendType``. - ``web/src/app/workspace/collections/[collectionId]/settings/collection-vector-backend-card.tsx``: new component that surfaces backend identity + capability matrix in the collection settings page (above the edit form). dongdong picks up rendering polish (responsive + dark mode + final copy) on the same PR per the joint A4-style split (cuiwenbo contract layer + dongdong rendering polish + CR pair). - ``web/src/i18n/{en-US,zh-CN}/page_collections.json``: copy strings. - ``web/src/api-v2/schema.d.ts`` regenerated via ``yarn api:v2:types``. Local verification: - ``uv run --extra test pytest tests/unit_test/contracts/test_vector_backend_capability_matrix.py tests/unit_test/contracts/test_collection_v2_openapi_contract.py -q`` → 23 passed - ``make openapi-check`` → ok - ``yarn type-check --pretty false`` → 0 new errors on this PR's files (pre-existing graph-lab cosmograph + agent-runtime errors unchanged) - ``yarn lint --quiet`` → 0 warnings/errors - ``yarn i18n:check`` → ok - ``git diff --check`` → ok Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(collection): task #87 P1-D3 — convert vector_backend to computed_field Per dongdong msg=fa88e97b BLOCKER + huangzhangshu msg=5b7cba0f / msg=ee6e7af2 + Weston msg=057f642c re-final framing verify gate + PM msg=03c821b0 fix-forward direction lock: the previous regular-field ``Optional[VectorBackendInfo]`` implementation leaked the deployment projection onto every input shape that referenced ``Collection``, including ``Collection-Input`` itself, ``Agent-Input.collections``, and ``CreateTurnRequest.collections``. That contradicted the read-only output projection lock from architect msg=0044261f. Move ``Collection.vector_backend`` to a Pydantic v2 ``@computed_field`` property so OpenAPI input/output schemas auto-split: - ``Collection-Output`` now lists ``vector_backend`` with ``readonly: true`` (verified in regenerated ``web/src/api-v2/schema.d.ts``). - ``Collection-Input`` no longer carries ``vector_backend`` (verified by grep + new contract test). - ``CollectionCreate`` / ``CollectionUpdate`` / ``Agent-Input.collections`` / ``CreateTurnRequest.collections`` all inherit the cleaned ``Collection-Input``, so the deployment-wide setting can no longer be passed as a per-collection override on agent / chat-turn requests. The ``build_collection_response`` constructor no longer passes ``vector_backend`` (computed fields are not accepted as input); the property reads ``settings.vector_db_type`` lazily on each serialization. Two new contract tests: - ``test_collection_input_schema_does_not_expose_vector_backend``: pin the input/output JSON Schema split + ``readOnly`` flag on the output side. Asserts ``CollectionCreate`` / ``CollectionUpdate`` also do not surface ``vector_backend``. - ``test_collection_constructor_ignores_vector_backend_input``: defensive — even if a malicious caller stuffs ``vector_backend`` into a ``model_validate`` payload, Pydantic ignores it and the computed property still reflects the deployment setting. Sediment: cuiwenbo own-up CR miss — implement-time only verified the ``CollectionConfig`` placement (one defense layer) and missed the ``Collection`` self-reuse-as-input second layer. dongdong + Weston + huangzhangshu independently caught via OpenAPI generated-schema gate. mini-pattern 19 layer 5 candidate: "Pydantic schema placement verify must grep ``references Collection`` to catch input/output reuse risk, not only direct form-input shape" (continuing the trust-framing-miss family from PR #1935 / #1938 / #1940). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(test): consolidate vector_backend_capability_matrix imports for ruff Combine the two from aperag.schema.common import ... statements into a single block so ruff's import organization rule is satisfied. No code-behavior change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(test): apply ruff format to vector_backend test + common.py Run `uv run ruff format` on ApeRAG/aperag/schema/common.py and ApeRAG/tests/unit_test/contracts/test_vector_backend_capability_matrix.py so `make lint` (`ruff format --check`) passes. Pure formatting; no behavior change. Other unrelated files reverted to keep this PR scope clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Closes task #75 per PM @不穷 dispatch (msg=4068e5e2). Implements task #31 spec v1 § 3.1.1 + § 3.1.1.b + § 5.2.a: extract the graph node merge suggestion full-sweep run from API-process `asyncio.create_task(asyncio.to_thread(...))` into a dedicated worker lane on the indexing-worker process, with full state isolation from the per-Modality queue family.
Changes
Trigger reconcile (per spec § 3.1.1.b)
Tests
Local: `uv run pytest tests/unit_test/test_app_lifespan_no_workers.py tests/unit_test/indexing/ tests/unit_test/graph_curation/ tests/unit_test/vectorstore/` → 544 passed, 10 skipped.
Spec / scope alignment
Test plan
Not blocking
🤖 Generated with Claude Code