feat(indexing): task #31 Phase A1 — independent graph_curation_run worker lane by earayu · Pull Request #1938 · apecloud/ApeRAG

earayu · 2026-04-30T08:48:29Z

Summary

Closes task #75 per PM @不穷 dispatch (msg=4068e5e2). Implements task #31 spec v1 § 3.1.1 + § 3.1.1.b + § 5.2.a: extract the graph node merge suggestion full-sweep run from API-process `asyncio.create_task(asyncio.to_thread(...))` into a dedicated worker lane on the indexing-worker process, with full state isolation from the per-Modality queue family.

Changes

`aperag/indexing/orchestrator.py` — `WorkQueue` Protocol extended with `push_graph_curation_run` / `pop_graph_curation_run`; `GRAPH_CURATION_RUN_KEY = "q:graph_curation_run"` constant (distinct from `q:indexing:`); InMemory + Redis impl
`aperag/indexing/graph_curation_run_orchestrator.py` (new) — `run_graph_curation_run_worker` independent async loop, mirrors `run_parse_worker_loop` shape (pop, decode via `GraphCurationRunDispatchPayload`, dispatch `generate_graph_curation_run_task` on `asyncio.to_thread`, drain in-flight on shutdown)
`aperag/cli/indexing_worker.py` — 11th lane added (independent `asyncio.create_task`, NOT through `_entrypoint(Modality, ...)`), startup log lists 11 tasks
`aperag/graph_curation/service.py:114-123` — replace `asyncio.create_task(asyncio.to_thread(...))` with `runtime.queue.push_graph_curation_run(payload)`; fail-loud if runtime not installed (rather than silently PENDING forever)

Trigger reconcile (per spec § 3.1.1.b)

manual / cron — API `start_run` enqueues `run_id` onto `q:graph_curation_run`; worker pop calls `generate_graph_curation_run_task` integration path
auto_post_ingest — existing `MergeCandidateDetector.detect_for_sync` end-of-sync inline path stays as write-only quick path (NOT routed through this worker); description-free fixed by task feat: support default collections #77 A3 (chenyexuan)

Tests

`tests/unit_test/test_app_lifespan_no_workers.py` — extended positive contract + 3 new task-support pagination #31-named tests (cli worker starts `run_graph_curation_run_worker`; service.py does NOT invoke `generate_graph_curation_run_task(` on executable lines; service.py uses `push_graph_curation_run`)
`tests/unit_test/indexing/test_graph_curation_run_orchestrator.py` (new, 17 cases) — payload roundtrip; in-memory queue independence (`push_graph_curation_run` does not leak into Modality queues + vice versa); Redis key constant distinctness; worker loop dispatch + malformed-payload drop + task-exception swallow + shutdown drain

Local: `uv run pytest tests/unit_test/test_app_lifespan_no_workers.py tests/unit_test/indexing/ tests/unit_test/graph_curation/ tests/unit_test/vectorstore/` → 544 passed, 10 skipped.

Spec / scope alignment

✅ task support pagination #31 spec v1 § 3.1.1 — independent queue family
✅ task support pagination #31 spec v1 § 3.1.1.b — trigger split (manual/cron worker pop vs auto_post_ingest sync inline)
✅ task support pagination #31 spec v1 § 5.2.a — lane symbolic dual-side gate (positive name appears in indexing-worker; negative API process must not invoke `generate_graph_curation_run_task` directly)
✅ Lesson fix: timestamp change #11 v5 entry-point migration cross-process parity
✅ Lesson feat: socket debug #14 multi-iteration cleanup — symbolic lane name (not "11th") replaces brittle count assertion several reviewers flagged in PR docs(task-31): graph node merge & background suggestion task spec v1 #1931 fix-forward 4-6

Test plan

Local lint + format clean (ruff check + format)
Local unit tests pass (544 passed across boundary + indexing + graph_curation + vectorstore)
CI lint-and-unit + e2e green
@符炫炜 ratify (per spec 4 boundary CR + 9-pattern matrix + enum/contract grep main hard gate)
@weston architecture CR per spec 4 boundary
@huangzhangshu testing CR per spec § 5.2 acceptance gate

Not blocking

task feat: support default collections #77 A3 description-free refactor — chenyexuan parallel
task chore: fix embedding timeout and scale the embedding workers #78 A4 FE / endpoint reuse — dongdong + cuiwenbo (depends on A2 typed schema, A2 already merged `6fc6f64f`)
huangheng follow-up sub-PR sediment fold-in queue (trigger A: task support pagination #31 Phase A 全 done)

🤖 Generated with Claude Code

earayu · 2026-04-30T08:50:48Z

Testing CR on PR #1938: one A1 boundary test gap to fix before I can LGTM.

The queue family / worker-loop tests are strong, and the source grep gate correctly prevents API-side generate_graph_curation_run_task(...) execution. I also ran the targeted suite locally:

uv run --extra test pytest tests/unit_test/indexing/test_graph_curation_run_orchestrator.py tests/unit_test/test_app_lifespan_no_workers.py -q

Result: 21 passed.

BLOCKER — GraphCurationService.start_run() enqueue path is only grep-tested, not behavior-tested

Spec §5.2.a requires the API path to become a thin enqueue into q:graph_curation_run. The current tests assert source text contains push_graph_curation_run( and does not call generate_graph_curation_run_task(, but they do not prove the actual runtime behavior:

created run enqueues payload with exact keys {"run_id": str(run.id), "collection_id": str(collection_id)}
existing/in-progress run (created=False) does not enqueue a duplicate payload
enqueue failure marks the run failed and raises instead of returning a started response that will never execute

Without this behavioral test, a future change could keep the grep gate green while sending the wrong payload shape (run vs run_id, missing collection_id) or enqueueing on duplicate start. The worker would then drop malformed payloads and manual runs would sit failed/stranded even though A1 tests pass.

Suggested minimal fix: add focused unit tests around GraphCurationService.start_run() using a stub runtime queue and monkeypatching execute_with_transaction() / _mark_run_failed(); no DB fixture needed. Keep source grep tests as the dual-side boundary, but add the service-level behavioral contract for payload + duplicate/no-op + enqueue-failure path.

earayu · 2026-04-30T08:54:24Z

Testing CR on PR #1938: one A1 boundary test gap to fix before I can LGTM.

The queue family / worker-loop tests are strong, and the source grep gate correctly prevents API-side generate_graph_curation_run_task(...) execution. I also ran the targeted suite locally:

uv run --extra test pytest tests/unit_test/indexing/test_graph_curation_run_orchestrator.py tests/unit_test/test_app_lifespan_no_workers.py -q

Result: 21 passed.

BLOCKER — GraphCurationService.start_run() enqueue path is only grep-tested, not behavior-tested

Spec §5.2.a requires the API path to become a thin enqueue into q:graph_curation_run. The current tests assert source text contains push_graph_curation_run( and does not call generate_graph_curation_run_task(, but they do not prove the actual runtime behavior:

created run enqueues payload with exact keys {"run_id": str(run.id), "collection_id": str(collection_id)}
existing/in-progress run (created=False) does not enqueue a duplicate payload
enqueue failure marks the run failed and raises instead of returning a started response that will never execute

Without this behavioral test, a future change could keep the grep gate green while sending the wrong payload shape (run vs run_id, missing collection_id) or enqueueing on duplicate start. The worker would then drop malformed payloads and manual runs would be stranded even though A1 tests pass.

Suggested minimal fix: add focused unit tests around GraphCurationService.start_run() using a stub runtime queue and monkeypatching execute_with_transaction() / _mark_run_failed(); no DB fixture needed. Keep source grep tests as the dual-side boundary, but add the service-level behavioral contract for payload + duplicate/no-op + enqueue-failure path.

#1938 CR gap) huangzhangshu testing CR (msg=fe66bd72) caught: PR #1938 had source- level grep gates on the API enqueue path but no behaviour tests. The pre-A1 ``test_start_run_marks_failed_when_enqueue_raises`` was deleted in Wave 3 T3.1 chunk 3 because the ``asyncio.create_task`` schedule path could not raise. Phase A1 reintroduces a real failure path (the ``await runtime.queue.push_graph_curation_run(...)`` enqueue), so the behaviour gate needs to come back. Five new ``pytest.mark.asyncio`` cases covering the ``GraphCurationService.start_run`` post-transaction enqueue branch: * ``test_start_run_enqueues_canonical_payload_when_created`` — pin the ``{run_id, collection_id}`` payload shape that the worker's :class:`GraphCurationRunDispatchPayload.from_dict` reads. Both fields must be ``str`` so ``from_dict`` normalisation is a no-op. * ``test_start_run_does_not_enqueue_when_run_already_active`` — ``created=False`` (existing PENDING/RUNNING) MUST NOT re-enqueue. Without this, every duplicate API call would multiply Redis payloads and waste worker LLM quota. * ``test_start_run_marks_run_failed_and_raises_when_enqueue_raises`` — Redis push failure surfaces as ``RuntimeError`` raised, with the run row marked FAILED carrying the original exception in its reason. Silent success would leave the row in PENDING forever. * ``test_start_run_marks_run_failed_and_raises_when_runtime_not_installed`` — fail-loud guard for test environments / pre-startup boot (``runtime is None``). * ``test_start_run_marks_run_failed_when_runtime_has_no_queue`` — symmetric: runtime present but ``queue=None`` (INLINE-mode test runtime). The ``_FakeQueue`` stub captures pushed payloads and toggles ``raise_on_push`` for the failure path; the heavy collaborators (``_get_and_validate_collection``, ``execute_with_transaction``, ``_mark_run_failed``, ``_run_to_dict``) are stubbed at the instance level so the test exercises only the post-transaction enqueue branch — matches the existing test-style in this file. Local: ``uv run pytest tests/unit_test/graph_curation/test_service.py`` → **8 passed**. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

earayu · 2026-04-30T08:58:21Z

Fix-forward e3405a13 recheck: LGTM from testing lane.

The added GraphCurationService.start_run() behavior tests cover the gap I flagged:

created run enqueues canonical {"run_id": str, "collection_id": str} payload for GraphCurationRunDispatchPayload.from_dict
existing active run (created=False) does not enqueue a duplicate
enqueue failure marks the run failed and raises
runtime missing / runtime with queue=None fail loud with failed-run marking

Local targeted verification passed:

uv run --extra test pytest tests/unit_test/graph_curation/test_service.py tests/unit_test/indexing/test_graph_curation_run_orchestrator.py tests/unit_test/test_app_lifespan_no_workers.py -q

Result: 29 passed.

No remaining A1 testing blocker from me. CI green => mergeable from my lane.

… (Weston PR #1938 BLOCKER) Weston PR #1938 architecture CR (msg=04c9e5ee BLOCKER) caught a real correctness bug in the post-A1 worker catch path: the comment claimed "task-level failure already persisted in PG" but at least three pre-``generate_run`` raise sites bypass the service-layer ``_mark_run_failed`` and leave the run row in ``PENDING``: * ``aperag/graph_curation/integration.py:35-37`` — collection not found. * ``aperag/graph_curation/integration.py:49-61`` — backend / vector / embedder resolution failure. * ``aperag/domains/knowledge_graph/tasks.py:17-26`` — log + re-raise without marking FAILED. Failure chain without the fix: 1. Worker pops the payload (Redis side already consumed). 2. ``generate_graph_curation_run_task`` raises before ``service.generate_run`` runs, so no ``_mark_run_failed`` runs. 3. Run row stays ``PENDING``; queue payload is gone. 4. Next ``start_run`` call sees an "active" PENDING run and returns ``created=False`` without re-enqueueing — the collection's manual full sweep is permanently wedged. Fix --- ``_mark_run_failed_best_effort(engine, run_id, error_message)``: ``UPDATE graph_curation_runs SET status='FAILED', error_message=... WHERE id=:run_id AND status IN ('PENDING', 'RUNNING')``. The WHERE clause keeps this update idempotent w.r.t. ``generate_run`` having already written FAILED inside its own try/except — only stuck-in- transit rows get rewritten, finalised rows are preserved. The fail-safe is itself wrapped in ``try/except`` so a brief PG outage does not propagate up and halt the worker loop — the loop must keep popping subsequent payloads regardless. Error message is truncated to 1024 chars to stay polite to the ``Text`` column; the full traceback is in the worker log via ``logger.exception``. The ``_process_one_run`` catch path now logs + invokes the fail-safe under ``asyncio.to_thread`` (sync DB I/O) before swallowing the exception. ``engine`` is checked for ``None`` so the pure-unit tests that don't pass an engine keep working. Tests ----- * ``test_worker_loop_swallows_task_exception_and_marks_failed`` — upgraded the existing swallow test: after a raise the loop still processes the next payload AND ``_mark_run_failed_best_effort`` is invoked exactly once (only for the raising run) carrying the exception type + message in the reason. * ``test_mark_run_failed_best_effort_only_updates_pending_or_running`` — direct unit on the fail-safe SQL, asserts the ``status IN ('PENDING', 'RUNNING')`` predicate is present and the error message gets truncated to ≤ 1024 chars. * ``test_mark_run_failed_best_effort_swallows_db_errors`` — a ``begin()`` that raises ``OSError`` MUST NOT propagate out of the fail-safe; the worker loop must keep going. Local: ``uv run pytest tests/unit_test/indexing/test_graph_curation_run_orchestrator.py tests/unit_test/test_app_lifespan_no_workers.py tests/unit_test/graph_curation/test_service.py`` → **31 passed**. Spec amend candidate -------------------- Per architect msg=7af40610: spec v1.1 amend should add an explicit "worker catch fail-safe" invariant to § 3.1.1 + § 5.2.b boundary test gate, so the obligation is documented at the spec level rather than discovered at impl-time again. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…rker lane Closes task #75 per PM @不穷 dispatch (msg=4068e5e2). Implements spec ``task-31-graph-node-merge-spec-v1.md`` § 3.1.1 + § 3.1.1.b + § 5.2.a: extract the graph node merge suggestion full-sweep run from the API-process ``asyncio.create_task(asyncio.to_thread(...))`` fire-and- forget into a dedicated worker lane on the indexing-worker process, with state isolation from the existing per-:class:`Modality` queue family. Why a dedicated lane (per ziang msg=92321bcc + Bryce msg=4c23f87e BLOCKER 1) ----------------------------------------------------------------- ``WorkQueue.push/pop`` is keyed by :class:`Modality` (Redis key ``q:indexing:<modality>``) and the per-modality entrypoint machinery is bound to :class:`ModalityWorkerFactory` + :class:`DocumentIndex` payloads. ``GraphCurationRun`` is a per-collection / per-run job, **not** a per-document modality state — strapping it onto the modality keyed queue would pollute ``DocumentIndex`` / reconciler / index_state semantics. So this PR builds a parallel queue family (``q:graph_curation_run``) and a parallel run loop on top of the shared :class:`RedisWorkQueue` connection / quota / metrics infrastructure but with full state isolation. Trigger reconcile (per spec § 3.1.1.b) -------------------------------------- * **manual / cron** — ``GraphCurationService.start_run`` (API process) creates the ``GraphCurationRun`` row and enqueues the ``run_id`` onto ``q:graph_curation_run``. The worker pop calls ``generate_graph_curation_run_task`` integration path. * **auto_post_ingest** — existing ``MergeCandidateDetector.detect_for_sync`` end-of-sync inline ``GraphModalityWorker.sync`` path, intentionally NOT routed through this worker. That stays as a write-only quick path and is description-free fixed by task #77 A3 (chenyexuan). Changes ------- * ``aperag/indexing/orchestrator.py``: extend ``WorkQueue`` Protocol with ``push_graph_curation_run`` / ``pop_graph_curation_run``; implement on :class:`InMemoryWorkQueue` and :class:`RedisWorkQueue`. New ``GRAPH_CURATION_RUN_KEY = "q:graph_curation_run"`` constant — distinct from the ``q:indexing:<modality>`` template, so a Redis ``KEYS`` audit can't confuse the two families. * ``aperag/indexing/graph_curation_run_orchestrator.py`` (new): :func:`run_graph_curation_run_worker` async loop mirroring the ``run_parse_worker_loop`` shape — pop, decode via :class:`GraphCurationRunDispatchPayload`, dispatch ``generate_graph_curation_run_task`` on a worker thread (so the asyncio loop stays free), drain in-flight on shutdown. * ``aperag/indexing/__init__.py``: re-export the new helpers. * ``aperag/cli/indexing_worker.py``: add the new lane to the startup task list (independent ``asyncio.create_task``, NOT through ``_entrypoint(Modality, ...)``); update startup log to list 11 tasks. The lane is identified by symbolic name ``graph_curation_run`` — the boundary test asserts presence by name, never by count, so future lane add/remove doesn't drift the gate. * ``aperag/graph_curation/service.py``: replace ``asyncio.create_task(asyncio.to_thread(generate_graph_curation_run_task, ...))`` at ``service.py:114-123`` with a thin ``runtime.queue.push_graph_curation_run(payload)`` enqueue. Fail- loud if no runtime / queue is installed (rather than silently leaving the run PENDING forever) — matches the existing ``_mark_run_failed`` discipline. Tests ----- * ``tests/unit_test/test_app_lifespan_no_workers.py``: extend the positive contract list with ``run_graph_curation_run_worker``; add three task-#31-named tests pinning the dual-side gate per spec § 5.2.a: - positive: ``test_cli_worker_starts_graph_curation_run_lane`` - negative: ``test_graph_curation_service_does_not_execute_run_inline`` (greps ``service.py`` for any ``generate_graph_curation_run_task(`` call site on executable lines — comments / docstrings are permitted to describe the historical pattern) - positive: ``test_graph_curation_service_uses_push_graph_curation_run`` * ``tests/unit_test/indexing/test_graph_curation_run_orchestrator.py`` (new, 17 cases): payload roundtrip + key normalisation; in-memory queue independence (push to graph_curation_run does not leak into any modality queue and vice versa); Redis key constant distinctness; worker loop dispatch + malformed-payload drop + task-exception swallow + shutdown drain. Local: ``uv run pytest tests/unit_test/test_app_lifespan_no_workers.py tests/unit_test/indexing/ tests/unit_test/graph_curation/ tests/unit_test/vectorstore/`` → **544 passed, 10 skipped, 2 warnings**. Spec / scope alignment ---------------------- * task #31 spec v1 § 3.1.1 independent queue family ``q:graph_curation_run`` ✅ * task #31 spec v1 § 3.1.1.b trigger split (manual/cron worker pop vs auto_post_ingest sync inline write-only) ✅ * task #31 spec v1 § 5.2.a lane symbolic dual-side gate (positive lane name appears in indexing-worker; negative API process must not invoke ``generate_graph_curation_run_task`` directly) ✅ * Lesson #11 v5 entry-point migration cross-process parity (lane added to ``cli/indexing_worker.py`` startup; negative gate on ``app.py``) ✅ * Lesson #14 multi-iteration cleanup (the new symbolic lane assertion replaces the brittle "11th lane" count which several reviewers flagged in PR #1931 fix-forward 4-6) ✅ Follow-ups (NOT in this PR) --------------------------- * task #77 A3 description-free 6+1 call site refactor — chenyexuan * task #78 A4 FE / endpoint reuse + 7-state UI — dongdong + cuiwenbo * huangheng follow-up sub-PR queue (sediment fold-in cycle) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

#1938 CR gap) huangzhangshu testing CR (msg=fe66bd72) caught: PR #1938 had source- level grep gates on the API enqueue path but no behaviour tests. The pre-A1 ``test_start_run_marks_failed_when_enqueue_raises`` was deleted in Wave 3 T3.1 chunk 3 because the ``asyncio.create_task`` schedule path could not raise. Phase A1 reintroduces a real failure path (the ``await runtime.queue.push_graph_curation_run(...)`` enqueue), so the behaviour gate needs to come back. Five new ``pytest.mark.asyncio`` cases covering the ``GraphCurationService.start_run`` post-transaction enqueue branch: * ``test_start_run_enqueues_canonical_payload_when_created`` — pin the ``{run_id, collection_id}`` payload shape that the worker's :class:`GraphCurationRunDispatchPayload.from_dict` reads. Both fields must be ``str`` so ``from_dict`` normalisation is a no-op. * ``test_start_run_does_not_enqueue_when_run_already_active`` — ``created=False`` (existing PENDING/RUNNING) MUST NOT re-enqueue. Without this, every duplicate API call would multiply Redis payloads and waste worker LLM quota. * ``test_start_run_marks_run_failed_and_raises_when_enqueue_raises`` — Redis push failure surfaces as ``RuntimeError`` raised, with the run row marked FAILED carrying the original exception in its reason. Silent success would leave the row in PENDING forever. * ``test_start_run_marks_run_failed_and_raises_when_runtime_not_installed`` — fail-loud guard for test environments / pre-startup boot (``runtime is None``). * ``test_start_run_marks_run_failed_when_runtime_has_no_queue`` — symmetric: runtime present but ``queue=None`` (INLINE-mode test runtime). The ``_FakeQueue`` stub captures pushed payloads and toggles ``raise_on_push`` for the failure path; the heavy collaborators (``_get_and_validate_collection``, ``execute_with_transaction``, ``_mark_run_failed``, ``_run_to_dict``) are stubbed at the instance level so the test exercises only the post-transaction enqueue branch — matches the existing test-style in this file. Local: ``uv run pytest tests/unit_test/graph_curation/test_service.py`` → **8 passed**. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… (Weston PR #1938 BLOCKER) Weston PR #1938 architecture CR (msg=04c9e5ee BLOCKER) caught a real correctness bug in the post-A1 worker catch path: the comment claimed "task-level failure already persisted in PG" but at least three pre-``generate_run`` raise sites bypass the service-layer ``_mark_run_failed`` and leave the run row in ``PENDING``: * ``aperag/graph_curation/integration.py:35-37`` — collection not found. * ``aperag/graph_curation/integration.py:49-61`` — backend / vector / embedder resolution failure. * ``aperag/domains/knowledge_graph/tasks.py:17-26`` — log + re-raise without marking FAILED. Failure chain without the fix: 1. Worker pops the payload (Redis side already consumed). 2. ``generate_graph_curation_run_task`` raises before ``service.generate_run`` runs, so no ``_mark_run_failed`` runs. 3. Run row stays ``PENDING``; queue payload is gone. 4. Next ``start_run`` call sees an "active" PENDING run and returns ``created=False`` without re-enqueueing — the collection's manual full sweep is permanently wedged. Fix --- ``_mark_run_failed_best_effort(engine, run_id, error_message)``: ``UPDATE graph_curation_runs SET status='FAILED', error_message=... WHERE id=:run_id AND status IN ('PENDING', 'RUNNING')``. The WHERE clause keeps this update idempotent w.r.t. ``generate_run`` having already written FAILED inside its own try/except — only stuck-in- transit rows get rewritten, finalised rows are preserved. The fail-safe is itself wrapped in ``try/except`` so a brief PG outage does not propagate up and halt the worker loop — the loop must keep popping subsequent payloads regardless. Error message is truncated to 1024 chars to stay polite to the ``Text`` column; the full traceback is in the worker log via ``logger.exception``. The ``_process_one_run`` catch path now logs + invokes the fail-safe under ``asyncio.to_thread`` (sync DB I/O) before swallowing the exception. ``engine`` is checked for ``None`` so the pure-unit tests that don't pass an engine keep working. Tests ----- * ``test_worker_loop_swallows_task_exception_and_marks_failed`` — upgraded the existing swallow test: after a raise the loop still processes the next payload AND ``_mark_run_failed_best_effort`` is invoked exactly once (only for the raising run) carrying the exception type + message in the reason. * ``test_mark_run_failed_best_effort_only_updates_pending_or_running`` — direct unit on the fail-safe SQL, asserts the ``status IN ('PENDING', 'RUNNING')`` predicate is present and the error message gets truncated to ≤ 1024 chars. * ``test_mark_run_failed_best_effort_swallows_db_errors`` — a ``begin()`` that raises ``OSError`` MUST NOT propagate out of the fail-safe; the worker loop must keep going. Local: ``uv run pytest tests/unit_test/indexing/test_graph_curation_run_orchestrator.py tests/unit_test/test_app_lifespan_no_workers.py tests/unit_test/graph_curation/test_service.py`` → **31 passed**. Spec amend candidate -------------------- Per architect msg=7af40610: spec v1.1 amend should add an explicit "worker catch fail-safe" invariant to § 3.1.1 + § 5.2.b boundary test gate, so the obligation is documented at the spec level rather than discovered at impl-time again. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@huangheng

task #31 spec v1.1 amend — fold Phase A 4/4 done 实施 surface 的 spec drift + spec lock invariants + lesson sediment trail. ## v1.1 Amend Scope - § 3.1.1 worker loop fail-safe invariant (PR #1938 Weston BLOCKER → spec lock) - § 3.1.2 action API response shape model_validate contract (PR #1940 Weston BLOCKER → spec lock) - § 3.1.6 DISMISSED enum source 修正 (PR #1935 ziang grep main 实证 v1 spec drift fix) - § 5.2.b 新增 3 boundary test invariants - § 5.2.c 新增 Phase A 实施 sediment trail - § 6 cr-checklist 加 5 sediment items - Migration chain 时序: 5 new value (含 DISMISSED) 不是 4 - fix-forward 1 (commit d50864f) — 全文 6 处 4→5 enum count global sweep + § 1.1 line 17 pre-A2 实证口径补齐 (per Weston BLOCKER msg=2ad46e97) ## CR - @符炫炜 architect (own draft) - @huangheng cr-checklist sediment cite verify ✅ msg=b276da50 - @weston framing verify ✅ msg=a111fcc3 (re-final pass post fix-forward 1) ## CI - lint-and-unit ✅ - e2e-http-smoke 3/3 ✅ (auto-merge after green) - provider-preflight 3/3 ✅ - docs-only lite gate satisfied 🤖 Architect ratify by Claude Code

#1943) * docs(cr-checklist): task #31 Phase A 全闭环后 sediment fold-in 子 PR 2 § 四加 6 lesson sediment（task #31 Phase A 4 PR + task #33 P3 PR #1933 codify 累计实证 + multi-PR same-hour multi-source first-principles catch trust-framing miss）+ § 六 sediment 引用追加 5 PR commit cross-link + § 八修订记录追加本 PR fold-in 完整 trail。新增 lesson: - Lesson #12 v9 third + fourth + fifth-application demos (PR #1935 ziang DISMISSED enum impl-side catch + dongdong response_model legacy field filter BLOCKER 双 same-PR / PR #1938 Weston worker fail-safe BLOCKER upstream raise points trace / PR #1940 Weston SuggestionActionResponse.message required field catch) — sediment 升级 systemic 信号 reviewer chain 必独立 first- principles re-verify - Migration chain 时序 second-application demo (PR #1935 复用 table extend pattern 跟 PR #1910 新建 enum hard-cut migration 时序约束不同; 5 new enum value APPLY_PENDING/APPLYING/APPLIED/APPLY_FAILED/DISMISSED + evidence_refs JSON column + ACCEPTED legacy zero-write grep gate) - Lesson #17 second-application demo (PR #1935 backend 收敛 canonical contract 时同 PR fold-in legacy projection layer 保 backward-compat - suggestion_ batch_id=run_id alias 等 - 跟 deprecation marker Lesson #14 family 配) - Lesson #18 formally established: lesson sediment + mechanical gate 双 layer codification 「一记一 enforce」(first-app PR #1933 4-source default value parity / second-app PR #1941 description-free read scope + service.py:845 bonus catch / third-app PR #1941 fix-forward sister tests 防 whole-file exclude 静默削弱 gate) - mini-pattern 19: spec lock pre-check grep main 实证 enum/contract assumption (architect own-up 升级版三层: spec→impl / impl→response_model / impl catch path→upstream raise points) - mini-pattern 20: PR adds response_model wire-up 必跑 model_validate(actual_ handler_return_shape) boundary gate (PR #1940 first-application demo) per architect dispatch msg=b6726ac9 + msg=420ca548 sediment trigger A 满足 (task #31 Phase A 4/4 done) 启动 + Phase B B1 lane huangheng owner. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(cr-checklist): fix cite accuracy NIT per Weston msg=7690b723 2 cite accuracy fixes (Weston framing CR catch): 1. response_model validation failure 状态码: 422 -> 500 - response_model validation fails 抛 FastAPI ResponseValidationError - 通常映射到 HTTP 500，不是 request body 校验的 422 - 影响 line 745 + line 850 描述 PR #1940 BLOCKER 时的状态码引用 2. GraphMergeSuggestionItem canonical schema 字段实证修正 - 原写: ... / observed_types / type_conflict / suggested_entity_type - 实际 main aperag/domains/knowledge_graph/schemas.py::GraphMergeSuggestionItem 不含这三字段 - A4 (PR #1940) 这些字段是 FE-derived display (FE 从 entities / suggested_target_entity / evidence_refs 推导)，不是 PR #1935 backend projection - 影响 line 781 sect 4 Lesson #17 second-application demo 描述 per Weston PR #1943 framing CR (msg=7690b723) - sediment cite accuracy 要求把事实漂移修干净，避免 future onboarding reference 时 confuse 422/500 状态码语义 + backend/FE field source attribution。不阻塞 main fold-in scope - 6 lesson sediment + 5 PR commit cross-link 其他 framing 全 accurate (Weston verified)。 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…field Per dongdong msg=fa88e97b BLOCKER + huangzhangshu msg=5b7cba0f / msg=ee6e7af2 + Weston msg=057f642c re-final framing verify gate + PM msg=03c821b0 fix-forward direction lock: the previous regular-field ``Optional[VectorBackendInfo]`` implementation leaked the deployment projection onto every input shape that referenced ``Collection``, including ``Collection-Input`` itself, ``Agent-Input.collections``, and ``CreateTurnRequest.collections``. That contradicted the read-only output projection lock from architect msg=0044261f. Move ``Collection.vector_backend`` to a Pydantic v2 ``@computed_field`` property so OpenAPI input/output schemas auto-split: - ``Collection-Output`` now lists ``vector_backend`` with ``readonly: true`` (verified in regenerated ``web/src/api-v2/schema.d.ts``). - ``Collection-Input`` no longer carries ``vector_backend`` (verified by grep + new contract test). - ``CollectionCreate`` / ``CollectionUpdate`` / ``Agent-Input.collections`` / ``CreateTurnRequest.collections`` all inherit the cleaned ``Collection-Input``, so the deployment-wide setting can no longer be passed as a per-collection override on agent / chat-turn requests. The ``build_collection_response`` constructor no longer passes ``vector_backend`` (computed fields are not accepted as input); the property reads ``settings.vector_db_type`` lazily on each serialization. Two new contract tests: - ``test_collection_input_schema_does_not_expose_vector_backend``: pin the input/output JSON Schema split + ``readOnly`` flag on the output side. Asserts ``CollectionCreate`` / ``CollectionUpdate`` also do not surface ``vector_backend``. - ``test_collection_constructor_ignores_vector_backend_input``: defensive — even if a malicious caller stuffs ``vector_backend`` into a ``model_validate`` payload, Pydantic ignores it and the computed property still reflects the deployment setting. Sediment: cuiwenbo own-up CR miss — implement-time only verified the ``CollectionConfig`` placement (one defense layer) and missed the ``Collection`` self-reuse-as-input second layer. dongdong + Weston + huangzhangshu independently caught via OpenAPI generated-schema gate. mini-pattern 19 layer 5 candidate: "Pydantic schema placement verify must grep ``references Collection`` to catch input/output reuse risk, not only direct form-input shape" (continuing the trust-framing-miss family from PR #1935 / #1938 / #1940). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ty matrix (#1949) * feat(collection): task #61 P1-D3 — vector backend identity + capability matrix Project the deployment-wide ``settings.vector_db_type`` onto every collection detail read so the FE can render a "what does this vector backend actually support" panel without per-collection migration or runtime probe. Backend (output-only projection): - ``aperag/schema/common.py``: ``VectorBackendCapabilities`` + ``VectorBackendInfo`` + ``_STATIC_VECTOR_BACKEND_CAPABILITIES`` dict + ``project_vector_backend_info()`` helper. - ``aperag/domains/knowledge_base/schemas.py:Collection``: add ``vector_backend: Optional[VectorBackendInfo]``. **Intentionally NOT on ``CollectionConfig``** so the OpenAPI ``CollectionCreate`` / ``CollectionUpdate`` input shapes do not let callers mistake a deployment-wide setting for a per-collection editable knob (per dongdong msg=c2593fdd + PM msg=caf7e4df + architect msg=0044261f read-only projection lock). - ``aperag/domains/knowledge_base/service/collection_service.py``: populate ``vector_backend`` in ``build_collection_response`` from ``settings.vector_db_type``; ``None`` for unknown backends so the FE can render a placeholder without a hard failure. Cross-PR consistency with task #83 / PR #1948 (Bryce, vector adapter behavior fixes): - Bryce's connector-layer ``BACKEND_CAPABILITIES`` ClassVar declares 2 truth flags (``supports_atomic_batch_upsert`` + ``supports_legacy_mode``); this PR's schema-layer Pydantic model mirrors those values plus a 3rd schema-layer-only flag ``supports_filter_or_with_empty_parts`` which is uniformly False across adapters after task #83 P1-V3 (translator-level defense-in-depth rejects empty Or parts). - The 3rd flag stays in the schema so the FE can declare the uniform reject explicitly per spec § 2.3 P1-D3 「显示『允许差异但显式』」 — Lesson #17 backend 收敛 contract simple-stable family pattern (cite PR #1930 SearchHit normalize, PR #1935 GraphMergeSuggestionItem projection layer). Mechanical gate (per Lesson #18 lesson-sediment + mechanical-gate 双 layer codification — first established by chenyexuan PR #1933 / PR #1941, then PR #1940 ``model_validate`` boundary): 13-case unit suite in ``tests/unit_test/contracts/test_vector_backend_capability_matrix.py`` pins each capability flag, normalizes inputs, and round-trips Pydantic ``model_dump`` so future drift between schema, projection helper, and FE-consumed shape fails fast at unit-test time. FE (read-only display): - ``web/src/features/collection/types.ts``: typed mirrors ``VectorBackendInfo`` / ``VectorBackendCapabilities`` / ``VectorBackendType``. - ``web/src/app/workspace/collections/[collectionId]/settings/collection-vector-backend-card.tsx``: new component that surfaces backend identity + capability matrix in the collection settings page (above the edit form). dongdong picks up rendering polish (responsive + dark mode + final copy) on the same PR per the joint A4-style split (cuiwenbo contract layer + dongdong rendering polish + CR pair). - ``web/src/i18n/{en-US,zh-CN}/page_collections.json``: copy strings. - ``web/src/api-v2/schema.d.ts`` regenerated via ``yarn api:v2:types``. Local verification: - ``uv run --extra test pytest tests/unit_test/contracts/test_vector_backend_capability_matrix.py tests/unit_test/contracts/test_collection_v2_openapi_contract.py -q`` → 23 passed - ``make openapi-check`` → ok - ``yarn type-check --pretty false`` → 0 new errors on this PR's files (pre-existing graph-lab cosmograph + agent-runtime errors unchanged) - ``yarn lint --quiet`` → 0 warnings/errors - ``yarn i18n:check`` → ok - ``git diff --check`` → ok Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(collection): task #87 P1-D3 — convert vector_backend to computed_field Per dongdong msg=fa88e97b BLOCKER + huangzhangshu msg=5b7cba0f / msg=ee6e7af2 + Weston msg=057f642c re-final framing verify gate + PM msg=03c821b0 fix-forward direction lock: the previous regular-field ``Optional[VectorBackendInfo]`` implementation leaked the deployment projection onto every input shape that referenced ``Collection``, including ``Collection-Input`` itself, ``Agent-Input.collections``, and ``CreateTurnRequest.collections``. That contradicted the read-only output projection lock from architect msg=0044261f. Move ``Collection.vector_backend`` to a Pydantic v2 ``@computed_field`` property so OpenAPI input/output schemas auto-split: - ``Collection-Output`` now lists ``vector_backend`` with ``readonly: true`` (verified in regenerated ``web/src/api-v2/schema.d.ts``). - ``Collection-Input`` no longer carries ``vector_backend`` (verified by grep + new contract test). - ``CollectionCreate`` / ``CollectionUpdate`` / ``Agent-Input.collections`` / ``CreateTurnRequest.collections`` all inherit the cleaned ``Collection-Input``, so the deployment-wide setting can no longer be passed as a per-collection override on agent / chat-turn requests. The ``build_collection_response`` constructor no longer passes ``vector_backend`` (computed fields are not accepted as input); the property reads ``settings.vector_db_type`` lazily on each serialization. Two new contract tests: - ``test_collection_input_schema_does_not_expose_vector_backend``: pin the input/output JSON Schema split + ``readOnly`` flag on the output side. Asserts ``CollectionCreate`` / ``CollectionUpdate`` also do not surface ``vector_backend``. - ``test_collection_constructor_ignores_vector_backend_input``: defensive — even if a malicious caller stuffs ``vector_backend`` into a ``model_validate`` payload, Pydantic ignores it and the computed property still reflects the deployment setting. Sediment: cuiwenbo own-up CR miss — implement-time only verified the ``CollectionConfig`` placement (one defense layer) and missed the ``Collection`` self-reuse-as-input second layer. dongdong + Weston + huangzhangshu independently caught via OpenAPI generated-schema gate. mini-pattern 19 layer 5 candidate: "Pydantic schema placement verify must grep ``references Collection`` to catch input/output reuse risk, not only direct form-input shape" (continuing the trust-framing-miss family from PR #1935 / #1938 / #1940). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(test): consolidate vector_backend_capability_matrix imports for ruff Combine the two from aperag.schema.common import ... statements into a single block so ruff's import organization rule is satisfied. No code-behavior change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(test): apply ruff format to vector_backend test + common.py Run `uv run ruff format` on ApeRAG/aperag/schema/common.py and ApeRAG/tests/unit_test/contracts/test_vector_backend_capability_matrix.py so `make lint` (`ruff format --check`) passes. Pure formatting; no behavior change. Other unrelated files reverted to keep this PR scope clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

earayu and others added 3 commits April 30, 2026 17:08

earayu force-pushed the bryce/task-75-a1-graph-curation-run-worker-lane branch from 825b55d to a0c7293 Compare April 30, 2026 09:09

earayu merged commit 4cd2e6f into main Apr 30, 2026
10 checks passed

earayu deleted the bryce/task-75-a1-graph-curation-run-worker-lane branch April 30, 2026 09:20

This was referenced Apr 30, 2026

docs(task-31): spec v1.1 amend — Phase A 实施 sediment fold-in #1942

Merged

docs(cr-checklist): task #31 Phase A close-out sediment fold-in 子 PR 2 #1943

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(indexing): task #31 Phase A1 — independent graph_curation_run worker lane#1938

feat(indexing): task #31 Phase A1 — independent graph_curation_run worker lane#1938
earayu merged 3 commits into
mainfrom
bryce/task-75-a1-graph-curation-run-worker-lane

earayu commented Apr 30, 2026

Uh oh!

earayu commented Apr 30, 2026

Uh oh!

earayu commented Apr 30, 2026

Uh oh!

earayu commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

earayu commented Apr 30, 2026

Summary

Changes

Trigger reconcile (per spec § 3.1.1.b)

Tests

Spec / scope alignment

Test plan

Not blocking

Uh oh!

earayu commented Apr 30, 2026

Uh oh!

earayu commented Apr 30, 2026

Uh oh!

earayu commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant