feat(graph-extractor): prompt v2 — 7 hard requirements + window-aware parser (task #30-A3)#1920
Merged
58f2480 to 444fb88
… parser

Task #30-A3 / sub-task #53. Implements the prompt v2 contract from ``docs/zh-CN/architecture/task-30-graph-chunk-window-spec-v1.md`` §3.1.3: every entity / relation now carries a non-empty ``source_chunk_ids`` list, the prompt prepends a ``[[chunk_id=<id> index=<n>]]`` boundary marker per chunk, and the parser validates returned ids against the window allowlist.

The 7 hard requirements all land in this PR:

1. Per-chunk boundary markers in the rendered prompt.
2. ``source_chunk_ids`` REQUIRED on every entity + relation.
3. Cross-chunk relations encouraged in the prompt rules.
4. Dedup / canonical-name guidance in the prompt rules.
5. Fail-safe / no-fabrication clause.
6. Cap rules now talk about per-window output (the ``max_entities`` and ``max_relations`` placeholders are filled by huangheng's A2 co-scale change once that PR lands; this PR just stops calling them ``per chunk`` in the rule text).
7. Graph extractor still passes ``response_format={"type": "json_object"}`` to the LLM callable (kept verbatim from the task #14 / PR #1877 setup).

Implementation

- ``aperag/indexing/llm.py``: rewrite ``ENTITY_RELATION_EXTRACTION`` with the v2 rules + the ``source_chunk_ids`` schema; rename the former single-chunk ``input_text`` parameter to a ``window_chunks`` sequence and emit one ``[[chunk_id=...]]`` block per chunk; render ``allowed_chunk_ids`` so the LLM has the explicit allowlist; expose a ``few_shot_locale`` opt-in (default ``None``) that pulls from a small ``zh`` / ``cross_chunk`` example library for collection configs that want a non-English / cross-chunk hint without paying the prompt-token tax by default.
- ``aperag/indexing/graph_extractor.py``: introduce ``_extract_one_window`` taking a sequence of chunks; keep ``_extract_one_chunk`` as a thin compat shim that wraps a single-chunk window so the ziang #1918 dispatcher and existing callers stay green until A1 lands. Rework the parser: ``_parse_extraction_response`` now takes ``allowed_chunk_ids`` instead of a single ``chunk_id``, and the new ``_resolve_source_chunk_ids`` enforces the spec invariant. A single-chunk window falls back to the lone allowed id when the field is missing, a multi-chunk window requires the LLM to populate it (otherwise the record is skipped + warned), and any out-of-allowlist id is silently dropped before the non-empty check so hallucinated chunk_ids do not poison provenance.

Tests

- ``tests/integration/test_graph_extractor.py``: rewire every ``_parse_extraction_response`` call to the new ``allowed_chunk_ids`` keyword, update the rendered-prompt smoke test to the new ``window_chunks`` signature, add new tests for per-chunk markers, the few-shot opt-in, the empty-window guard, the single-chunk fallback, the multi-chunk required-provenance branch, the out-of-allowlist drop, and the cross-chunk provenance preservation. The ``build_collection_llm_callable`` monkeypatch stubs now accept ``**_kwargs`` so the patched lambdas tolerate the ``response_format`` keyword that ``build_collection_graph_extractor`` has been passing since task #14 (these stubs were silently accumulating an arity mismatch).

Out of scope (matches the spec dispatch)

- Window assembler + ``graph_extraction_window_size`` config: task #30-A1 / @ziang PR #1918.
- ``MAX_ENTITIES`` / ``MAX_RELATIONS`` / ``TIMEOUT`` / ``BOOTSTRAP`` / ``MAX_PROMPT_TOKENS`` co-scale: task #30-A2 / @huangheng task #54.
- Frontend / generated schema changes for the new collection config: task #30-A1 covers them.

Test plan

- ``uvx ruff@0.15.12 check aperag/ tests/``: pass
- ``uvx ruff@0.15.12 format --check aperag/ tests/``: pass
- ``uv run --extra test python -m pytest tests/integration/test_graph_extractor.py -q``: 36 passed

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
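The provenance rule described for ``_resolve_source_chunk_ids`` can be sketched roughly as below. This is a minimal illustration, not the code in ``aperag/indexing/graph_extractor.py``: the function name without the leading underscore, the ``record`` dict shape, the ``None`` return for a skipped record, and the order-preserving de-duplication are all assumptions made for the example.

```python
import logging
from typing import Any, Optional, Sequence

logger = logging.getLogger(__name__)


def resolve_source_chunk_ids(
    record: dict[str, Any], allowed_chunk_ids: Sequence[str]
) -> Optional[list[str]]:
    """Return validated provenance ids, or None when the record should be skipped.

    Hypothetical stand-in for the PR's _resolve_source_chunk_ids.
    """
    allowed = set(allowed_chunk_ids)
    raw = record.get("source_chunk_ids")
    candidates = raw if isinstance(raw, list) else []

    # Step 1: drop out-of-allowlist ids before the non-empty check.
    # (Keeping first-seen order and removing repeats is an assumption.)
    kept: list[str] = []
    for cid in candidates:
        if isinstance(cid, str) and cid in allowed and cid not in kept:
            kept.append(cid)
    if kept:
        return kept

    # Step 2: a single-chunk window falls back to its lone allowed id.
    # (Falling back when ids were present but all invalid is an assumption;
    # the PR text only states the "field is missing" case.)
    if len(allowed_chunk_ids) == 1:
        return [allowed_chunk_ids[0]]

    # Step 3: a multi-chunk window requires provenance; skip and warn.
    logger.warning("record skipped: no valid source_chunk_ids in multi-chunk window")
    return None
```

Filtering before the non-empty check means a record whose only ids are hallucinated is treated the same as a record with no ids at all, which is what keeps invalid ids out of the stored provenance.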
444fb88 to 70a9b25
earayu added a commit that referenced this pull request on Apr 29, 2026
#1921)

Pre-existing CI flake waived per ci-flake-policy.md § 2.2:

- Run id: 25134494472 / Job id: 73669566493
- Shape: E2E HTTP Qdrant + Nebula (single shape; Lite ✅ + Neo4j ✅)
- Signature exact match: aperag/domains/agent_runtime/runtime.py:1056 ValidationException: Model specification is required for agent runtime v3
- PR diff has zero intersection with the §2.2 forbidden list: changes only graph_extractor.py + KnowledgeGraphConfig schema + boundary tests + regenerated schema.d.ts; agent_runtime / model_platform / mcp / indexing-worker DI / e2e-http workflow / deploy/aperag all untouched

All other gates green: lint-and-unit ✅, smoke 3/3 ✅, preflight 3/3 ✅. All 5 code-review lanes collected (Bryce A3 fold-in / Weston arch re-CR / ziang impl re-CR / 符炫炜 ratify / huangheng author). task #30 Phase A is fully closed out (A1 #1918 / A3 #1920 / A2 this PR).
earayu added a commit that referenced this pull request on Apr 29, 2026
…ode default ON

Two BLOCKER fixes per @ziang msg=56912dae + @huangzhangshu msg=cda4dc75:

* **BLOCKER 1**: `source_chunk_ids` validity is now strictly window-scoped: `run_window` computes `source_chunk_ids_valid` / `source_chunk_ids_total` against that single window's `allowed_chunk_ids`, and `aggregate_sample` only sums per-window counters. Previously the per-document union check let a record produced in window-0 reference a chunk_id from window-1 and pass, violating the A3 parser invariant (source_chunk_ids ⊆ current window's chunk_ids, not document union). New unit test `test_aggregate_sample_source_chunk_ids_is_window_scoped_not_union` pins the cross-window pollution case.
* **BLOCKER 2**: `--response-format-json` is now ON by default with `--no-response-format-json` as the explicit opt-out. A3 PR #1920 (`01b45196`) made `response_format=json_object` a graph extractor production invariant, but the legacy benchmark default was off, so B2 baseline would have measured `json_ok_rate` / parse failure / cost on the pre-A3 path. README updated to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
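To make the BLOCKER 1 distinction concrete, here is a small sketch of window-scoped counting. It is not the harness code: `count_window_validity` and `aggregate_counts` are hypothetical names standing in for the roles of `run_window` and `aggregate_sample`, and counting one unit per referenced id (rather than per record) is an assumption.

```python
from typing import Any, Iterable, Sequence


def count_window_validity(
    records: Iterable[dict[str, Any]], allowed_chunk_ids: Sequence[str]
) -> tuple[int, int]:
    """Count (valid, total) source_chunk_ids against ONE window's allowlist."""
    allowed = set(allowed_chunk_ids)
    valid = 0
    total = 0
    for record in records:
        for cid in record.get("source_chunk_ids", []):
            total += 1
            if cid in allowed:
                valid += 1
    return valid, total


def aggregate_counts(per_window: Iterable[tuple[int, int]]) -> tuple[int, int]:
    """Sum per-window counters; never re-check ids against a document-wide union."""
    valid_sum = 0
    total_sum = 0
    for valid, total in per_window:
        valid_sum += valid
        total_sum += total
    return valid_sum, total_sum


# Window 0 owns c0/c1, window 1 owns c2/c3. A window-0 record cites c2.
w0 = count_window_validity([{"source_chunk_ids": ["c0", "c2"]}], ["c0", "c1"])
w1 = count_window_validity([{"source_chunk_ids": ["c2"]}], ["c2", "c3"])
document_counts = aggregate_counts([w0, w1])
```

Under a per-document union check, the `c2` reference from window 0 would have counted as valid because `c2` exists somewhere in the document; with per-window counting it is recorded as invalid.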
earayu added a commit that referenced this pull request on Apr 29, 2026
…rics (#1923)

* feat(benchmark): task #30 B1 — chunk-window matrix + per-document metrics

Extend tests/benchmarks/graph_extraction harness for the task #30 graph chunk window benchmark (spec § 6.3, dispatched in msg=cecae5ed):

* Fix `render_extraction_prompt` API drift: the harness was still on the legacy `input_text=...` signature; PR #1918/#1920/#1921 (task #30 Phase A) moved it to `window_chunks=[{chunk_id, text}]`. Without this fix B2 (Planetegg msg=9489efdb) cannot run.
* Add `--chunk-window-size N` for single-shape runs and `--matrix N1,N2,...` for batch sweeps (the two are mutually exclusive). Sample text is split into `--pseudo-chunks-per-doc` (default 4) pseudo-chunks and grouped into non-overlapping windows of size N, mirroring the production `_GraphChunkWindow` shape (PR #1918).
* Aggregate **per-document** the 7 metrics required by spec § 6.3 + Planetegg msg=ea7efa7b: `llm_call_count`, `input_tokens_total`, `output_tokens_total`, `wall_time_s`, `timeout_or_failure_count`, entity+relation totals + duplicate counts, and the new `source_chunk_ids_valid` / `source_chunk_ids_total` provenance check (task #30 §3.1.3 hard requirement #2).
* `--dry-run` produces a placeholder schema for B2 to verify ingestion before paying provider cost (per Planetegg msg=cbe84223).
* `test_runner_units.py` pins the harness structural pieces (chunking, windowing, validity counting, per-document aggregation) so the matrix output schema stays contract-stable even though the benchmark itself runs out-of-CI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(benchmark): task #30 B1 — window-scoped source_chunk_ids + JSON-mode default ON

Two BLOCKER fixes per @ziang msg=56912dae + @huangzhangshu msg=cda4dc75:

* **BLOCKER 1**: `source_chunk_ids` validity is now strictly window-scoped: `run_window` computes `source_chunk_ids_valid` / `source_chunk_ids_total` against that single window's `allowed_chunk_ids`, and `aggregate_sample` only sums per-window counters. Previously the per-document union check let a record produced in window-0 reference a chunk_id from window-1 and pass, violating the A3 parser invariant (source_chunk_ids ⊆ current window's chunk_ids, not document union). New unit test `test_aggregate_sample_source_chunk_ids_is_window_scoped_not_union` pins the cross-window pollution case.
* **BLOCKER 2**: `--response-format-json` is now ON by default with `--no-response-format-json` as the explicit opt-out. A3 PR #1920 (`01b45196`) made `response_format=json_object` a graph extractor production invariant, but the legacy benchmark default was off, so B2 baseline would have measured `json_ok_rate` / parse failure / cost on the pre-A3 path. README updated to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
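The pseudo-chunk split and non-overlapping window grouping described for the benchmark harness might look something like the sketch below. How the real harness cuts the sample text is not stated in this thread, so the equal-length character slicing, both function names, and the error handling are assumptions made for illustration only.

```python
import math


def split_pseudo_chunks(text: str, chunks_per_doc: int = 4) -> list[str]:
    """Cut text into up to chunks_per_doc slices of roughly equal length.

    Character-based slicing is an assumption; the harness may split differently.
    """
    if chunks_per_doc < 1:
        raise ValueError("chunks_per_doc must be >= 1")
    size = max(1, math.ceil(len(text) / chunks_per_doc))
    return [text[start:start + size] for start in range(0, len(text), size)]


def group_into_windows(chunks: list[str], window_size: int) -> list[list[str]]:
    """Group chunks into consecutive, non-overlapping windows of window_size."""
    if window_size < 1:
        raise ValueError("window_size must be >= 1")
    return [
        chunks[start:start + window_size]
        for start in range(0, len(chunks), window_size)
    ]


chunks = split_pseudo_chunks("abcdefgh", chunks_per_doc=4)
windows = group_into_windows(chunks, window_size=3)
```

With four pseudo-chunks and a window size of 3, the last window holds the single leftover chunk, so a sweep over `--matrix` values changes the number of LLM calls per document as well as the prompt size per call.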
Summary
Task #30-A3 / sub-task #53. Implements the prompt v2 contract from `docs/zh-CN/architecture/task-30-graph-chunk-window-spec-v1.md` §3.1.3: every entity / relation now carries a non-empty `source_chunk_ids` list, the prompt prepends a `[[chunk_id=<id> index=<n>]]` boundary marker per chunk, and the parser validates returned ids against the window allowlist.

7 hard requirements landed

1. Per-chunk boundary markers in the rendered prompt.
2. `source_chunk_ids` REQUIRED on every entity + relation in the JSON schema.
3. Cross-chunk relations encouraged in the prompt rules.
4. Dedup / canonical-name guidance in the prompt rules.
5. Fail-safe / no-fabrication clause.
6. Cap rules now talk about per-window output (the `max_entities` and `max_relations` numbers are still rendered as-is from caller args; @huangheng's task #30-A2 / task #54 fills the co-scale formula).
7. Graph extractor still passes `response_format={"type": "json_object"}` to the LLM callable (kept verbatim from task #14 / PR #1877).

Implementation

- `aperag/indexing/llm.py`: `ENTITY_RELATION_EXTRACTION` rewritten with the v2 rules; `render_extraction_prompt` now takes `window_chunks` instead of a single `input_text` and renders one `[[chunk_id=...]]` block per chunk. New `few_shot_locale` opt-in (default `None`) pulls from a `zh` / `cross_chunk` example library for collection configs that want a non-English or cross-chunk hint without paying the prompt-token tax by default.
- `aperag/indexing/graph_extractor.py`: new `_extract_one_window` taking a sequence of chunks; the legacy `_extract_one_chunk` is kept as a thin compat shim that wraps a single-chunk window so @ziang's task #30-A1 (PR #1918) dispatcher + existing callers stay green until A1 lands. Parser rewired: `_parse_extraction_response` now takes `allowed_chunk_ids`, and the new `_resolve_source_chunk_ids` enforces the spec invariant. A single-chunk window falls back to the lone allowed id when the field is missing, a multi-chunk window requires the LLM to populate it (otherwise the record is skipped + warned), and any out-of-allowlist id is silently dropped before the non-empty check so hallucinated chunk_ids do not poison provenance.

Tests

- Rewired every `_parse_extraction_response` call to the new `allowed_chunk_ids` keyword.
- Updated the rendered-prompt smoke test to the new `window_chunks` signature.
- Added tests for per-chunk markers, the few-shot opt-in, the empty-window guard, the single-chunk fallback, the multi-chunk required-provenance branch, the out-of-allowlist drop, and the cross-chunk provenance preservation.
- The `build_collection_llm_callable` monkeypatch stubs now accept `**_kwargs` so the patched lambdas tolerate the `response_format` keyword that `build_collection_graph_extractor` has been passing since task #14 (these stubs were silently accumulating an arity mismatch).

Out of scope

- Window assembler + `graph_extraction_window_size` config: task #30-A1 / @ziang PR #1918.
- `MAX_ENTITIES` / `MAX_RELATIONS` / `TIMEOUT` / `BOOTSTRAP` / `MAX_PROMPT_TOKENS` co-scale: task #30-A2 / @huangheng task #54.
- Frontend / generated schema changes for the new collection config: task #30-A1 covers them.

Merge order

PM @不穷 dispatched A1 → A3 → A2 (msg=fadcd64c). This PR (A3) will be rebased onto A1 after PR #1918 lands; A2 then folds the co-scale formula into the prompt's `max_entities` / `max_relations` placeholders.

Test plan

- `uvx ruff@0.15.12 check aperag/ tests/`: pass
- `uvx ruff@0.15.12 format --check aperag/ tests/`: pass
- `uv run --extra test python -m pytest tests/integration/test_graph_extractor.py -q`: 36 passed

🤖 Generated with Claude Code
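For readers who have not opened the diff, the per-chunk boundary markers and the rendered allowlist could be produced along these lines. This is a sketch under assumptions, not the body of `render_extraction_prompt`: the helper name `render_window_block`, the 0-based `index`, the `allowed_chunk_ids:` header line, and raising `ValueError` for an empty window are all hypothetical choices. Only the `[[chunk_id=<id> index=<n>]]` marker format and the `{chunk_id, text}` chunk shape come from this thread.

```python
from typing import Mapping, Sequence


def render_window_block(window_chunks: Sequence[Mapping[str, str]]) -> str:
    """Render one marker-prefixed block per chunk, preceded by the allowlist."""
    if not window_chunks:
        # Empty-window guard; the exception type is an assumption.
        raise ValueError("window_chunks must contain at least one chunk")

    blocks = []
    for index, chunk in enumerate(window_chunks):
        marker = f"[[chunk_id={chunk['chunk_id']} index={index}]]"
        blocks.append(f"{marker}\n{chunk['text']}")

    # Spell out the allowlist so the model knows which ids it may cite.
    allowed = ", ".join(chunk["chunk_id"] for chunk in window_chunks)
    return f"allowed_chunk_ids: {allowed}\n\n" + "\n\n".join(blocks)


rendered = render_window_block(
    [
        {"chunk_id": "c1", "text": "Alice founded Acme."},
        {"chunk_id": "c2", "text": "Acme acquired Beta Corp."},
    ]
)
```

Because the same ids appear both in the markers and in the allowlist, the parser can validate every returned `source_chunk_ids` entry against exactly the set the prompt showed to the model.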