Wave 11: dynamic graph entity types #1789
Findings:
Checked:
After these are fixed, the main dataflow shape looks aligned: collection language reaches prompt rendering, entity types stay as strings, …
1a09909 to
f8210c6
LGTM. I rechecked the three previous blockers against head
Additional checks:
The implementation now matches the v3 design lock: hard-cut …
Architect ratify ✅: all three hard-gate stages passed (12-invariant + 4-pattern/11 mini-pattern + simple-stable 4-guardrail). Spot-check confirms async signature, SELECT FOR UPDATE single path, per-document granularity, no prompt cap. CI 10/10 + 84 tests pass + grep zero. Proceeding to merge.
Per @earayu2 ask in #Graph可视化 (msg=4a8309e4): one menu page where
the team can flip between visualisation experiments while leaving
the production graph routes alone. The first two embedded modes are
both Cosmograph (`@cosmograph/react` 2.x): a topology renderer that
consumes the live `/api/v2/collections/{id}/graphs` payload, and a
semantic-map renderer that visualises a deterministic synthetic
fixture at 1k / 5k / 10k point scales for FPS benchmarking. The nav
also links out to the existing `/graph` and `/graph-showcase`
production pages so all four can be compared in adjacent tabs.
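
A minimal sketch of what a deterministic fixture at these scales could look like. Everything here is an assumption for illustration: `sketchSyntheticPoints`, the `SketchPoint` shape, the seed, and the cluster layout are hypothetical and are not taken from `cosmograph-adapter.ts`. The only idea carried over from the text is that a seeded generator yields identical points on every run, so FPS numbers are comparable across runs.

```typescript
// Hypothetical point shape; the real fixture may carry different fields.
interface SketchPoint {
  id: string;
  x: number;
  y: number;
  cluster: number;
}

// Small seeded PRNG (mulberry32-style): same seed, same sequence in [0, 1).
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Cluster centres sit on a unit circle; points are jittered around them.
function sketchSyntheticPoints(count: number, clusterCount = 8, seed = 42): SketchPoint[] {
  const rand = seededRandom(seed);
  const points: SketchPoint[] = [];
  for (let i = 0; i < count; i++) {
    const cluster = i % clusterCount;
    const angle = (2 * Math.PI * cluster) / clusterCount;
    points.push({
      id: `p${i}`,
      x: Math.cos(angle) + (rand() - 0.5) * 0.4,
      y: Math.sin(angle) + (rand() - 0.5) * 0.4,
      cluster,
    });
  }
  return points;
}

// The three scale presets named in the commit message.
const SCALE_PRESETS: number[] = [1000, 5000, 10000];
const fixtures = SCALE_PRESETS.map((n) => sketchSyntheticPoints(n));
```

Because the coordinates are precomputed, a renderer can place points directly instead of running a force simulation, which is what makes the fixture usable as a stable benchmark input.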
Why a PoC sibling instead of overwriting `graph-showcase`:
`graph-showcase/` already carries 745 LOC of an earlier
`react-force-graph-2d` exploration (see
`collection-graph-showcase.tsx`); replacing it would lose the prior
visual probe. PM ratified the new sibling path in DM (msg=30e30aab).
License caveat surfaced inline in the nav: `@cosmograph/react` is
CC-BY-NC-4.0 — explicitly fine for a PoC per @earayu2 (msg=26e5530d)
but flagged as a productionisation gate. The MIT-licensed
`@cosmos.gl/graph` engine is the intended swap target if the
visual direction sticks.
Files:
- `web/src/app/workspace/collections/[collectionId]/graph-lab/page.tsx`
Server component, breadcrumbs + container layout (mirror of
`graph-showcase/page.tsx`).
- `…/graph-lab/collection-graph-lab.tsx` Client shell, wires the nav
to one of two embedded renderers.
- `…/graph-lab/graph-lab-nav.tsx` Left rail with mode tabs +
external page links + license note.
- `…/graph-lab/cosmograph-topology.tsx` Live KG → Cosmograph Graph
mode. `dynamic({ ssr: false })` keeps the WebGL renderer out of the
server pass. Detail panel reuses cluster legend + selected-node
metadata.
- `…/graph-lab/cosmograph-semantic-map.tsx` Synthetic fixture →
Cosmograph embedding mode (`pointXBy`/`pointYBy` set, simulation
off). Scale toggle bench panel exposes generate-time + first-frame
latency for the 1k/5k/10k presets.
- `…/graph-lab/lab-modes.ts` Tiny shared module so the server-side
page.tsx stays import-light.
- `web/src/features/knowledge-graph/cosmograph-adapter.ts` Pure data
transform, `KnowledgeGraph → {points, links, clusterLabels}`. Two
entry points (`toTopologyDataset` for live data,
`buildSyntheticSemanticDataset` for deterministic fixture) so the
PoC can rebuild reproducibly under different scales.
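
To illustrate the "pure data transform" idea behind the adapter, here is a hedged sketch of a `KnowledgeGraph → {points, links, clusterLabels}` mapping. The input shapes (`SketchNode`, `SketchEdge`, `SketchGraph`), the `entityType` field, the index fields, and the function name `sketchTopologyDataset` are all hypothetical; the real `toTopologyDataset` and the types in `@/features/knowledge-graph/types` may look quite different.

```typescript
// Assumed minimal input shapes, for illustration only.
interface SketchNode { id: string; entityType: string }
interface SketchEdge { source: string; target: string }
interface SketchGraph { nodes: SketchNode[]; edges: SketchEdge[] }

interface SketchDataset {
  points: { id: string; index: number; cluster: number }[];
  links: { source: string; target: string; sourceIndex: number; targetIndex: number }[];
  clusterLabels: string[];
}

// Pure function: reads the graph, returns a new dataset, mutates nothing.
function sketchTopologyDataset(graph: SketchGraph): SketchDataset {
  const clusterLabels: string[] = [];
  const clusterByType = new Map<string, number>();
  const indexById = new Map<string, number>();

  const points = graph.nodes.map((node, index) => {
    // Each distinct entity type string becomes one cluster, in first-seen order.
    let cluster = clusterByType.get(node.entityType);
    if (cluster === undefined) {
      cluster = clusterLabels.length;
      clusterByType.set(node.entityType, cluster);
      clusterLabels.push(node.entityType);
    }
    indexById.set(node.id, index);
    return { id: node.id, index, cluster };
  });

  // Skip edges that reference unknown nodes so the renderer gets no dangling links.
  const links: SketchDataset["links"] = [];
  for (const edge of graph.edges) {
    const sourceIndex = indexById.get(edge.source);
    const targetIndex = indexById.get(edge.target);
    if (sourceIndex === undefined || targetIndex === undefined) continue;
    links.push({ source: edge.source, target: edge.target, sourceIndex, targetIndex });
  }

  return { points, links, clusterLabels };
}

const demo = sketchTopologyDataset({
  nodes: [
    { id: "a", entityType: "person" },
    { id: "b", entityType: "organization" },
    { id: "c", entityType: "person" },
  ],
  edges: [
    { source: "a", target: "b" },
    { source: "a", target: "missing" },
  ],
});
console.log(demo.clusterLabels, demo.links.length);
```

Keeping the transform free of React and renderer imports is what would let a production page reuse it later without pulling in the WebGL dependency.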
§K.12 invariant cross-check
============================
Largely n/a (frontend exploratory PoC). Direct hits:
- #11 (read-only / not auto-action) — the lab page is read-only;
no merge/curate surface added.
- #12 (grep-zero LightRAG) — `rg -i lightrag web/src` returns zero
hits both before and after.
4-pattern pre-check matrix
==========================
- Pattern 1 v1 (existing graph routes): the production
`/graph` page (`collection-graph.tsx`) and the prior
`/graph-showcase` page are not modified; no imports moved or
renamed. `rg "from '\\..*collection-graph(-showcase)?'"` returns
only the original page bindings.
- Pattern 1 v2 (data-shape consumers): the new adapter consumes
`KnowledgeGraph`/`GraphNode`/`GraphEdge` from
`@/features/knowledge-graph/types` — same source the existing
graph page uses. No type duplication.
- Pattern 2 (response-shape change list): n/a, no API or schema
touched.
- Pattern 3 (additive helpers): the adapter and synthetic fixture
generator live in `features/knowledge-graph/`; both are pure
functions so the production graph page can adopt them later
without circular imports.
simple-stable directive 4-guardrail
===================================
- #1 No unbounded scope growth: explicit PoC scope per @ziang acceptance
(msg=4db79b66). No backend changes; no `/api/v2/...` additions;
semantic mode uses front-end fixture data only.
- #2 Ship quickly: single-PR ship, ~0.5 day work envelope.
- #3 Simple and stable: production routes (`/graph`, `/graph-showcase`)
are byte-identical before/after this commit; PoC lives in its
own directory.
- #4 Maintenance-free private deployment: no operator config introduced. The
Cosmograph dependency is bundled with the rest of the web app;
productionisation will need a license decision (called out in
the nav UI itself so reviewers see it).
Test plan
=========
- `next dev --turbopack` reaches "Ready in 1476ms" cleanly post-add
(verified locally; SSR pass does not load `@cosmograph/react`).
- `yarn tsc --noEmit` reports zero errors in the new files
(`graph-lab/**`, `cosmograph-adapter.ts`). Five pre-existing
errors remain in `graph/collection-graph*.tsx` from PR #1789
and are out of scope.
- Manual smoke deferred to reviewer-side; the bench panel logs
generate-time + first-frame latency for the three scale presets
on render so a screenshot at each scale is enough to satisfy the
PoC acceptance items.
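
A rough sketch of how generate-time could be measured for the three presets. `timeStep` and `Timed` are hypothetical helpers, not part of the PoC source, and `Date.now` is used only as a coarse, always-available clock (a browser implementation would more likely use a high-resolution timer). First-frame latency needs a browser frame callback and is not shown here.

```typescript
// Hypothetical timing record for one benchmark step.
interface Timed<T> {
  label: string;
  ms: number;
  result: T;
}

// `now` is injectable so the helper can be exercised with a fake clock.
function timeStep<T>(label: string, step: () => T, now: () => number = Date.now): Timed<T> {
  const start = now();
  const result = step();
  return { label, ms: now() - start, result };
}

// Stand-in workload: allocate n items per preset instead of the real fixture.
const timings = [1000, 5000, 10000].map((n) =>
  timeStep(`generate ${n}`, () => Array.from({ length: n }, (_, i) => i)),
);

for (const t of timings) {
  console.log(`${t.label}: ${t.ms} ms for ${t.result.length} items`);
}
```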
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
- `KnowledgeGraphConfig.entity_types: list[str]` with default `[]`.
- Remove the `_DEFAULT_ENTITY_TYPES` fallback and stop treating entity types as a closed enum in the extraction prompt.
- `merge_entity_types(session, collection_id, new_types)` with `SELECT ... FOR UPDATE` and first-write-wins dedupe.
- `kg.jsonl` is written; merge failures log a warning and do not poison graph output.
- `entity_type: str`; no `type_id`/`label` split and no registry table.
- `docs/modularization/`.
CR Fixes
Spec Gates
- `entity_types=[]`; LLM can propose the first types from real documents.
- `entity_type=""`; the entity is preserved.
- `collection.config.language`; entity names preserve source text.
- `entity_types` list; no prompt list cap, size cap, LRU, or fallback default.
- `kg.jsonl`; no per-chunk or per-run merge.
12-Invariant Gate
- `EntityRecord.entity_type` remains a string; no `type_id` field is introduced.
- `kg.jsonl` entity/relation record shape remains compatible.
- `CurationEntity.entity_type` string paths.
- `kg.jsonl` even if config merge fails.
Pattern Gate
- `entity_type: str` storage and API surface.
- `collection.config.language`; no `auto` language mode.
- `KnowledgeGraphConfig.entity_types`; no new registry table.
11 Mini-Pattern Gate
- `merge_entity_types()` is async and takes `AsyncSession`.
- `SELECT ... FOR UPDATE`.
- `kg.jsonl` flush.
- `_DEFAULT_ENTITY_TYPES` is removed and no longer a fallback.
Simple-Stable Guardrails
Validation
- `uv run ruff check aperag/indexing/entity_types.py aperag/indexing/graph.py aperag/indexing/graph_extractor.py aperag/indexing/llm.py aperag/indexing/worker_factory.py aperag/schema/common.py tests/unit_test/indexing/test_entity_types.py tests/integration/test_graph_extractor.py tests/unit_test/indexing/test_t1_2_graph.py`
- `uv run --extra test pytest tests/unit_test/indexing/test_entity_types.py tests/integration/test_graph_extractor.py tests/unit_test/indexing/test_t1_2_graph.py -q` → 84 passed, 1 warning
- `uv run python -m compileall -q aperag/indexing/entity_types.py aperag/indexing/graph.py aperag/indexing/graph_extractor.py aperag/indexing/llm.py aperag/indexing/worker_factory.py aperag/schema/common.py tests/unit_test/indexing/test_entity_types.py tests/integration/test_graph_extractor.py tests/unit_test/indexing/test_t1_2_graph.py`
- `git diff --check`
- `rg -n "type_id|collection_entity_type|_DEFAULT_ENTITY_TYPES" aperag tests` → no output