feat(importer): manifest library + dkg-importer SKILL.md#642
Conversation
| contextGraphId: cgId, | ||
| assertionName: assertion, | ||
| subGraphName, | ||
| entities: [importUri(importId)], |
There was a problem hiding this comment.
🔴 Bug: only the import root is promoted here. The partition subjects (urn:dkg:import:...#part:...) stay in WM, so after the promote loadImportManifest() cannot reliably read imp:key / imp:initialStatus from SWM and another node cannot resume the import. Promote the partition URIs alongside the import URI (chunked if needed).
There was a problem hiding this comment.
Addressed on this branch: createImportManifest now promotes the import root and every partition URI, chunked by ROOT_CHUNK.
| { subject: evIri, predicate: IMPORT_P.status, object: lit(status) }, | ||
| { subject: evIri, predicate: IMPORT_P.recordedAt, object: dt(nowIso) }, | ||
| ]; | ||
| await client.writeAssertion({ |
There was a problem hiding this comment.
🔴 Bug: this appends the status event to the manifest assertion but never promotes it. That leaves progress updates in WM only, so peers will still see the old status and a resume from another node can restart already-finished partitions. Promote the partition/event roots after each status write, or use the direct SWM path if that's the intended visibility.
There was a problem hiding this comment.
Addressed on this branch: markPartitionStatus writes the status event and promotes it so progress becomes visible from SWM.
| imp:initialStatus ?initial . | ||
| OPTIONAL { | ||
| { | ||
| SELECT ?part (SAMPLE(?status) AS ?latestStatus) (MAX(?recordedAt) AS ?latestRecordedAt) WHERE { |
There was a problem hiding this comment.
🔴 Bug: SAMPLE(?status) is not correlated with MAX(?recordedAt). Once a partition has multiple events, this subquery can return the newest timestamp with an older status, so loadImportManifest() may report pending/failed for a partition that is actually done. Select the row with the max timestamp (for example via ORDER BY DESC(?recordedAt) LIMIT 1 per partition) instead of mixing independent aggregates.
There was a problem hiding this comment.
Addressed on this branch: loadImportManifest uses a latest-row selection pattern instead of SAMPLE plus MAX, so status and timestamp stay correlated.
| const bindings = res?.result?.bindings ?? res?.bindings ?? []; | ||
|
|
||
| const partitions = bindings.map((row) => { | ||
| const key = unquote(row.key); |
There was a problem hiding this comment.
🔴 Bug: /api/query returns SPARQL-JSON binding cells ({ value, type, ... }) on the normal path, but this code treats row.key, row.part, row.latestStatus, etc. as raw strings. The returned manifest state will contain objects instead of strings and the sort/comparisons will misbehave. Normalize bindings through .value/.id first, the same way the other daemon clients do.
There was a problem hiding this comment.
Addressed on this branch: query bindings are normalized from SPARQL JSON cells through .value before sorting/comparing.
5d2a4e9 to
673dfca
Compare
| contextGraphId: cgId, | ||
| assertionName: assertion, | ||
| subGraphName, | ||
| entities: [evIri], |
There was a problem hiding this comment.
🔴 Bug: promoting only evIri does not publish the partIri -> imp:statusEvent -> evIri triple you just wrote on line 266. assertion.promote({ entities: [...] }) only moves quads whose subject is one of those roots, so after a restart or a peer-side resume loadImportManifest() still sees no status events and treats every partition as pending. Promote both partIri and evIri here, or flip the edge so the event points at the partition.
There was a problem hiding this comment.
Addressed on this branch: markPartitionStatus promotes both partIri and evIri so the partition-to-event edge is visible to loadImportManifest.
| /** | ||
| * Unit tests for the pure helpers in `scripts/lib/manifest.mjs`. | ||
| * | ||
| * Daemon-roundtrip behaviour (createImportManifest / markPartitionStatus / |
There was a problem hiding this comment.
🟡 Issue: this test file only exercises pure helpers, but the risky behavior in this PR is the WM→SWM round-trip through markPartitionStatus()/loadImportManifest(). A regression in promote root selection will pass CI unnoticed. Please add an integration-style test that creates a manifest, marks a partition done, promotes it, and then verifies loadImportManifest() reads done back from shared memory.
There was a problem hiding this comment.
Addressed on this branch: added mock WM/SWM round-trip tests covering create, mark done, promote, and reload from shared memory.
…ration tests Addresses Codex round-2 review on PR #642 (12:15Z): 1. markPartitionStatus now promotes BOTH `partIri` and `evIri`. Codex correctly flagged that the `partIri imp:statusEvent evIri` triple has subject `partIri` — `assertion.promote({entities})` only moves quads whose subject is in `entities`, so promoting only `evIri` left the partition→event edge in WM and a peer-side `loadImportManifest` couldn't see the new status. Re-promoting `partIri` is idempotent for its prior triples and ships the new edge. 2. `unquote()` now collapses SPARQL 1.1 results-JSON cells (`{value, type, datatype?, "xml:lang"?}`) to their `.value` before the literal-unquoting pass. The daemon's `bindingValue` helper already handles both shapes; this library was assuming only the flat-string shape Oxigraph emits. Added `bareUri()` for URI bindings and switched `loadImportManifest` to use it for `?part`. 3. New integration-style tests in scripts/lib/__tests__/manifest.test.mjs cover: - full WM→SWM round-trip with flat bindings (Oxigraph shape) - same round-trip with SPARQL-JSON cell bindings (sparql-http shape) - latest-event-wins across multiple status updates on one partition A mock client tracks the WM/SWM split as two in-memory sets and serves SPARQL queries from SWM, exactly matching the daemon's promote contract. Confirmed all three tests FAIL when markPartitionStatus reverts to promoting only `evIri` — that's the regression coverage Codex asked for. 4. SKILL.md updated to document the dual-promotion behaviour so downstream importers / agents implementing this contract don't re-introduce the bug. Co-authored-by: Cursor <cursoragent@cursor.com>
673dfca to
c5e6d51
Compare
| } | ||
| const cgId = client.cgId; | ||
| const assertion = assertionName ?? `import-manifest-${importId}`; | ||
| await client.request('POST', '/api/assertion/create', { |
There was a problem hiding this comment.
🔴 Bug: this helper is meant for resumable imports, but /api/assertion/create returns already exists on retries. A transient failure after the create succeeds, or a second process resuming the same import, will now fail before it can load progress. Please treat an existing manifest assertion as a successful reuse (or add an explicit ensure flow) so createImportManifest() is idempotent.
There was a problem hiding this comment.
Addressed in 3b04c99: createImportManifest now treats an existing manifest assertion as idempotent reuse and still runs the additive write/promote healing path.
| FILTER NOT EXISTS { | ||
| ?part imp:statusEvent ?ev2 . | ||
| ?ev2 imp:recordedAt ?ts2 . | ||
| FILTER (?ts2 > ?latestRecordedAt) |
There was a problem hiding this comment.
🔴 Bug: this "latest event" filter only excludes strictly later timestamps. recordedAt is only millisecond precision, so two status writes in the same ms both survive and you can return duplicate rows for one partition with nondeterministic status selection. Add a deterministic tie-breaker for equal timestamps (for example compare ?ev as a secondary key) or use a monotonic sequence instead of raw ISO timestamps alone.
There was a problem hiding this comment.
Addressed in 3b04c99: latest-event selection now uses the event IRI as a deterministic tie-breaker for equal millisecond timestamps, with a same-ms regression test.
| } | ||
| `; | ||
| const res = await client.query({ sparql, contextGraphId: cgId, subGraphName }); | ||
| const bindings = res?.result?.bindings ?? res?.bindings ?? []; |
There was a problem hiding this comment.
🔴 Bug: createImportManifest() rejects empty partition lists, so an empty binding set here means the manifest was missing or not visible in SWM, not that the import is complete. Returning { partitions: [] } will make resume callers skip all remaining work on a typo / wrong sub-graph / failed promote. Please fail loudly when no manifest rows are found, or explicitly verify that the import root exists before treating this as success.
There was a problem hiding this comment.
Addressed in 3b04c99: loadImportManifest now throws when no manifest rows are visible instead of returning an empty partition list, with a regression test.
| } | ||
| } | ||
| `; | ||
| const res = await client.query({ sparql, contextGraphId: cgId, subGraphName }); |
There was a problem hiding this comment.
🔴 Bug: client.query() defaults to working-memory when no view/graph suffix is supplied, so this read will miss the manifest/status triples you just promoted to SWM. On a restart or peer resume, loadImportManifest() will report the manifest as missing even though it exists in shared memory. Plumb view: 'shared-working-memory' (or graphSuffix: '_shared_memory') through DkgClient.query and use it here, then update the mock/test to match the daemon's real default.
| /** Stable-ish URI for one StatusEvent. Exported so callers can promote it. */ | ||
| export function statusEventUri(importId, key) { | ||
| // Random-ish suffix keeps multiple events on the same partition unique. | ||
| const suffix = `${Date.now()}-${Math.random().toString(36).slice(2, 8)}`; |
There was a problem hiding this comment.
🔴 Bug: the same-millisecond tie-break in loadImportManifest() orders by event IRI, but this suffix is random, not monotonic. If two status updates for the same partition land in one millisecond, the older one can win nondeterministically and regress the visible status. Use an order-preserving suffix (for example timestamp + per-process counter) or persist an explicit sequence field and sort by that instead of a random token.
|
|
||
| ```python | ||
| import requests | ||
| token = open(os.path.expanduser('~/.dkg/auth.token')).read().strip() |
There was a problem hiding this comment.
🟡 Issue: This copy-paste Python example is not runnable as written: it uses os.path.expanduser(...) without importing os, and it relies on port below without defining it. Add the missing setup (import os, a default port = 9200 or a function parameter) so the sample works when readers try it.
…mport Adds a short "Empirical motivation" section between the Consequences and Rejected alternatives lists, with concrete data from importing this repo's Graphify output (26,960 nodes / 63,003 edges) on rc.10: - Hand-rolled importer from dkg-node SKILL.md alone: ~330 LOC, half of which is cap-discovery ceremony. - Same workload using rc.10's existing scripts/lib/dkg-daemon.mjs DkgClient: ~half the LOC, gaps remaining are the ones #642 and #643 close. - Four of 17 partitions exceeded the 10 MB gossip cap; total wall- clock in halve-and-retry: ~15 minutes on top of legitimate work. - Write latency degrades ~200x from baseline once SWM grows past ~100k triples; that cost is what motivates #643 but requires the chunking discipline this ADR formalises. Keeps the headlines so the contract's "why these numbers" question has a concrete answer for reviewers; the full per-partition data lives in the rc.10 test artifacts referenced in the PR description. Co-authored-by: Cursor <cursoragent@cursor.com>
…ration tests Addresses Codex round-2 review on PR #642 (12:15Z): 1. markPartitionStatus now promotes BOTH `partIri` and `evIri`. Codex correctly flagged that the `partIri imp:statusEvent evIri` triple has subject `partIri` — `assertion.promote({entities})` only moves quads whose subject is in `entities`, so promoting only `evIri` left the partition→event edge in WM and a peer-side `loadImportManifest` couldn't see the new status. Re-promoting `partIri` is idempotent for its prior triples and ships the new edge. 2. `unquote()` now collapses SPARQL 1.1 results-JSON cells (`{value, type, datatype?, "xml:lang"?}`) to their `.value` before the literal-unquoting pass. The daemon's `bindingValue` helper already handles both shapes; this library was assuming only the flat-string shape Oxigraph emits. Added `bareUri()` for URI bindings and switched `loadImportManifest` to use it for `?part`. 3. New integration-style tests in scripts/lib/__tests__/manifest.test.mjs cover: - full WM→SWM round-trip with flat bindings (Oxigraph shape) - same round-trip with SPARQL-JSON cell bindings (sparql-http shape) - latest-event-wins across multiple status updates on one partition A mock client tracks the WM/SWM split as two in-memory sets and serves SPARQL queries from SWM, exactly matching the daemon's promote contract. Confirmed all three tests FAIL when markPartitionStatus reverts to promoting only `evIri` — that's the regression coverage Codex asked for. 4. SKILL.md updated to document the dual-promotion behaviour so downstream importers / agents implementing this contract don't re-introduce the bug. Co-authored-by: Cursor <cursoragent@cursor.com>
3b04c99 to
bb8b16e
Compare
…ivation Brings in: - #641: round-3 codex (portable scanner path; importer skill/manifest framed as future work) + empirical motivation subsection in ADR 0002 (~330 LOC hand-rolled importer, ~15 min wall-clock in halve-and-retry). - #642: round-3 codex (createImportManifest idempotency on 'already exists'; timestamp tie-breaker on equal-ms; loud failure on empty manifest bindings) + new §1.1 in dkg-importer/SKILL.md documenting the three hard caps and their verbatim error strings (10 MB /write body, 256 KB /promote body, 10 MB gossip). Co-authored-by: Cursor <cursoragent@cursor.com> # Conflicts: # packages/cli/skills/dkg-importer/SKILL.md # scripts/lib/__tests__/manifest.test.mjs # scripts/lib/manifest.mjs
| } | ||
| } | ||
| `; | ||
| const res = await client.query({ sparql, contextGraphId: cgId, subGraphName }); |
There was a problem hiding this comment.
🔴 Bug: client.query({ sparql, contextGraphId, subGraphName }) uses the legacy data-graph route, which targets did:dkg:context-graph:<cg>/<subGraph> rather than the sub-graph's shared-memory graph. Since this manifest is only written via assertion WM→SWM promotion, a real daemon query here will come back empty even after a successful promote. Route this through the SWM graph explicitly (graphSuffix: '_shared_memory' or equivalent) and add a regression test that matches the real DkgClient.query semantics.
| /** Stable-ish URI for one StatusEvent. Exported so callers can promote it. */ | ||
| export function statusEventUri(importId, key) { | ||
| // Random-ish suffix keeps multiple events on the same partition unique. | ||
| const suffix = `${Date.now()}-${Math.random().toString(36).slice(2, 8)}`; |
There was a problem hiding this comment.
🔴 Bug: the same-millisecond tie-break in loadImportManifest() relies on later events sorting lexicographically higher by IRI, but this suffix is random. Two markPartitionStatus() calls in the same millisecond can therefore resolve to either status arbitrarily. Use a monotonic suffix/counter (or persist an explicit sequence field) so equal-timestamp events preserve write order.
| throw new Error('createImportManifest requires `subGraphName` (typically "meta").'); | ||
| } | ||
| const cgId = client.cgId; | ||
| const assertion = assertionName ?? `import-manifest-${importId}`; |
There was a problem hiding this comment.
🟡 Issue: the default assertion name embeds raw importId, but assertion names reject /, whitespace, and other unsafe characters. importUri()/partitionUri() already accept arbitrary IDs via percent-encoding, so a caller can pass an importId that produces valid manifest URIs but makes /api/assertion/create fail here. Either sanitize the default assertion name or validate importId up front and throw a clear error.
Adds a "What changed in round 3" section to INTEGRATION_NOTES_GRAPHIFY.md summarising the per-PR updates folded into integration after running the stack against a real Graphify codebase import on rc.10: - #641: empirical motivation in ADR 0002 + portable scanner ref + dead-link cleanup - #642: §1.1 caps reference with verbatim error strings + manifest hardening (idempotent create, ms tie-breaker, loud empty bindings) - #643: SMALL_BODY_BYTES correction + full assertion identity for uniqueness + lease-expiry shutdown + §10 empirical appendix Notes that rc.10 import test artifacts are kept local, not committed. Co-authored-by: Cursor <cursoragent@cursor.com>
Implements the resumable-import pattern from ADR 0002 and gives every
agent / human importer a single canonical reference for chunked bulk
writes against a DKG node.
scripts/lib/manifest.mjs:
- buildInitialManifestTriples / createImportManifest write the import
graph as an RDF assertion in the project's `meta` sub-graph.
- markPartitionStatus appends status events (append-only — no
DELETE/INSERT needed; latest event wins on read).
- loadImportManifest + pendingPartitions resolve the current per-
partition status in one SPARQL round-trip, so an interrupted import
can resume from the next pending partition without re-doing work.
- Itself respects the ADR 0002 chunking contract by reusing
DkgClient.writeAssertion.
scripts/lib/__tests__/manifest.test.mjs:
- Six unit tests covering URI encoding, ontology constants, and the
buildInitialManifestTriples / pendingPartitions pure helpers. The
daemon-roundtrip behaviour stays covered by the repro suite in
scripts/repro/.
packages/cli/skills/dkg-importer/SKILL.md (NEW):
- The agent-readable manual for bulk imports. Codifies the chunking
contract from ADR 0002 (CHUNK=5000, ROOT_CHUNK=1000), gives
worked examples in TypeScript + Python, walks through the manifest
pattern, and links to ADR 0003 for canonical URIs.
- Includes anti-patterns ("don't bump MAX_BODY_BYTES to 1GB",
"don't invent a graphify:* URI scheme"), HTTP error handling,
and a one-page cheat sheet.
packages/cli/skills/dkg-node/SKILL.md:
- One-line cross-reference under the assertion-write tool table
pointing agents to dkg-importer/SKILL.md when they're about to
push >5k quads in one go.
Stacked on docs/importer-adrs (#641). Related to #596, #636, #640.
Co-authored-by: Cursor <cursoragent@cursor.com>
Three real correctness bugs Codex flagged on the manifest library, plus
one documentation note clarifying a binding-format concern that turned
out to be a false positive (the daemon's /api/query route returns flat
JSON-encoded literal strings, not SPARQL-1.1 results-JSON cells — same
shape the existing seed/drain scripts already consume via bareIri()).
- createImportManifest now promotes the import root AND every partition
URI (chunked at ROOT_CHUNK=1000 per ADR 0002). Promoting only the root
left partition subjects in WM, so any peer reading the manifest back
from SWM could not see imp:key / imp:initialStatus, breaking resume
across nodes.
- markPartitionStatus now promotes the new StatusEvent root after
writing it. Previously the event sat in WM only, so SWM-visible reads
of the manifest still reported the prior status and a resume on
another node could restart an already-finished partition.
- loadImportManifest's "latest event per partition" sub-query replaced
SAMPLE+MAX (decorrelated — the status could come from one row and
the timestamp from another) with the standard "max row" idiom
(`FILTER NOT EXISTS { ... ?ts2 > ?latestRecordedAt }`). The status
now genuinely corresponds to the chosen timestamp.
- statusEventUri was renamed from a private eventUri to a public export
so the promote step has a stable handle to the just-written event.
Added a unit test covering its shape + uniqueness; existing 6 tests
still green.
- Added a doc note above `unquote()` explaining the daemon's flat
binding format, with cross-refs to the other scripts that depend on
it. This is to prevent a future reader (or a future Codex pass) from
trying to swap in SPARQL-1.1-results-JSON parsing and silently
breaking everything.
- Updated dkg-importer SKILL.md to match the new promote-on-status
behaviour and to call out the SAMPLE/MAX foot-gun for future
importers writing similar queries.
Co-authored-by: Cursor <cursoragent@cursor.com>
…ration tests Addresses Codex round-2 review on PR #642 (12:15Z): 1. markPartitionStatus now promotes BOTH `partIri` and `evIri`. Codex correctly flagged that the `partIri imp:statusEvent evIri` triple has subject `partIri` — `assertion.promote({entities})` only moves quads whose subject is in `entities`, so promoting only `evIri` left the partition→event edge in WM and a peer-side `loadImportManifest` couldn't see the new status. Re-promoting `partIri` is idempotent for its prior triples and ships the new edge. 2. `unquote()` now collapses SPARQL 1.1 results-JSON cells (`{value, type, datatype?, "xml:lang"?}`) to their `.value` before the literal-unquoting pass. The daemon's `bindingValue` helper already handles both shapes; this library was assuming only the flat-string shape Oxigraph emits. Added `bareUri()` for URI bindings and switched `loadImportManifest` to use it for `?part`. 3. New integration-style tests in scripts/lib/__tests__/manifest.test.mjs cover: - full WM→SWM round-trip with flat bindings (Oxigraph shape) - same round-trip with SPARQL-JSON cell bindings (sparql-http shape) - latest-event-wins across multiple status updates on one partition A mock client tracks the WM/SWM split as two in-memory sets and serves SPARQL queries from SWM, exactly matching the daemon's promote contract. Confirmed all three tests FAIL when markPartitionStatus reverts to promoting only `evIri` — that's the regression coverage Codex asked for. 4. SKILL.md updated to document the dual-promotion behaviour so downstream importers / agents implementing this contract don't re-introduce the bug. Co-authored-by: Cursor <cursoragent@cursor.com>
Adds §1.1 to dkg-importer/SKILL.md with the three hard caps an agent will hit at scale, plus the verbatim daemon error text for each one: - /write: MAX_BODY_BYTES = 10 MB body cap (HTTP 413) - /promote: SMALL_BODY_BYTES = 256 KB body cap (HTTP 413) — same status code as /write but a different limit, easy to confuse - /promote: 10 MB gossip-message cap (HTTP 500 "too large for gossip"), independent of the body cap and triggerable even with a single root URI Expands §5 Error handling so each of the three failure modes has its own recipe instead of one generic "halve and retry on 413" block. Calls out that 413-on-promote and 500-gossip are orthogonal recovery paths, both of which an rc.10 importer needs until #643's async promote queue lands. The cap numbers and error strings were validated empirically during the rc.10 Graphify import. Quoting them inline turns "discover via 413" into "look up before you start", which is the difference between a 5-minute import and a 20-minute import for non-trivial graphs. Co-authored-by: Cursor <cursoragent@cursor.com>
bb8b16e to
36e4c22
Compare
Five fixes from the round-4 review against the round-3 push: 1. SWM query routing (codex #642 L391, CRITICAL): the daemon's /api/query default routing for `subGraphName` alone reads the bare data graph, not the SWM tier. Since the manifest is created via the assertion API (create -> write -> promote), it only lives in SWM. Plumb `graphSuffix` (and `view`) through DkgClient.query, then have loadImportManifest pass `graphSuffix: '_shared_memory'`. Without this, a peer/restart resume would silently see zero manifest rows even on a healthy import. The mock client now routes by graphSuffix too, and a new regression test pins the call shape. 2. Monotonic statusEventUri suffix (codex L84, CRITICAL): `statusEventUri()` was `<ts>-<random>`. Two writes in the same millisecond therefore tied on both `recordedAt` AND a random token, so SPARQL's lexicographic tie-breaker picked a non-deterministic winner. New shape is `<ts>-<12-digit counter>-<random>` so within-process same-ms events sort by call order. Cross-process collisions still fall back to the random component, which is no worse than before. New regression test asserts strictly-increasing IRIs across 50 back-to-back calls. 3. Single-pass unquote() (CodeQL #90, L138): the previous four chained .replace() calls had a textbook double-unescape pitfall (`\\\\n` could collapse to a newline). Replaced with a single regex + replacer that matches each escape exactly once. New regression test round-trips literal `\n`, `\\`, `\"`, etc. through lit() / unquote() byte-for-byte. 4. importId sanitization for default assertion name (codex L222): `importUri()` / `partitionUri()` percent-encode any importId, but the daemon's `validateAssertionName` rejects `/`, whitespace, and other IRI-unsafe chars. The default assertion name now goes through a new exported `defaultManifestAssertionName()` that maps unsafe runs to `-`, length-clamps to the 256-char limit, and throws a descriptive error if nothing valid remains (instead of crashing /api/assertion/create with a cryptic 400). New regression tests cover sanitization, length cap, and the "all-unsafe" error case. 5. Python example fixes (codex L127): the README Python snippet was missing `import os` and referenced an undefined `port`. Added the import, defaulted port to `int(os.environ.get('DKG_PORT', '9200'))`, and used a `with` block for the token file. All 18 unit tests pass; 5 new tests cover each of the five fixes above. Co-authored-by: Cursor <cursoragent@cursor.com>
Round-4 codex review against the round-3 push surfaced five issues, all addressed inline: 1. CRITICAL: loadImportManifest queried the bare data graph instead of SWM, so resume from a peer/restart would silently see zero manifest rows on a healthy import. Plumb graphSuffix='_shared_memory' through DkgClient.query. 2. CRITICAL: same-millisecond statusEventUri suffixes used pure random; replaced with monotonic per-process counter so SPARQL tie-breaker matches call order. 3. CodeQL #90: unquote()'s chained .replace() had a textbook double- unescape pitfall; replaced with a single regex + replacer. 4. importId sanitization for the default assertion name (rejects /, whitespace, IRI-unsafe chars per validateAssertionName). 5. Python example: missing 'import os' + undefined 'port'. 5 new unit tests cover each fix; 18/18 pass. Co-authored-by: Cursor <cursoragent@cursor.com> # Conflicts: # packages/cli/skills/dkg-importer/SKILL.md # scripts/lib/__tests__/manifest.test.mjs # scripts/lib/manifest.mjs
| throw err; | ||
| } | ||
| } | ||
| const triples = buildInitialManifestTriples(importId, partitions, new Date().toISOString()); |
There was a problem hiding this comment.
🔴 Bug: retries/resumes for the same importId will append a fresh imp:startedAt every time, so one manifest can end up with multiple conflicting start timestamps. Because this helper explicitly treats already exists as success, that retry path is normal. Preserve the original start time on reuse (or make it caller-supplied/stable) instead of regenerating it here.
There was a problem hiding this comment.
Fixed in 59532a3. On assertion reuse, createImportManifest now detects the 409 and suppresses a fresh imp:startedAt while still healing the root/partition triples. Added a regression that reruns create against an existing assertion and asserts the original startedAt triple is unchanged.
| // `imp:initialStatus` and resume can't recover. Chunk by `ROOT_CHUNK` so | ||
| // very-large imports stay within the daemon's body-size budget — see | ||
| // ADR 0002. | ||
| const ROOT_CHUNK = 1000; |
There was a problem hiding this comment.
🔴 Bug: fixed ROOT_CHUNK = 1000 batches can still exceed the daemon's /promote 256 KB body cap when partition URIs are long (deep paths, percent-encoding, GitHub-style keys). That makes manifest creation fail on large imports even though the new skill documents 413-aware splitting as part of the contract. This promote loop should be size-aware or halve-and-retry on 413.
There was a problem hiding this comment.
Fixed in 59532a3. Promote now batches roots by estimated JSON body size under the daemon cap and recursively splits/retries on 413, so long percent-encoded partition URIs do not blow up a nominal 1000-root chunk. Added a mock 413 regression for long partition keys.
| partitions: ['only.ts'], | ||
| subGraphName: 'meta', | ||
| }); | ||
| // The mock records all `request()` calls; the create call should have |
There was a problem hiding this comment.
🟡 Issue: this regression test never asserts the sanitized assertion name it claims to cover. makeMockClient.request() drops the request body, so the test would still pass if createImportManifest sent the raw importId or omitted name entirely. Record the create payload in the mock and assert name === "import-manifest-corpus-2026-q1" so the regression is actually pinned.
There was a problem hiding this comment.
Fixed in 59532a3. The mock client now records request() payloads, and the sanitized-name regression asserts the create body sends import-manifest-corpus-2026-q1.
Preserve the original imp:startedAt triple when createImportManifest reuses an existing assertion, and batch manifest promote roots by estimated JSON body size with recursive 413 split/retry. Extend the manifest mock to record create request bodies and simulate promote payload limits, covering sanitized assertion names and long partition URI promote batches. Co-authored-by: Codex <codex@openai.com>
| const prefix = 'import-manifest-'; | ||
| const maxSlugLen = 256 - prefix.length; | ||
| const slug = importId | ||
| .replace(/[^A-Za-z0-9._-]+/g, '-') |
There was a problem hiding this comment.
🔴 Bug: This sanitization is no longer one-to-one. Different importIds such as a/b vs a b (and long IDs that only differ after the 240-char cutoff) both collapse to the same assertion name, so the later import will silently reuse the earlier manifest assertion and mix their progress/lifecycle state. Preserve uniqueness here, for example by appending a stable hash/encoded suffix whenever sanitization or truncation changes the original id.
| return `${partitionUri(importId, key)}/event/${ts}-${seq}-${rand}`; | ||
| } | ||
|
|
||
| function lit(value) { |
There was a problem hiding this comment.
🔴 Bug: The new literal codec only escapes \\, ", \n, and \r. Partition keys/status values containing other valid RDF control escapes such as tabs are emitted raw here, which can make the manifest write invalid or read back corrupted. Please reuse the existing shared literal helper or extend both the encode and decode paths to handle the full N-Triples escape set consistently.
| const evIri = statusEventUri(importId, partitionKey); | ||
| const nowIso = new Date().toISOString(); | ||
| const triples = [ | ||
| { subject: partIri, predicate: IMPORT_P.statusEvent, object: evIri }, |
There was a problem hiding this comment.
🔴 Bug: markPartitionStatus() never checks that partitionKey was declared in the manifest. If a caller passes a typo or a new key, this write/promote succeeds, but loadImportManifest() will never surface that status because it only walks <import> imp:partition ?part edges created by createImportManifest(). That leaves the real partition looking permanently pending with no obvious error. Please validate membership before writing, or backfill/promote the partition declaration when it is missing.
| createdFresh = false; | ||
| } | ||
| const importIri = importUri(importId); | ||
| const triples = buildInitialManifestTriples(importId, partitions, new Date().toISOString()) |
There was a problem hiding this comment.
🔴 Bug: on the already exists path we blindly append whatever partitions array the caller supplies into the existing assertion. Reusing an importId with a different partition set will silently produce the union of old and new partitions, so resume logic can keep retrying stale entries from a previous run. Consider loading the existing manifest on reuse and failing fast when the partition set differs, or make additive extension an explicit opt-in.
|
|
||
| P2P tools fail gracefully when the peer is offline. `dkg_publish` (fresh quads + write + publish, two HTTP calls) and `dkg_shared_memory_publish` (publish existing SWM, one HTTP call) differ in intent: use the two-call helper for "I have quads, publish now"; use the canonical finalizer as step 4 of the stepwise write → promote → publish flow. `dkg_share` is a direct SWM convenience helper for quick team-visible notes, not a replacement for assertion lifecycle tracking. | ||
|
|
||
| **Bulk imports (>5,000 quads in one logical operation):** the per-call `dkg_assertion_*` loop IS the chunked-write API; there is no `/api/import/bulk`. If you're pushing a code graph, a corpus, or any multi-thousand-quad import, read [`dkg-importer/SKILL.md`](../dkg-importer/SKILL.md) first — it codifies the chunking budgets, HTTP 413 recovery, resumability manifest, and canonical-URI rules so your importer converges with the rest of the workspace. |
There was a problem hiding this comment.
🟡 Issue: this cross-link is not reachable for installed users right now. The setup/install flows and the public skill endpoint still hardcode only skills/dkg-node/SKILL.md, so agents running via dkg mcp/openclaw/hermes setup will not have ../dkg-importer/SKILL.md available. Either ship the importer skill through those paths too, or keep the bulk-import guidance inline here until the extra skill is actually distributed.
| try { | ||
| await client.writeOne(slice); | ||
| } catch (err) { | ||
| if (err.statusCode !== 413) throw err; |
There was a problem hiding this comment.
🔴 Bug: These retry examples check err.statusCode, but scripts/lib/dkg-daemon.mjs sets err.status on failed requests. Copied verbatim, the 413 branch never runs and the sample rethrows instead of halving. Please switch this snippet (and the similar one below) to the actual error shape or document a wrapper that normalizes it.
| ### Connection errors / 5xx | ||
|
|
||
| Standard retry with exponential backoff. The daemon does not implement | ||
| idempotency tokens, but `assertion/create` returns 200 for both first and |
There was a problem hiding this comment.
🟡 Issue: The raw daemon route does not return 200 on duplicate POST /api/assertion/create; packages/cli/src/daemon/routes/assertion.ts maps already exists to HTTP 400. Only higher-level clients normalize that into an idempotent success. As written, this tells importer authors the wrong wire contract and can lead to broken retry handling.
| if (!subGraphName) throw new Error('markPartitionStatus requires `subGraphName`.'); | ||
|
|
||
| const cgId = client.cgId; | ||
| const assertion = assertionName ?? defaultManifestAssertionName(importId); |
There was a problem hiding this comment.
🔴 Bug: markPartitionStatus() recomputes the assertion name from importId on every write. If the manifest was originally created with a custom assertionName, a resumed importer that only knows importId will target a different assertion here and /api/assertion/:name/write will fail with not found. Because loadImportManifest() doesn't persist or return the actual assertion name, callers have no way to recover it later. Please either persist the chosen assertion name in the manifest and resolve it here, or remove the public assertionName override so the name is always derivable.
|
|
||
| const cgId = client.cgId; | ||
| const assertion = assertionName ?? defaultManifestAssertionName(importId); | ||
| const manifest = await loadImportManifest({ client, importId, subGraphName }); |
There was a problem hiding this comment.
🟡 Issue: this validates the partition key by loading and sorting the entire manifest on every status update. For the bulk imports this helper targets, that makes status tracking quadratic (in_progress + done across 10k partitions means 20k full SWM queries). Consider caching the declared partition set from createImportManifest()/loadImportManifest(), or replacing this with a narrower existence check / optional validation flag.
… via HTTP Two PR #642 follow-ups + the originally-scheduled SKILL.md update for the async promote queue (the bulk of PR #4's documentation work, deferred until #642 merged because that's where dkg-importer/SKILL.md first landed). 1. packages/cli/skills/dkg-importer/SKILL.md — new §6 "Async promote queue" - Route inventory (POST/GET/DELETE /api/assertion/.../promote-async + recover) - Async write loop (drop-in for §2's synchronous promote step) - Failure-classification table mapping attempt.lastError.classification (transient | cap_exceeded | fatal) to importer actions - Migration guidance vs. the sync /promote route (still supported) - Inspecting in-flight jobs alongside the manifest in §3 New cheat-sheet variant under §8 mirroring the async loop. §5's forward reference to "PR #643's async promote queue ... until that lands" rewritten to point at the now-shipping §6. 2. New endpoint GET /.well-known/skill-importer.md (Codex PR #642 follow-up) The cross-link from dkg-node/SKILL.md to dkg-importer/SKILL.md was unreachable for agents installed via the setup flow because only the former was served from packages/cli/src/daemon/manifest.ts's loadSkillTemplate. Adds a sibling loadImporterSkillTemplate + /.well-known/skill-importer.md route in status.ts (same auth-public, ETag-cacheable shape as /.well-known/skill.md), plus PUBLIC_GET_PATHS / PUBLIC_HEAD_PATHS / isLoopbackRateLimitExemptPath allowlist entries. Cross-link in dkg-node/SKILL.md now points at the runtime URL. 4 new tests (auth public path + 4 dkg-importer skill content pins). 3. scripts/lib/manifest.mjs — partitionDeclared helper (Codex PR #642 quadratic-validation follow-up). markPartitionStatus used to call loadImportManifest on every status write, materialising the full manifest from SWM. For a 10k-partition import marking each twice (in_progress + done), that's 20k full SWM round-trips. Replaced with a single-row SPARQL ASK: one query, one Boolean, same error semantics when a partition isn't declared. 4 new tests including a regression pin asserting markPartitionStatus issues exactly one query. Tests: 27/27 (scripts/lib manifest), 32/32 (publisher async-promote-queue), 190/190 (CLI unit suites incl. new skill-endpoint cases, async-promote routes/worker/E2E, auth allowlist). tsc green. Zero lints. Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Implements the resumable-import pattern from ADR 0002 (#641) and gives every agent / human importer a single canonical reference for chunked bulk writes against a DKG node.
`scripts/lib/manifest.mjs` — Resumable-import manifest helpers
`scripts/lib/tests/manifest.test.mjs` — Six unit tests for pure helpers (URI encoding, ontology constants, triple building, status filtering). Daemon-roundtrip coverage stays in the repro suite.
`packages/cli/skills/dkg-importer/SKILL.md` (NEW) — Agent-readable manual codifying:
`packages/cli/skills/dkg-node/SKILL.md` — One-line cross-reference under the assertion-write tool table, pointing agents to `dkg-importer/SKILL.md` when pushing >5k quads in one go.
Test plan
Stack
This PR is stacked on #641 (ADRs). Once #641 lands, this rebases against main.
Related: #596, #636, #640.
Made with Cursor
Update — round 3 (2026-05-25): known caps + manifest hardening
dkg-importer/SKILL.md— Known caps with exact error strings (§1.1)Adds a new §1.1 reference table calling out the three independent daemon caps an importer must handle, with the verbatim daemon error text for each:
POST /api/assertion/<name>/writeMAX_BODY_BYTESHTTP 413"Request body too large (>10485760 bytes)"POST /api/assertion/<name>/promoteSMALL_BODY_BYTESHTTP 413"Request body too large (>262144 bytes)"POST /api/assertion/<name>/promoteHTTP 500"Promoted assertion too large for gossip (XXXX KB, limit 10 MB)"The two
/promotecaps are independent — same status code (413) for the body cap, but a different limit from/write's — and the 10 MB gossip cap can trip even with a single root URI if that root's transitive triple set is large enough. §5 (Error handling) is rewritten so each failure mode has its own recipe instead of one generic "halve on 413" block.Numbers and error strings validated empirically during the rc.10 Graphify import. Quoting them inline turns "discover via 413" into "look up before you start", which is the difference between a 5-minute import and a 20-minute import for non-trivial graphs.
scripts/lib/manifest.mjs— round-3 hardeningThree round-3 codex line comments fixed (each with a new unit test, 13/13 pass):
createImportManifestnot idempotent onalready exists: now catches the daemon's "already exists" error from/api/assertion/createand treats it as successful reuse, so a retry / second-process resume of the same import doesn't crash before reading progress.?ts2 = ?recordedAt && STR(?ev2) > STR(?ev)) so two writes within one ms produce a deterministic ordering instead of nondeterministic status selection.loadImportManifestnow throws loudly when the SPARQL returns zero bindings instead of silently returning{partitions: []}(which previously made resume callers think the import was complete when the manifest was actually missing / not yet visible in SWM).Update — round 4 (2026-05-25 22:50): post-empirical-motivation Codex review
Round-4 review against the round-3 push surfaced five issues — three real bugs that would have shipped, one CodeQL finding, and one runnable-example fix. All addressed inline with new unit-test coverage (18/18 pass).
CRITICAL —
loadImportManifestwas reading the bare data graph, not SWM (L391)/api/querywithsubGraphNamealone routes to the bare data graphdid:dkg:context-graph:<cg>/<sub>, NOT the SWM tier…/_shared_memory. Since the manifest is created via the assertion API (create → write → promote), it ONLY exists in SWM. So a peer/restart resume callingloadImportManifestwould silently see zero bindings even on a healthy import — and the §L375 "loud failure" fix would then throw "manifest is missing" for a manifest that actually exists.Fix: plumb
graphSuffix(andviewfor completeness) throughDkgClient.query.loadImportManifestnow passesgraphSuffix: '_shared_memory'so the query hits the right graph. The mock client in tests now also routes bygraphSuffix, so a regression where the SWM hint is dropped fails the test suite immediately. A new dedicated test pins the call shape.CRITICAL —
statusEventUrisame-millisecond ordering was non-deterministic (L84)The previous shape
<ts>-<6-char random>meant two events in the same millisecond would tie onrecordedAtAND tie on a random token, leaving SPARQL's lexicographic tie-breaker non-deterministic.Fix: new shape
<ts>-<12-digit monotonic counter>-<6-char random>. The counter is process-local and zero-padded so lexicographic ordering matches call order for same-process same-ms events. Cross-process same-ms collisions fall back to the random component — no worse than before. New regression test asserts strictly-increasing IRIs across 50 back-to-back calls.CodeQL #90 — double-unescape risk in
unquote(L138)The previous chain of four
.replace()passes had a textbook double-unescape pitfall:\\\\n(escaped backslash + literaln) could collapse to a newline because the second pass would re-interpret the\left behind by the first pass.Fix: single-pass regex
.replace(/\\(["\\nr])/g, replacer)so each escape sequence matches exactly once. New regression test round-trips literal\n,\\,\", mixed strings throughlit() / unquote()byte-for-byte.importIdsanitization for default assertion name (L222)importUri()/partitionUri()percent-encode anyimportId, but the daemon'svalidateAssertionNamerejects/, whitespace, and other IRI-unsafe characters. A caller passingimportId="corpus/2026-q1"would get valid URIs back but/api/assertion/createwould 400.Fix: new exported
defaultManifestAssertionName()that maps unsafe character runs to-, length-clamps to the 256-char daemon limit, and throws a descriptive error if nothing valid remains (instead of crashing later with a cryptic 400). Two regression tests: parameterised sanitization assertions + an end-to-end "create/load with an unsafe importId works" check.Python sample fix (L127)
The Python snippet in
dkg-importer/SKILL.mdwas missingimport osand used an undefinedport. Fixed: added the import, defaulted port toint(os.environ.get('DKG_PORT', '9200')), switched to awithblock for the token file.Test coverage
5 new tests added (18/18 total pass) — one per fix, all targeted at the exact failure mode codex flagged, so a future regression on any of these would fail CI immediately.