
OT-RFC-38 LU-6 C4 — two-laptop testnet validation runbook #625

Closed
branarakic wants to merge 1 commit into feat/ot-rfc-38-lu6-host-mode from feat/lu6-followup-c4-testnet-deployment-runbook

@branarakic
Contributor

Summary

The local devnet harnesses (`scripts/devnet-test-rfc38-*.sh`) all use a libp2p-private mesh: loopback dialing, no DHT, no NAT traversal. They do not cover the failure modes that only surface when peers traverse real internet hops, and covering those is the last remaining gate before declaring LU-6 mainnet-ready.

This runbook is the C4 companion: an end-to-end checklist for operating two laptops behind different NATs, plus a core operator's existing testnet node, to exercise the full LU-6 lifecycle under real network conditions.

Stacked on #610. Pure documentation, no production-code changes.

What the runbook covers

  • Curated CG creation + on-chain registration on Base Sepolia
  • Discovery-beacon propagation across the public mesh
  • Cross-NAT/DHT gossip delivery of opaque SWM ciphertext to a core
  • Host-catchup wire protocol over real RTT
  • Signature-based host-catchup authorization (B1, OT-RFC-38 LU-6 B1 — signed swm-host-catchup requests #618)
  • Member catchup resume across NAT/connection churn (B-series)
  • VM publish from edge nodes with deferred on-chain registration
  • Cross-laptop attestation cross-verification
  • Stress + unclean-restart scenarios over the public mesh

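Explicitly out of scope

  • LU-11 chunked ciphertext commitment (still in design, #617)
  • RFC-39 random sampling (depends on LU-11)
  • Multi-million-message scale (use the synthetic harnesses)
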
Test plan

  • User-led; run after the LU-6 stack lands on testnet
  • Report back results inline on this PR or in a follow-up

Made with Cursor

The local devnet harnesses (`scripts/devnet-test-rfc38-*.sh`) all
use a libp2p-private mesh — loopback dialing, no DHT, no NAT
traversal. They do not cover the failure modes that only surface
when peers traverse real internet hops, which is the LAST remaining
gate before declaring LU-6 mainnet-ready.

This runbook is the C4 companion: an end-to-end checklist for
operating two laptops on different NATs + a core operator's existing
testnet node, exercising the full LU-6 lifecycle on real network
conditions:

- Curated CG creation + on-chain registration on Base Sepolia
- Discovery-beacon propagation across the public mesh
- Cross-NAT/DHT gossip delivery of opaque SWM ciphertext to a core
- Host-catchup wire protocol over real RTT
- Signature-based host-catchup authorization (B1, #618)
- Member catchup resume across NAT/connection churn (B-series)
- VM publish from edge nodes with deferred on-chain registration
- Cross-laptop attestation cross-verification
- Stress + unclean-restart scenarios over the public mesh

Explicitly out of scope:
- LU-11 chunked ciphertext commitment (still in design, #617)
- RFC-39 random sampling (depends on LU-11)
- Multi-million-message scale (use the synthetic harnesses)

User-led (requires two real laptops on separate networks).

Co-authored-by: Cursor <cursoragent@cursor.com>

Capture the three agent addresses + libp2p peer IDs:
```bash
dkg show
```

🔴 Bug: `dkg show` is not a top-level CLI command in this repo, and later references to `dkg show-cg` / `dkg shared-memory host-mode stats` also don't exist. An operator following the runbook will fail before step 1. Replace these with existing surfaces such as `dkg status`, `dkg wallet`, `GET /api/agent/identity`, `dkg context-graph info`, and `GET /api/shared-memory/host-mode/stats`.
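
A minimal sketch of the identity-capture step rewritten against the surfaces listed above: `dkg status` for peerId/multiaddrs/role, `dkg auth show` for the token, and `GET /api/agent/identity` for the agent address. Assumptions: the daemon API listens on `localhost:9200` as in the runbook's other examples, `dkg auth show` prints only the token, and the identity route returns JSON. The `capture_identity` helper and its output layout are hypothetical.

```bash
# Sketch only: replaces the nonexistent `dkg show` with surfaces the review
# says exist. ASSUMPTIONS: API on localhost:9200 (override via DKG_API),
# `dkg auth show` prints just the token, /api/agent/identity returns JSON.
DKG_API="${DKG_API:-http://localhost:9200}"

# Hypothetical helper: write this node's status and agent identity to one file.
#   $1 = label for the node (e.g. laptop-a), $2 = output file path
capture_identity() {
  local label="$1" out="$2" token
  token="$(dkg auth show)"
  {
    echo "## ${label}"
    dkg status    # peerId / multiaddrs / role
    curl -s -H "Authorization: Bearer ${token}" "${DKG_API}/api/agent/identity"
    echo          # identity JSON may lack a trailing newline
  } > "${out}"
}
```

Run it on each of the three nodes, e.g. `capture_identity laptop-a identity-laptop-a.txt`, and share the files so every operator has all three agent addresses and libp2p peer IDs.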


Verify on laptop A:
```bash
curl -sH "Authorization: Bearer $(cat ~/.dkg/auth.token)" \
```

🔴 Bug: `~/.dkg/auth.token` is a commented file by default, so `$(cat ~/.dkg/auth.token)` injects both the comment and the token into the `Authorization` header. These curl examples can return 401 even on a healthy node. Use `dkg auth show`, or strip comments and blank lines before interpolating the token.
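
Where `dkg auth show` is not available, the review's other option is to strip comments and blank lines from the token file before interpolating it. A minimal sketch, assuming comment lines start with `#` (optionally after whitespace) and the token is the first remaining line; `read_dkg_token` is a hypothetical helper name.

```bash
# Hypothetical fallback for `dkg auth show`: print the bearer token from the
# token file with comment lines and blank lines removed.
# ASSUMPTION: comments start with '#'; the token is the first remaining line.
read_dkg_token() {
  local file="${1:-$HOME/.dkg/auth.token}"
  grep -v -E '^[[:space:]]*(#|$)' "$file" | head -n 1 | tr -d '[:space:]'
}
```

With it, `curl -sH "Authorization: Bearer $(read_dkg_token)" http://localhost:9200/api/context-graph/list` sends only the token, not the file's comment header. `dkg auth show` is still the simpler choice where it exists.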

Verify on laptop A:
```bash
curl -sH "Authorization: Bearer $(cat ~/.dkg/auth.token)" \
http://localhost:9200/api/context-graph/list | jq '.[] | select(.access=="curated")'
```

🔴 Bug: `/api/context-graph/list` returns an object with a `contextGraphs` array, not a bare array, and the items expose `accessPolicy`, not `access`. This jq filter will never match, so the verification step gives a false negative. Query `.contextGraphs[]` and filter on `id`/`accessPolicy` instead.
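
A corrected version of the curated-CG check, following the review and the follow-up commit on this PR. It unwraps the `contextGraphs` envelope and compares the numeric `accessPolicy` (0 = public, 1 = curated) instead of a string `access` field. The `curated_cg_ids` helper name is hypothetical. It reads the list response on stdin and prints the ids of curated CGs.

```bash
# Prints the id of every curated context graph in a /api/context-graph/list
# response read from stdin. Field names per the review: contextGraphs[],
# id, accessPolicy (numeric; 1 = curated).
curated_cg_ids() {
  jq -r '.contextGraphs[] | select(.accessPolicy == 1) | .id'
}
```

On laptop A, `curl -sH "Authorization: Bearer $(dkg auth show)" http://localhost:9200/api/context-graph/list | curated_cg_ids` should print the new CG's id. Empty output means the CG is missing or not curated.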

```bash
curl -sH "Authorization: Bearer $(cat ~/.dkg/auth.token)" \
-H 'Content-Type: application/json' \
-d '{ "contextGraphId": "<cg-id>" }' \
```

🔴 Bug: omitting `peerId` means `/api/shared-memory/catchup` fans out to whatever peers happen to be connected. In this topology that may hit laptop A directly or no peer at all, so it does not reliably validate the intended "via the core" host-catchup path. Pass the core's `peerId`, or use `/api/shared-memory/host-catchup` if the goal is to exercise hosted ciphertext replay specifically.
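
A minimal sketch of the pinned catchup request. It assumes the member laptop's daemon is on `localhost:9200` and that the core's libp2p peer id has already been read on the core, for example from `dkg status` there. Only `contextGraphId` and `peerId` come from the runbook and the review. `catchup_body` and `catchup_via_core` are hypothetical helper names.

```bash
# Build the catchup request body with the core's libp2p peer id pinned, so the
# request goes to the core instead of fanning out to every connected peer.
#   $1 = context graph id, $2 = core peer id
catchup_body() {
  local cg_id="$1" core_peer_id="$2"
  jq -n --arg cg "$cg_id" --arg peer "$core_peer_id" \
    '{contextGraphId: $cg, peerId: $peer}'
}

# Hypothetical wrapper: POST the pinned request from the member laptop.
# ASSUMPTION: the daemon listens on localhost:9200 as elsewhere in the runbook.
catchup_via_core() {
  curl -s -H "Authorization: Bearer $(dkg auth show)" \
    -H 'Content-Type: application/json' \
    -d "$(catchup_body "$1" "$2")" \
    http://localhost:9200/api/shared-memory/catchup
}
```

Usage: `catchup_via_core '<cg-id>' '<core-peer-id>'`. Building the body with `jq -n --arg` keeps the JSON quoting correct even if an id contains characters that would break a hand-written string.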

List the local triples:
```bash
curl -sH "Authorization: Bearer $(cat ~/.dkg/auth.token)" \
"http://localhost:9200/api/shared-memory/list?contextGraphId=$(printf %s '<cg-id>' | jq -sRr @uri)" | jq '.triples | length'
```

🔴 Bug: there is no `GET /api/shared-memory/list` route in the daemon, so this verification command will 404. To prove the member received data, either inspect `totalInsertedTriples` from the catchup response or issue `/api/query` against the CG's `_shared_memory` graph.
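
Two hedged replacements for the missing `/list` check, both taken from the review. The first reads `totalInsertedTriples` from the catchup response and assumes it is a top-level field. The second builds the `SELECT (COUNT(*) AS ?n)` query that the follow-up commit sends to `/api/query`. The graph IRI layout (`<cg-graph-iri>/_shared_memory`) and the helper names are assumptions. Check both against the devnet scripts before relying on them.

```bash
# Option 1 (from the review): check totalInsertedTriples in the catchup
# response JSON read from stdin. ASSUMPTION: the field is top-level.
assert_catchup_inserted() {
  local n
  n="$(jq -r '.totalInsertedTriples // 0')"
  if [ "$n" -gt 0 ]; then
    echo "catchup inserted ${n} triples"
  else
    echo "catchup inserted no triples" >&2
    return 1
  fi
}

# Option 2: build the SPARQL count for /api/query.
# ASSUMPTION: the shared-memory graph IRI is the CG graph IRI + "/_shared_memory".
shared_memory_count_query() {
  local cg_graph_iri="$1"
  printf 'SELECT (COUNT(*) AS ?n) WHERE { GRAPH <%s/_shared_memory> { ?s ?p ?o } }\n' \
    "$cg_graph_iri"
}
```

Pipe the catchup response into `assert_catchup_inserted`. A non-zero exit means nothing was replayed to the member. This page does not specify the `/api/query` request body shape, so the second helper only builds the query string.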

@branarakic
Contributor Author

Superseded by PR #649 (release: rc.10 testnet-ready cut). All commits from this PR are now on main via #649. Unaddressed Codex review feedback (C4 runbook: command names, auth.token handling, jq paths) is being tracked and fixed in a dedicated post-rc.10 follow-up PR.

branarakic closed this May 25, 2026
branarakic pushed a commit that referenced this pull request May 25, 2026
…ANGELOG fix

PR #625's runbook documented HTTP and CLI surfaces that don't exist on
the rc.10 daemon — operators following it from step 1 would 404 / 401
their way through every section. PR #638's LU-8 CHANGELOG entry promised
a server-side reconstruction path for `verify-batch` that the actual
route refuses. Both are doc bugs only, no code change.

docs/RFC38_LU6_TWO_LAPTOP_TESTNET_RUNBOOK.md:
 - Replaces `dkg show` (never a real top-level command) with the
   actual surface: `dkg status` (peerId/multiaddrs/role), `dkg auth
   show` (token, stripped of comments), and `GET /api/agent/identity`
   for the agent EOA. Same for `dkg show-cg` (compute the wire id as
   `keccak256(<cgId>)` since there's no CLI shortcut) and `dkg
   shared-memory host-mode stats` (use `GET /api/shared-memory/
   host-mode/stats` directly).
 - Every `$(cat ~/.dkg/auth.token)` interpolation now uses `dkg auth
   show` instead. The token file has a commented-header preamble by
   default, so the literal `cat` would inject `#\n<token>` into the
   `Authorization` header and 401 even on a healthy node. `dkg auth
   show` strips comments + blank lines, matching what
   `packages/node-ui/vite.config.ts` does for the same reason.
 - `/api/context-graph/list` filter corrected — the response is
   `{ contextGraphs: [...] }` (envelope, not bare array) and items
   expose `accessPolicy` (numeric: 0=public, 1=curated), not `access`.
   The old `jq '.[] | select(.access=="curated")'` filter would have
   silently matched nothing.
 - Member catchup now pins `peerId` to the core's libp2p identity.
   Without `peerId`, `/api/shared-memory/catchup` fans out to every
   connected peer — which in a two-laptop+core topology can hit
   laptop A directly or no peer at all, and won't reliably validate
   the "via the core" host-catchup path the runbook claims to verify.
   Showed how to grab the peer id from `/api/status` first.
 - Replaced the `/api/shared-memory/list?contextGraphId=...` curl
   (404, no such route) with a SPARQL `SELECT (COUNT(*) AS ?n)` via
   `/api/query` against the `_shared_memory` graph suffix — the same
   shape every devnet script uses for the equivalent assertion.

CHANGELOG.md:
 - LU-8 entry no longer claims "when `quads` is omitted the route
   reconstructs from the local SWM or post-publish CG data graph".
   That hasn't been true since the safety hardening in the published
   route (`packages/cli/src/daemon/routes/memory.ts:1042-1049`):
   `quads` is **required**, HTTP 400 on omission, because the daemon
   can't safely identify a single published batch's leaves inside a
   CG-wide store. Devnet scenario `scripts/devnet-test-rfc38-lu8.sh`
   SCENARIO 1 pins this contract with an HTTP-400 assertion. The
   CHANGELOG now matches the route's actual behaviour + rationale.

Co-authored-by: Cursor <cursoragent@cursor.com>