docs: state cluster apply is storage-direct, not server-routed #306

Merged
ragnorc merged 4 commits into main from docs/cluster-apply-storage-direct on Jun 28, 2026
@aaltshuler aaltshuler commented Jun 25, 2026

Collaborator

What

Makes explicit — in both the operator guide and the agent skill — that omnigraph cluster apply is a storage-direct control-plane command: it reaches the object store directly (the __cluster/ ledger and each graph's Lance datasets, to create/migrate/delete them), never through a running omnigraph-server. The host that runs it (operator or CI) therefore needs storage access — the AWS_* credential contract for an s3:// cluster.
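The credential implication can be checked before running `cluster apply`. The following is a minimal preflight sketch, not part of omnigraph. The function name `missing_storage_env` is hypothetical, and the variable list is an assumption based on the standard AWS environment variable names; the guide's actual `AWS_*` contract may include others, and credentials supplied by other means (for example an instance role) would not show up in this check.

```python
import os
from typing import List, Mapping

# Assumption: standard AWS variable names. The project's documented AWS_*
# contract may require more (or different) variables than these two.
ASSUMED_S3_ENV = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_storage_env(cluster_uri: str,
                        env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the assumed credential variables that are unset or empty.

    Only s3:// cluster URIs are checked; any other URI returns an empty list
    because this sketch makes no claim about other storage backends.
    """
    if not cluster_uri.startswith("s3://"):
        return []
    return [name for name in ASSUMED_S3_ENV if not env.get(name)]


if __name__ == "__main__":
    # Hypothetical URI, used only to exercise the check.
    missing = missing_storage_env("s3://example-bucket/cluster")
    if missing:
        print("host running `cluster apply` lacks:", ", ".join(missing))
    else:
        print("assumed storage credentials are present")
```

Running this on the operator or CI host, rather than on the server host, matches the point of the PR: the host that runs `cluster apply` is the one that needs storage access.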

Why

This is a real doc gap, not a behavior change. The reasoning was already documented in docs/dev/cluster-axioms.md §3 (no runtime mutation API on the running system) and §4 (config lives outside the running system). The out-of-band / direct-storage fact was stated explicitly for the maintenance verbs (optimize/repair/cleanup, §7) and for init/load — but never spelled out for apply itself. A reader could reasonably assume apply routes through the server. It does not, and the credential implication matters operationally.

Changes

  • docs/user/clusters/index.md (§2, the day-2 loop) — adds a note that apply runs out-of-band with direct storage access, never via the server, and the host needs storage credentials; links the why to cluster-axioms §3/§4, and notes the server only ever reads the converged ledger (so a held apply lock never blocks serving).
  • skills/omnigraph/SKILL.md (Storage & Credentials) — extends the existing "init and load write storage directly (bypassing the server)" line to include cluster apply, with the declarative-not-runtime-API rationale.

Docs-only; no code or behavior change.

🤖 Generated with Claude Code

Greptile Summary

This PR clarifies how omnigraph cluster apply reaches storage. The main changes are:

  • Adds operator-guide wording for storage-direct cluster apply.
  • Explains the S3 credential requirement for the host running apply.
  • Updates the omnigraph skill's storage and credential guidance.

Confidence Score: 4/5

This is close, but the skill wording should be fixed before merging.

  • The updated skill still says load bypasses the server, while server-targeted loads use the HTTP write path.
  • That can lead an agent to validate or provision the wrong host for write storage access.
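
The routing distinction in the two points above can be written down as a small lookup. This is an illustrative model of the rules as stated in this thread, not omnigraph code. The names `Writer`, `storage_writer`, and `server_targeted` are hypothetical, and only `load` is described here as having both a direct and a server-routed form.

```python
from enum import Enum


class Writer(Enum):
    """Which process performs the storage write and so needs write access."""
    CLI_HOST = "cli-host"
    SERVER = "server"


# Described in this thread as always storage-direct.
ALWAYS_DIRECT = {"init", "cluster apply"}
# Described in this thread as data-plane HTTP writes served by the server.
HTTP_WRITES = {"mutate", "load", "branch"}


def storage_writer(command: str, server_targeted: bool = False) -> Writer:
    """Return the process that writes to storage for a given command.

    `server_targeted` only changes the answer for `load`, the one command the
    thread describes as having both routes.
    """
    if command in ALWAYS_DIRECT:
        return Writer.CLI_HOST
    if command == "load" and not server_targeted:
        return Writer.CLI_HOST
    if command in HTTP_WRITES:
        return Writer.SERVER
    raise ValueError(f"command not covered by this sketch: {command!r}")


if __name__ == "__main__":
    for cmd, targeted in [("cluster apply", False), ("load", False), ("load", True)]:
        print(cmd, "server_targeted=" + str(targeted), "->", storage_writer(cmd, targeted).value)
```

Under this reading, a server-targeted `load` means the server, not the CLI host, needs read-write storage access, which is the provisioning mistake the review warns about.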

skills/omnigraph/SKILL.md

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/user/clusters/index.md | Adds operator-facing guidance that cluster apply runs out of band with direct object-store access. |
| skills/omnigraph/SKILL.md | Updates the storage credential guidance, but the load routing sentence still needs to distinguish direct storage from server-routed loads. |

Reviews (4): Last reviewed commit: "Merge branch 'main' into docs/cluster-ap..."

Greptile also left 1 inline comment on this PR.

Context used:

  • AGENTS.md (source)
  • CLAUDE.md (source)

`cluster apply` reaches the object store directly — the `__cluster/` ledger
and each graph's Lance datasets — never through a running omnigraph-server,
so the host that runs it needs storage credentials. The rationale (declarative
control plane, not a runtime mutation API) was documented in cluster-axioms.md
§3/§4, and the out-of-band/direct-storage fact was stated for the maintenance
verbs and init/load, but never spelled out for apply itself.

- docs/user/clusters/index.md: add a day-2 note making apply's storage-direct
  execution and credential requirement explicit, linking the why to axioms 3/4.
- skills/omnigraph/SKILL.md: extend the "init/load write storage directly
  (bypassing the server)" line to include cluster apply, with the same reasoning.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread docs/user/clusters/index.md Outdated
The trailing (§5) sat right after the cluster-axioms.md §3/§4 citation, so a
reader could read §5 as referring to cluster-axioms.md (whose §5 covers locked
state) rather than this guide's §5. Make it an explicit same-page forward
reference. Addresses Greptile P2 on #306.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread skills/omnigraph/SKILL.md Outdated
aaltshuler and others added 2 commits June 25, 2026 17:13
The "server only reads from it" wording was wrong: the data plane serves HTTP
writes (mutate/load/branch) that go through the server to the graph datasets,
so omnigraph-server is not read-only against object storage. The hazard is an
operator granting the server read-only S3 creds and breaking runtime writes.
Scope the read-only claim to cluster (control-plane) state at boot, and state
that data-plane writes still need read-write storage access. Addresses Greptile
P-level finding on #306.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ragnorc ragnorc merged commit 20e5fad into main Jun 28, 2026
7 checks passed
Comment thread skills/omnigraph/SKILL.md
```diff
- `init` and `load` write storage directly (bypassing the server); the server reads from it. Validate with `curl http://127.0.0.1:8080/healthz`, then `omnigraph snapshot <graph-uri> --json`.
+ `init`, `load`, and **`cluster apply`** write storage directly (bypassing the server). `cluster apply` is a storage-direct control-plane command — it reaches the object store directly (the `__cluster/` ledger *and* each graph's Lance datasets, to create/migrate/delete them), never through a running server, so the host that runs it needs storage access (the `AWS_*` contract for an `s3://` cluster). That is by design: the control plane is declarative (config → cluster), not a runtime mutation API on the serving process. The server reads **cluster** state read-only at boot, but it is not read-only against storage overall — data-plane HTTP writes (`mutate`/`load`/`branch`) still go through the server to the graph datasets, so it needs read-write storage access. Validate with `curl http://127.0.0.1:8080/healthz`, then `omnigraph snapshot <graph-uri> --json`.
```

P1 Clarify load routing

This sentence still says load writes storage directly and bypasses the server, but server-targeted loads use the server's HTTP write path. When an agent follows this skill for omnigraph load --server ..., it can provision or validate only the CLI host for storage writes even though the server performs the Lance write and needs read-write storage access. The later sentence about HTTP load going through the server helps, but the opening sentence still gives the opposite rule for the same command.

Suggested change
`init`, `load`, and **`cluster apply`** write storage directly (bypassing the server). `cluster apply` is a storage-direct control-plane command — it reaches the object store directly (the `__cluster/` ledger *and* each graph's Lance datasets, to create/migrate/delete them), never through a running server, so the host that runs it needs storage access (the `AWS_*` contract for an `s3://` cluster). That is by design: the control plane is declarative (config → cluster), not a runtime mutation API on the serving process. The server reads **cluster** state read-only at boot, but it is not read-only against storage overall — data-plane HTTP writes (`mutate`/`load`/`branch`) still go through the server to the graph datasets, so it needs read-write storage access. Validate with `curl http://127.0.0.1:8080/healthz`, then `omnigraph snapshot <graph-uri> --json`.
`init`, direct-storage `load`, and **`cluster apply`** write storage directly (bypassing the server). When `load` is run through a configured server, it uses the server's data-plane HTTP path instead. `cluster apply` is a storage-direct control-plane command — it reaches the object store directly (the `__cluster/` ledger *and* each graph's Lance datasets, to create/migrate/delete them), never through a running server, so the host that runs it needs storage access (the `AWS_*` contract for an `s3://` cluster). That is by design: the control plane is declarative (config → cluster), not a runtime mutation API on the serving process. The server reads **cluster** state read-only at boot, but it is not read-only against storage overall — data-plane HTTP writes (`mutate`/`load`/`branch`) still go through the server to the graph datasets, so it needs read-write storage access. Validate with `curl http://127.0.0.1:8080/healthz`, then `omnigraph snapshot <graph-uri> --json`.

Context Used: AGENTS.md (source)

2 participants