integration: validate tunnel onboarding with live OBOL faucet flow #452

Merged: bussyjd merged 15 commits into main from integration/pr450-pr451-cloudflare-obol on May 9, 2026

@bussyjd bussyjd commented May 8, 2026

Summary

What changed:

  • hardens the integrated tunnel/live-OBOL branch’s smoke and live-flow harnesses after the Docker reinstall
  • prefers obol-native workspace teardown (stack down / stack purge) and preserves cached stack IDs so k3d fallback cleanup still works when purge removes config
  • makes the public tunnel verification fail closed instead of silently passing when the tunnel probe is unreachable
  • keeps the flow-13 eRPC pinning path free of a PyYAML dependency and now fails with a clear error if ruby is unavailable
  • keeps the spark2-backed gemma4-fast path green across the integrated smoke, live Base Sepolia OBOL, and forked-OBOL flows
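
The fail-closed rule for the public tunnel check can be illustrated with a minimal Python sketch. This is not the harness code (the real check lives in the bash flow scripts); `verify_public_tunnel`, the injected `probe` callable, and the 2xx success rule are hypothetical names and assumptions used only to show that an unreachable probe must count as a failure.

```python
from typing import Callable, Optional


def verify_public_tunnel(probe: Callable[[], Optional[int]]) -> bool:
    """Return True only when the probe positively confirms the tunnel.

    `probe` is a hypothetical callable that returns an HTTP status code,
    or None when the tunnel could not be reached at all.
    """
    try:
        status = probe()
    except OSError:
        # Network errors are failures, never "nothing to check".
        return False
    if status is None:
        # No answer is also a failure: this is the fail-closed part.
        return False
    # Assumption: any 2xx status counts as a confirmed tunnel.
    return 200 <= status < 300


def unreachable() -> Optional[int]:
    """Simulates a probe that cannot reach the tunnel."""
    raise ConnectionError("tunnel probe unreachable")


print(verify_public_tunnel(lambda: 200))  # True
print(verify_public_tunnel(unreachable))  # False
```

A fail-open variant would return True in the `except` branch, which is the silent pass this change removes.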

Why it matters:

  • exact-HEAD 13ba63c17b701fafe42606501125e309768da9bb is now green for the full smoke suite, including flow-11, flow-14, and flow-13
  • the validation story is now aligned with the pushed commit instead of a pre-commit worktree run
  • repeated QA reruns stop leaking stale workspaces/clusters from the stack-id cleanup path that regressed during the Docker reset cycle

Risk level: medium

Commit under test:

  • 13ba63c17b701fafe42606501125e309768da9bb

Base branch:

  • main

Scope

  • Code
  • Charts / manifests
  • Flows / QA scripts
  • Docs / skills
  • Images / dependencies
  • Other: cleanup / validation harness hardening for the integrated PR branch

Validation

CI checks:

| Check | Status | Link |
| --- | --- | --- |
| lint-test | PASS | https://github.com/ObolNetwork/obol-stack/actions/runs/25597706304/job/75146356680 |
| Analyze (actions) | PASS | https://github.com/ObolNetwork/obol-stack/actions/runs/25597705705/job/75146356004 |
| Analyze (go) | PASS | https://github.com/ObolNetwork/obol-stack/actions/runs/25597705705/job/75146356006 |
| Analyze (javascript-typescript) | PASS | https://github.com/ObolNetwork/obol-stack/actions/runs/25597705705/job/75146356008 |
| Analyze (python) | PASS | https://github.com/ObolNetwork/obol-stack/actions/runs/25597705705/job/75146356005 |

Pre-commit / local correctness checks:

commit=13ba63c17b701fafe42606501125e309768da9bb
$ bash -n flows/lib.sh flows/flow-07-sell-verify.sh flows/flow-10-anvil-facilitator.sh flows/flow-11-dual-stack.sh flows/flow-13-dual-stack-obol.sh flows/flow-14-live-obol-base-sepolia.sh
PASS

$ python3 -m py_compile internal/embed/skills/discovery/scripts/discovery.py
PASS

$ git diff --check
PASS

$ independent diff review (cleanup + fail-closed tunnel checks)
PASS

Exact-head release smoke:

commit=13ba63c17b701fafe42606501125e309768da9bb
$ RELEASE_SMOKE_INCLUDE_OBOL=true \
  RELEASE_SMOKE_INCLUDE_OBOL_FORK=true \
  FLOW13_BASE_SEPOLIA_RPC=https://base-sepolia-rpc.publicnode.com \
  OBOL_LLM_ENDPOINT=http://192.168.0.24:18080/v1 \
  OBOL_LLM_MODEL=gemma4-fast \
  DOCKER_CONFIG=<temporary empty config.json> \
  bash flows/release-smoke.sh
PASS

Inline smoke report summary:

| Flow | Result | FAIL lines | SKIP lines | Exit code |
| --- | --- | --- | --- | --- |
| flow-01-prerequisites | PASS | 0 | 0 | 0 |
| flow-02-stack-init-up | PASS | 0 | 0 | 0 |
| flow-03-inference | PASS | 0 | 0 | 0 |
| flow-04-agent | PASS | 0 | 0 | 0 |
| flow-05-network | PASS | 0 | 0 | 0 |
| flow-06-sell-setup | PASS | 0 | 0 | 0 |
| flow-07-sell-verify | PASS | 0 | 0 | 0 |
| flow-10-anvil-facilitator | PASS | 0 | 0 | 0 |
| flow-08-buy | PASS | 0 | 0 | 0 |
| flow-09-lifecycle | PASS | 0 | 0 | 0 |
| flow-11-dual-stack | PASS | 0 | 0 | 0 |
| flow-14-live-obol-base-sepolia | PASS | 0 | 0 | 0 |
| flow-13-dual-stack-obol | PASS | 0 | 0 | 0 |

Artifacts from the exact-head run:

  • report: .tmp/release-smoke-20260509-165243/RELEASE_REPORT.md
  • flow-11 receipts: .tmp/release-smoke-20260509-165243/flow-11-receipts/receipt-summary.json
  • flow-14 receipts: .tmp/release-smoke-20260509-165243/flow-14-receipts/receipt-summary.json
  • flow-13 receipts: .tmp/release-smoke-20260509-165243/flow-13-receipts/receipt-summary.json

Live Chain Evidence

Network:

  • Base Sepolia (84532)

RPC/provider:

  • https://base-sepolia-rpc.publicnode.com

Facilitator:

  • public: https://x402.gcp.obol.tech
  • local fork flow: http://127.0.0.1:53788

Contracts and tokens:

| Name | Address | Notes |
| --- | --- | --- |
| ERC-8004 Identity Registry | 0x8004A818BFB912233c491871b3d84c89A494BD9e | Base Sepolia registry |
| Live OBOL token | 0x0a09371a8b011d5110656ceBCc70603e53FD2c78 | Obol Network / OBOL / 18 decimals |
| Forked OBOL token | 0x210BBd033630e5e611B7922D70b0Caabe64636d9 | deployed during exact-head flow-13 |
| Permit2 router | 0x000000000022D473030F116dDEE9F6B43aC78BA3 | approval target used in live/fork OBOL flows |

Wallet roles:

| Role | Address |
| --- | --- |
| Alice / seller / register | 0xC0De030F6C37f490594F93fB99e2756703c4297E |
| Bob / buyer / payer / signer | 0x57b0eF875DeB5A37301F1640E469a2129Da9490E |

Exact-head transaction evidence:

flow-11-dual-stack exact-head evidence

  • agent id: 5702
  • tunnel: https://pottery-arms-horses-tall.trycloudflare.com
  • registration tx: 0x844bb9d8179571aca3f53fd95b5ba33cd4c972538c84a54138cbfdf0ee37604c
  • metadata tx: 0xc2a9a72bed2d7cd8311839b4c803e4d950a89704997af017bd76cdbdc774f48d
  • settlement tx: 0x651c44cab864ffed001a3fb089a1198fff7b4e04c1093fe7f4ee86fcf5a6ad71

flow-14-live-obol-base-sepolia exact-head evidence

  • agent id: 5703
  • tunnel: https://statute-allen-leaf-runs.trycloudflare.com
  • OBOL token: 0x0a09371a8b011d5110656ceBCc70603e53FD2c78
  • registration tx: 0xd183bb1ecd2993b87afe72e47e266b5b98f34091dc30d73c061a3d6e30917ee1
  • metadata tx: 0xa919f4b20b9fcfc0b00efd3b3d0c406bbf44ce7066db0489145f5ecf83d43b4f
  • settlement tx: 0xa192904a6c415b30cf908de500ff8c8330724b14601cbb9112181a2146deb576
  • Alice OBOL delta: 7000000000000000 -> 8000000000000000 wei (+1000000000000000)
  • Bob signer OBOL delta: 4993000000000000000 -> 4992000000000000000 wei (-1000000000000000)

flow-13-dual-stack-obol exact-head evidence

  • tunnel: https://catering-solid-night-several.trycloudflare.com
  • forked OBOL funding tx: 0xef2d85e801191599dec7ed3790bc74dd7b1c1f9c7f4f63c80b36e01254334582
  • forked OBOL settlement tx: 0x4e476edc29b0576aff44c48ca39889a9bffa38fc626d7087f00e8ff9637cf8b7
  • Alice OBOL delta: 10000000000000000000 -> 10001000000000000000 wei (+1000000000000000)
  • Bob signer OBOL delta: 10000000000000000000 -> 9999000000000000000 wei (-1000000000000000)

Runtime Evidence

QA environment:

| Item | Value |
| --- | --- |
| OS / arch | macOS arm64 |
| Backend | k3d / k3s on Docker Desktop |
| Tooling | Python 3.11.14, Go 1.25.5, GitHub CLI 2.86.0, Docker 29.4.2, kubectl client v1.35.3 |
| QA model route | external LiteLLM endpoint backed by spark2 gemma4-fast |
| Resilience detail | exact-head smoke used an auto-restarting local SSH forward through spark1 -> 192.168.100.11:8000 so the spark2 endpoint stayed available across Cloudflare SSH flap events |

Model and paid-route evidence:

  • paid/gemma4-fast returned HTTP 200 with coherent content in flow-11, flow-14, and flow-13
  • live OBOL flow settled exactly 1000000000000000 wei (0.001 OBOL) from Bob signer to Alice seller
  • forked OBOL flow settled exactly 1000000000000000 wei (0.001 OBOL) from Bob signer to Alice seller
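
The settlement figures can be cross-checked against the balance deltas reported under Live Chain Evidence. The sketch below assumes only what the PR body states (OBOL has 18 decimals, the price is 0.001 OBOL); the helper names `to_base_units` and `check_settlement` are hypothetical and are not part of the flow scripts.

```python
from decimal import Decimal

OBOL_DECIMALS = 18  # the token table lists OBOL with 18 decimals


def to_base_units(amount: str, decimals: int = OBOL_DECIMALS) -> int:
    """Convert a decimal token amount (given as a string) to integer base units."""
    scaled = Decimal(amount) * (Decimal(10) ** decimals)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount} has more than {decimals} decimal places")
    return int(scaled)


def check_settlement(seller_before: int, seller_after: int,
                     payer_before: int, payer_after: int, price: int) -> bool:
    """True when the seller gained and the payer lost exactly `price` base units."""
    return (seller_after - seller_before == price
            and payer_before - payer_after == price)


PRICE = to_base_units("0.001")
print(PRICE)  # 1000000000000000

# flow-14 (live OBOL) balances as reported in the PR body
print(check_settlement(7000000000000000, 8000000000000000,
                       4993000000000000000, 4992000000000000000, PRICE))  # True

# flow-13 (forked OBOL) balances as reported in the PR body
print(check_settlement(10000000000000000000, 10001000000000000000,
                       10000000000000000000, 9999000000000000000, PRICE))  # True
```

Using `Decimal` on a string input avoids the rounding that a float multiplication by 10**18 would introduce.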

Post-run cleanup state:

  • QA k3d clusters: none left running
  • local anvil processes from flow-13: stopped by the flow
  • local facilitator container from the smoke harness was manually removed after the run (docker rm -f obol-flow10-x402-facilitator) and is called out below as a remaining cleanup follow-up

Review Notes

Known gaps:

  • flow-10 / smoke cleanup still left a helper container (obol-flow10-x402-facilitator) after the exact-head run; I removed it manually after validation. This PR materially improves stale-workspace / cluster cleanup, but there is still one remaining facilitator-container cleanup follow-up.
  • shellcheck still reports several pre-existing warnings in the flow harness outside the paths touched here.

Reviewer focus:

  • flows/lib.sh cached stack-id cleanup path
  • flows/flow-07-sell-verify.sh fail-closed public tunnel eRPC verification
  • flows/flow-13-dual-stack-obol.sh explicit runtime prereq handling for the YAML patch helper
  • exact-head smoke report and the three receipt summaries under .tmp/release-smoke-20260509-165243/

bussyjd commented May 9, 2026

Exact-head validation is now aligned with the pushed commit 13ba63c17b701fafe42606501125e309768da9bb.

Local validation summary:

  • full release-smoke PASS at exact HEAD
  • flow-11-dual-stack PASS
  • flow-14-live-obol-base-sepolia PASS
  • flow-13-dual-stack-obol PASS

The PR body now includes the exact-head smoke matrix and the live/fork OBOL receipt hashes from .tmp/release-smoke-20260509-165243/.

One remaining cleanup follow-up I observed during the exact-head run: the smoke harness still left obol-flow10-x402-facilitator running after completion, so I removed it manually after validation. CI analysis jobs are still pending on the newly pushed commit.

bussyjd commented May 9, 2026

Final status:

  • exact-head local smoke is green on 13ba63c17b701fafe42606501125e309768da9bb
  • all GitHub PR checks are now passing
  • PR body has been refreshed with the exact-head smoke matrix and current receipt hashes

Remaining noted follow-up: the smoke harness still left obol-flow10-x402-facilitator running after the exact-head run, so I removed it manually after validation and called that out in the PR body.

bussyjd and others added 2 commits May 9, 2026 19:32
Co-authored-by: bussyjd <bussyjd@users.noreply.github.com>
* feat(buy): add `obol buy inference` host CLI

Mirrors `obol sell inference` on the buyer side. The host CLI handles
default-seller resolution, ERC-8004 identity pre-flight, and USDC->micro-units
conversion, then dispatches to the existing `buy.py buy` skill in the
obol-agent pod. Single canonical wallet, no host-side keystore.

- internal/x402/setup.go: DefaultBuySellerURL, DefaultBuySellerAgentID,
  DefaultBuySellerChain placeholders (TODO: wire live values once the
  default seller is provisioned).
- internal/agentruntime/exec.go: ExecInPod + BuildExecArgs generalize the
  kubectl-exec helper that was hardcoded to the hermes binary.
- internal/hermes/hermes.go: cliViaKubectlExec + hermesExecArgs delegate to
  the new agentruntime helpers; existing test stays valid.
- internal/buy/discover.go: .well-known/agent-registration.json fetcher
  and ERC-8004 agentId verification (hard-fail on mismatch).
- cmd/obol/buy.go: `obol buy inference [<name>] --seller --model
  --budget --expected-agent-id --no-verify-identity --auto-refill ...`.

* test(flow-11): validate host buy inference on integration
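
The identity pre-flight described in the feat(buy) commit (fetch the seller's agent-registration.json, then hard-fail when the agentId does not match) could look roughly like the sketch below. It is written in Python for brevity, whereas the actual code is Go in internal/buy/discover.go. The document shape (a `registrations` list with `agentId` entries), the exception type, and the function name are all assumptions for illustration, and the HTTP fetch is left out so the example stays offline.

```python
import json


class IdentityMismatch(Exception):
    """Raised when the seller does not advertise the expected agentId."""


def verify_agent_id(registration_json: str, expected_agent_id: int) -> int:
    """Hard-fail unless `expected_agent_id` appears in the registration document.

    Assumed document shape: {"registrations": [{"agentId": <int>, ...}, ...]}
    """
    doc = json.loads(registration_json)
    advertised = {int(entry["agentId"]) for entry in doc.get("registrations", [])}
    if expected_agent_id not in advertised:
        raise IdentityMismatch(
            f"expected agentId {expected_agent_id}, "
            f"seller advertises {sorted(advertised)}"
        )
    return expected_agent_id


# Hypothetical payload; 5707 is the agent id registered in the flow-11 run.
SAMPLE = json.dumps({"registrations": [{"agentId": 5707}]})

print(verify_agent_id(SAMPLE, 5707))  # 5707
try:
    verify_agent_id(SAMPLE, 1)
except IdentityMismatch as err:
    print(f"refused: {err}")
```

Raising instead of returning False keeps a mismatch from being ignored by a caller that forgets to check the result.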

bussyjd commented May 9, 2026

Update: integration branch integration/pr450-pr451-cloudflare-obol now includes the host-buy CLI work from #434 via merge commit f594cef9e5e4a2e2bd47613de5a49a330935f4c7.

Host-buy validation used the exact pre-merge integrated head d0cb941f3941191d53287fd7e5a7757896e482c3 and passed.

Targeted checks

  • bash -n flows/flow-11-dual-stack.sh
  • go test ./cmd/obol ./internal/buy ./internal/agentruntime ./internal/hermes ./internal/x402 ./internal/erc8004 -count=1
  • go vet ./cmd/obol ./internal/buy ./internal/agentruntime ./internal/hermes ./internal/x402 ./internal/erc8004
  • git diff --check

Live flow evidence

Ran flows/flow-11-dual-stack.sh with:

  • OBOL_LLM_ENDPOINT=http://192.168.0.24:18080/v1
  • OBOL_LLM_MODEL=gemma4-fast
  • temporary empty DOCKER_CONFIG

Key assertions that passed:

  • Alice registered ERC-8004 agent 5707
  • Bob discovered Alice via the registry
  • host-side obol buy inference path executed successfully
  • PurchaseRequest became Ready with budget 1000 micro-USDC and 5 auths
  • buyer sidecar showed exactly 5 remaining auths
  • paid inference succeeded via paid/gemma4-fast
  • on-chain settlement succeeded

Exact-head receipts:

  • registration tx: 0x233a1f12d1d9c2bf5eb742ffbd8c81ca7577655f953a35fe88d8b884a24a3464
  • metadata tx: 0x29b48d900829f6b4b650c986d6414ba188a896f1a1ce326175a20ce0892141d2
  • settlement tx: 0x91af8cf4aacb0fdd100e7bd1b125b5e693c93fc326cdf343104a3cd67fba14c6

Flow result:

  • Dual-stack test complete: 52/50 passed
  • paid inference reply: OBOL payment smoke test passed.

Artifacts from that exact-head run:

  • .tmp/flow-11-20260509-204453/receipt-summary.json

* Agent crd

* Next phase

* 1, 2a, 2b, 2c, 4a, 4b, 5, 6, 7, 8, 9

* 2d

* Update with almost all complete, time for testing

* Bug fixing

* chore: remove stray runtime log

* chore(flows): renumber sell-agent smoke flow for integration

* fix(agent): harden CRD update sync semantics

---------

Co-authored-by: bussyjd <bussyjd@users.noreply.github.com>
Co-authored-by: bussyjd <jd@obol.tech>
@bussyjd bussyjd merged commit 8467a8d into main May 9, 2026
6 checks passed
OisinKyne pushed a commit that referenced this pull request May 11, 2026
Both versions were intended to land via the integration branch behind
PR #452 but did not make it through the squash merges. Aligning main
with the latest published tags.

- frontend: v0.1.21-rc1 → v0.1.23 (real release, off the rc)
- hermes-agent: v2026.4.30 → v2026.5.7
- justfile dev-frontend-reset target: v0.1.19 → v0.1.23
OisinKyne pushed a commit that referenced this pull request May 11, 2026
Pulls forward five small correctness fixes that were carried on the
integration branch behind #452 but did not survive the squash merges.

- Re-queue offers when their referenced Agent changes. Without this an
  Agent status edit (e.g. status.pinnedModel after the user edits
  spec.model) never propagates into the offer's status.agentResolution
  because the offer reconciler only runs when the offer itself changes.

- Refuse to Update Namespace and PersistentVolumeClaim during
  applyAgentObject. PVCs reject wholesale Update with
  "spec is immutable after creation", and the controller's RBAC only
  grants `create` on Namespaces. Treat existence as success for these
  kinds and move on; mutable kinds (ConfigMap, Secret, Deployment,
  Service, ServiceAccount) keep going through the normal Update path.

- Fall back to status.agentResolution.Model in the storefront catalog
  when an offer's spec.model is empty (the canonical state for
  type=agent offers, where the model lives on the linked Agent).

- Bump the serviceoffer-controller Deployment memory request from 64Mi
  to 128Mi and the limit from 256Mi to 512Mi. The Agent informer + agent
  reconciler + in-controller keystore generation pushed steady-state
  past 256Mi after #453 and triggered OOMKilled restart loops.

- Set GATEWAY_ALLOW_ALL_USERS=true on CRD-rendered agent pods. CRD
  agents only expose the API (gated by API_SERVER_KEY + ForwardAuth);
  no Telegram/Discord/dashboard platforms are wired. The flag silences
  Hermes' user-gateway startup warning without opening any real
  surface.
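
The second fix above (treat existence as success for Namespace and PersistentVolumeClaim, keep updating mutable kinds) reduces to a small decision rule. The sketch below models the cluster as an in-memory dict purely for illustration. The real logic is Go inside applyAgentObject, and the names `apply_object` and `CREATE_ONLY_KINDS` are hypothetical.

```python
# Kinds the commit message says must not go through Update.
CREATE_ONLY_KINDS = {"Namespace", "PersistentVolumeClaim"}


def apply_object(cluster: dict, kind: str, name: str, spec: dict) -> str:
    """Create the object if missing; update it only when its kind is mutable.

    `cluster` is a stand-in for the API server: {(kind, name): spec}.
    Returns "created", "exists", or "updated".
    """
    key = (kind, name)
    if key not in cluster:
        cluster[key] = dict(spec)
        return "created"
    if kind in CREATE_ONLY_KINDS:
        # Existence is success: no Update call, so no immutable-spec
        # rejection and no need for update permission on Namespaces.
        return "exists"
    cluster[key] = dict(spec)
    return "updated"


cluster = {}
print(apply_object(cluster, "PersistentVolumeClaim", "data", {"size": "1Gi"}))  # created
print(apply_object(cluster, "PersistentVolumeClaim", "data", {"size": "2Gi"}))  # exists
print(apply_object(cluster, "ConfigMap", "cfg", {"a": "1"}))                    # created
print(apply_object(cluster, "ConfigMap", "cfg", {"a": "2"}))                    # updated
```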
OisinKyne pushed a commit that referenced this pull request May 11, 2026
…odel pin

Pulls forward three dev-experience improvements from the integration
branch behind #452 that did not survive the squash merges.

- Selective image rebuild via OBOL_FORCE_REBUILD_LOCAL_DEV_IMAGES.
  The variable now accepts a comma-separated list of image short names
  (e.g. `x402-verifier,serviceoffer-controller`) in addition to the
  existing `true`/`all` and `false`/`0`/unset behaviours. The full
  image set is x402-verifier, serviceoffer-controller, x402-buyer,
  demo-server, and obol-stack-public-storefront (with `public-storefront`
  accepted as an alias). Saves a full ~10-minute rebuild when only one
  image changed.

- Claude Code plugin install tip on stack up. After `obol stack up`,
  if the `claude` CLI is present but the ObolNetwork/skills marketplace
  or its plugin isn't installed, surface a one-line install hint.
  Reads ~/.claude/plugins/{known_marketplaces,installed_plugins}.json
  best-effort; silently no-ops on any error so a malformed Claude
  config can never block stack up.

- Auto-pin a model on the agent-backed demo. `obol sell agent --demo`
  resolves the first non-`paid/*` model from the cluster's LiteLLM
  config (the same source `obol model list` reads) and writes it into
  the rendered Agent's spec.model so the controller doesn't park at
  ModelUnpinned. Returns a clear "configure a model first" error if
  the cluster has nothing usable, and removes a stale "depend on step
  2d" caveat that no longer applies.

Docs updated in CLAUDE.md, .agents/skills/obol-stack-dev/SKILL.md, and
.agents/skills/obol-stack-dev/references/dev-environment.md.
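
Two of the behaviours in this commit message are easy to state as code: how OBOL_FORCE_REBUILD_LOCAL_DEV_IMAGES selects images, and how the demo picks the first non-`paid/*` model. The Python sketch below follows the values listed above. Case-insensitive matching of `true`/`all`, whitespace trimming, de-duplication, and rejecting unknown names are assumptions, and both function names are hypothetical.

```python
ALL_IMAGES = (
    "x402-verifier",
    "serviceoffer-controller",
    "x402-buyer",
    "demo-server",
    "obol-stack-public-storefront",
)
ALIASES = {"public-storefront": "obol-stack-public-storefront"}


def images_to_rebuild(value):
    """Map the env var value to the list of images to rebuild."""
    raw = (value or "").strip()
    if raw.lower() in ("", "false", "0"):
        return []
    if raw.lower() in ("true", "all"):
        return list(ALL_IMAGES)
    selected = []
    for name in raw.split(","):
        name = name.strip()
        if not name:
            continue
        name = ALIASES.get(name, name)
        if name not in ALL_IMAGES:
            # Assumption: unknown names are an error rather than ignored.
            raise ValueError(f"unknown image short name: {name}")
        if name not in selected:
            selected.append(name)
    return selected


def first_unpaid_model(models):
    """Return the first model whose name does not start with 'paid/'."""
    for name in models:
        if not name.startswith("paid/"):
            return name
    raise LookupError("configure a model first")


print(images_to_rebuild("x402-verifier,public-storefront"))
# ['x402-verifier', 'obol-stack-public-storefront']
print(first_unpaid_model(["paid/gemma4-fast", "gemma4-fast"]))  # gemma4-fast
```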
OisinKyne pushed a commit that referenced this pull request May 11, 2026
…image

Verified locally against ghcr.io/obolnetwork/remote-signer:v0.3.0:

- Main's KEYSTORE_PASSWORD env name is unrecognised; the binary exits
  with Error: NoPassword on startup.
- Main's keystore dir /keystores conflicts with the image's default
  /data/keystores (declared as a volume in the image config).
- Main's /health readiness probe returns HTTP 404; the binary only
  serves /healthz, which returns {"status":"ok"}.

Together these mean any Agent CR with wallet.create=true on main has a
remote-signer that crash-loops or fails liveness, blocking the agent
from ever reaching Ready.

This is what the integration branch behind #452 was carrying. Pulling
it forward:

- Move keystore dir to /data/keystores (the image default), and pin
  the on-disk filename to keystore.json so the Secret volume
  projection no longer needs to thread the V3 UUID through; the V3
  document carries the address internally so the cosmetic filename
  doesn't matter.
- Add ensureCanonicalKeystoreKey migration helper: on reconcile of an
  existing Secret with the wallet annotation, if data is keyed under
  the old UUID-named JSON field, rewrite it as keystore.json
  in-place. Refuses ambiguous Secrets with multiple legacy JSON keys.
- Switch env scheme to upstream's SIGNER__SECTION__KEY hierarchy
  (SIGNER__SERVER__HOST, SIGNER__SERVER__PORT, SIGNER__KEYSTORE__DIR,
  SIGNER__KEYSTORE__PASSWORD, SIGNER__LOGGING__FORMAT/LEVEL). Matches
  the master agent's working config in hermes-obol-agent.
- Switch readiness and liveness probes from /health to /healthz.

Adds 8 unit tests covering fresh keystore creation, reuse, legacy key
migration, ambiguity rejection, malformed data, and the canonical
Secret/Deployment shape (single keystore.json projected, password
read via env, never mounted).
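
The key migration performed by ensureCanonicalKeystoreKey can be sketched as a rename over the Secret's data map. This is a Python illustration of the rule described above, not the Go helper itself. Detecting legacy keys by a `.json` suffix, and raising when no keystore JSON is present, are assumptions.

```python
CANONICAL_KEY = "keystore.json"


def ensure_canonical_keystore_key(data: dict) -> bool:
    """Rename a single legacy '<uuid>.json' key to keystore.json in place.

    Returns True when the map was rewritten, False when it was already
    canonical. Raises ValueError for missing or ambiguous keystore data.
    """
    if CANONICAL_KEY in data:
        return False
    # Assumption: any other key ending in .json is a legacy keystore entry.
    legacy = [key for key in data if key.endswith(".json")]
    if not legacy:
        raise ValueError("no keystore JSON found in Secret data")
    if len(legacy) > 1:
        raise ValueError(f"ambiguous Secret: multiple legacy JSON keys {sorted(legacy)}")
    data[CANONICAL_KEY] = data.pop(legacy[0])
    return True


secret = {"0b1c2d3e.json": "{...v3 keystore...}", "password": "s3cret"}
print(ensure_canonical_keystore_key(secret))  # True
print(sorted(secret))                         # ['keystore.json', 'password']
print(ensure_canonical_keystore_key(secret))  # False
```

Refusing the ambiguous case rather than picking one key avoids silently projecting the wrong wallet into the signer.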
bussyjd added a commit that referenced this pull request May 12, 2026
`resolveAssetTermsFor` returned `--token X is not available on chain Y
(supported tokens: OBOL, USDC)` when a token wasn't registered for the
requested chain. The "supported tokens" list came from the global
registry (`SupportedTokens()`), not from the chain, so operators reading
the error saw OBOL listed as supported even though the lookup just
failed on `base-sepolia`/`base`/etc. This was actively misleading.

Surfaced today on spark2 while wiring `obol sell inference … --token
OBOL --chain base-sepolia` — the binary (v0.9.0) rejected OBOL on
base-sepolia (registry entry added in #452 after the release was cut),
but the message claimed OBOL was supported.

Changes:
- Add `TokensOnChain(chain)` and `ChainsForToken(token)` helpers in
  internal/x402/tokens.go so callers can ask the registry chain-scoped
  questions without iterating it themselves.
- Rewrite the error in `resolveAssetTermsFor` to use both:
    `--token OBOL is not available on chain base-sepolia; tokens on
     base-sepolia: OBOL, USDC; OBOL is registered on: base-sepolia,
     ethereum`
  with four branches covering the chain-empty, token-empty, both-empty,
  and normal cases.
- Add table-driven tests covering the helpers (chains/tokens lookups,
  aliases, unknown chain/token, case-insensitive token names).

Co-authored-by: bussyjd <bussyjd@users.noreply.github.com>
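
The chain-scoped lookups described in this commit can be sketched as two queries over a registry keyed by chain. The Python below is illustrative only: the real helpers are Go (`TokensOnChain`, `ChainsForToken` in internal/x402/tokens.go), the registry contents here are invented and do not reflect the actual registry, and the message builder covers only the simple cases, not the four branches mentioned above.

```python
# Hypothetical registry contents, for illustration only.
REGISTRY = {
    "ethereum": {"OBOL", "USDC"},
    "base-sepolia": {"USDC"},
}


def tokens_on_chain(chain: str) -> list:
    """Tokens registered on one chain, sorted; empty for an unknown chain."""
    return sorted(REGISTRY.get(chain, set()))


def chains_for_token(token: str) -> list:
    """Chains a token is registered on, sorted; token names are case-insensitive."""
    wanted = token.upper()
    return sorted(chain for chain, tokens in REGISTRY.items() if wanted in tokens)


def unavailable_message(token: str, chain: str) -> str:
    """Build an error that reports chain-scoped facts instead of the global list."""
    parts = [f"--token {token} is not available on chain {chain}"]
    on_chain = tokens_on_chain(chain)
    if on_chain:
        parts.append(f"tokens on {chain}: {', '.join(on_chain)}")
    chains = chains_for_token(token)
    if chains:
        parts.append(f"{token} is registered on: {', '.join(chains)}")
    return "; ".join(parts)


print(unavailable_message("OBOL", "base-sepolia"))
# --token OBOL is not available on chain base-sepolia; tokens on base-sepolia: USDC; OBOL is registered on: ethereum
```

Because both lists are derived from the same registry the lookup used, the message cannot name a token as supported on a chain where the lookup just failed.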