
fix(serviceoffer-controller): align remote-signer config with v0.3.0 image #465

Merged
OisinKyne merged 1 commit into main from fix/serviceoffer-controller-remote-signer-config on May 11, 2026

Conversation

Collaborator

@bussyjd bussyjd commented May 11, 2026

Summary

The serviceoffer-controller on main currently speaks the wrong contract to the deployed ghcr.io/obolnetwork/remote-signer:v0.3.0 image. Any Agent CR with wallet.create=true has a signer pod that fails to start or fails its probes, blocking the agent from reaching Ready.

Local verification against the published image

I pulled ghcr.io/obolnetwork/remote-signer:v0.3.0, ran it with both env schemes, and probed both health paths. Results:

| Test | Result |
| --- | --- |
| Image default volume (from docker inspect) | /data/keystores, not /keystores |
| Run with main's env (KEYSTORE_PASSWORD, KEYSTORE_PATH) | Exits immediately: Error: NoPassword |
| Run with branch's env (SIGNER__KEYSTORE__PASSWORD, SIGNER__KEYSTORE__DIR) | Runs healthy for 5+ minutes |
| GET /health on a running signer | HTTP 404 |
| GET /healthz on a running signer | HTTP 200 {"status":"ok"} |

So every one of main's three signer-related constants is wrong against the deployed image.
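
The two probe results above can be reproduced with a small Go check. This is only a sketch: `fakeSignerHandler`, `newFakeSigner`, and `probeStatus` are hypothetical names, and the handler is a stand-in that mimics the behaviour observed from the v0.3.0 image (200 on /healthz, 404 elsewhere), not the signer's actual code.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
)

// fakeSignerHandler is a hypothetical stand-in for the v0.3.0 binary.
// It serves /healthz with 200 {"status":"ok"}; any other path gets the
// ServeMux default of 404.
func fakeSignerHandler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		io.WriteString(w, `{"status":"ok"}`)
	})
	return mux
}

// newFakeSigner starts the stand-in on a local ephemeral port.
func newFakeSigner() *httptest.Server {
	return httptest.NewServer(fakeSignerHandler())
}

// probeStatus issues a GET against baseURL+path and returns the HTTP
// status code together with the response body.
func probeStatus(baseURL, path string) (int, string, error) {
	resp, err := http.Get(baseURL + path)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return resp.StatusCode, "", err
	}
	return resp.StatusCode, string(body), nil
}
```

Pointing `probeStatus` at a real container instead of the stand-in only requires swapping the base URL for the signer's published address and port.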

Fix

Pulls forward the integration branch's S6 work (originally on integration/pr450-pr451-cloudflare-obol, dropped by the squash merges):

  • Keystore dir moved to /data/keystores (image default).
  • Keystore filename pinned to keystore.json so the Secret's items projection no longer needs to thread the V3 UUID through. The V3 document carries the address internally, so the on-disk name is cosmetic.
  • Migration helper ensureCanonicalKeystoreKey: on reconcile of an existing Secret with the wallet annotation, if data is keyed under the old UUID-named JSON field, rewrite it as keystore.json in place. Refuses ambiguous Secrets that already have multiple legacy JSON keys (operator must intervene).
  • Env scheme switched to upstream's SIGNER__SECTION__KEY hierarchy: SIGNER__SERVER__HOST, SIGNER__SERVER__PORT, SIGNER__KEYSTORE__DIR, SIGNER__KEYSTORE__PASSWORD, SIGNER__LOGGING__FORMAT, SIGNER__LOGGING__LEVEL. Matches the master agent's known-good config in hermes-obol-agent.
  • Probe paths switched from /health to /healthz.
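
For reviewers who want the migration rule in one place, here is a minimal sketch of the key-renaming logic described above. It is not the code in this PR: `canonicalizeKeystoreKey` is a hypothetical name, it operates on a bare `map[string][]byte` rather than a Secret object, it skips the wallet-annotation check, and it assumes a legacy key is any key ending in `.json` other than `keystore.json`.

```go
package main

import (
	"errors"
	"strings"
)

const canonicalKeystoreKey = "keystore.json"

var (
	// Returned when more than one legacy JSON key is present and the
	// helper cannot tell which one holds the live keystore.
	errAmbiguousKeystore = errors.New("multiple legacy keystore JSON keys; operator must intervene")
	// Returned when the data holds no keystore JSON at all.
	errNoKeystore = errors.New("no keystore JSON key found")
)

// canonicalizeKeystoreKey renames a single legacy *.json key to
// keystore.json in place. It reports whether the map was changed.
// Assumption: if keystore.json is already present the data is treated
// as canonical and left untouched.
func canonicalizeKeystoreKey(data map[string][]byte) (bool, error) {
	if _, ok := data[canonicalKeystoreKey]; ok {
		return false, nil
	}
	var legacy []string
	for k := range data {
		if strings.HasSuffix(k, ".json") {
			legacy = append(legacy, k)
		}
	}
	switch len(legacy) {
	case 0:
		return false, errNoKeystore
	case 1:
		// Same bytes under a new key: the key material is not rotated.
		data[canonicalKeystoreKey] = data[legacy[0]]
		delete(data, legacy[0])
		return true, nil
	default:
		return false, errAmbiguousKeystore
	}
}
```

Because only the map key changes, the keystore bytes (and therefore the wallet address) stay identical across the migration.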

Tests

Adds 8 unit tests in agent_wallet_test.go covering:

  • Fresh keystore creation populates the Secret with annotation + canonical key
  • Reuse of an existing keystore Secret with the canonical key (no-op)
  • Migration from legacy UUID-named key → keystore.json
  • Rejection of ambiguous Secrets with multiple legacy JSON keys
  • Rejection of Secrets missing the address annotation or the keystore JSON
  • Rejection of malformed Secret data
  • The wallet-disabled no-op path
  • The rendered Deployment shape (single keystore.json projected, password read via env, never mounted into the keystore dir)
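
The Deployment-shape assertion in the last bullet can be illustrated roughly as follows. This is a sketch under assumptions, not the test in agent_wallet_test.go: `envVar` and `keyToPath` are simplified stand-ins for the Kubernetes EnvVar and KeyToPath types, and `signerEnv`, `keystoreItems`, and `checkSignerShape` are hypothetical helpers. The host, port, and logging values are left as parameters because the PR does not state them.

```go
package main

import "errors"

// envVar is a simplified stand-in for a container env entry. SecretKey is
// non-empty when the value is read from a Secret key instead of inlined.
type envVar struct {
	Name      string
	Value     string
	SecretKey string
}

// keyToPath is a simplified stand-in for a Secret volume items entry.
type keyToPath struct {
	Key  string
	Path string
}

const (
	keystoreDir  = "/data/keystores"
	keystoreFile = "keystore.json"
)

// signerEnv builds the SIGNER__SECTION__KEY env scheme listed in this PR.
func signerEnv(host, port, passwordSecretKey, logFormat, logLevel string) []envVar {
	return []envVar{
		{Name: "SIGNER__SERVER__HOST", Value: host},
		{Name: "SIGNER__SERVER__PORT", Value: port},
		{Name: "SIGNER__KEYSTORE__DIR", Value: keystoreDir},
		{Name: "SIGNER__KEYSTORE__PASSWORD", SecretKey: passwordSecretKey},
		{Name: "SIGNER__LOGGING__FORMAT", Value: logFormat},
		{Name: "SIGNER__LOGGING__LEVEL", Value: logLevel},
	}
}

// keystoreItems projects only keystore.json into the keystore dir.
func keystoreItems() []keyToPath {
	return []keyToPath{{Key: keystoreFile, Path: keystoreFile}}
}

// checkSignerShape verifies that exactly one file (keystore.json) is
// projected and that the password comes from the Secret via env.
func checkSignerShape(env []envVar, items []keyToPath, passwordSecretKey string) error {
	if len(items) != 1 || items[0].Key != keystoreFile || items[0].Path != keystoreFile {
		return errors.New("expected exactly one projected item: keystore.json")
	}
	for _, e := range env {
		if e.Name == "SIGNER__KEYSTORE__PASSWORD" {
			if e.SecretKey != passwordSecretKey || e.Value != "" {
				return errors.New("password must be read from the Secret, not inlined")
			}
			return nil
		}
	}
	return errors.New("SIGNER__KEYSTORE__PASSWORD is not set")
}
```

Limiting the projection to a single item is what keeps the password key out of the keystore directory, since an unrestricted Secret volume would mount every key as a file.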

Test plan

  • go test ./internal/serviceoffercontroller/... — all green
  • docker run ghcr.io/obolnetwork/remote-signer:v0.3.0 with the new env scheme — runs healthy, /healthz returns 200
  • On a fresh cluster: kubectl apply an Agent CR with wallet.create: true and confirm the remote-signer pod reaches Ready 1/1 without restarts
  • On a cluster that previously had a working old-format Secret (UUID-named key): confirm the migration helper rewrites it to keystore.json on next reconcile without rotating key material

…image

Verified locally against ghcr.io/obolnetwork/remote-signer:v0.3.0:

- Main's KEYSTORE_PASSWORD env name is unrecognised; the binary exits
  with Error: NoPassword on startup.
- Main's keystore dir /keystores conflicts with the image's default
  /data/keystores (declared as a volume in the image config).
- Main's /health readiness probe returns HTTP 404; the binary only
  serves /healthz, which returns {"status":"ok"}.

Together these mean any Agent CR with wallet.create=true on main has a
remote-signer that crash-loops or fails liveness, blocking the agent
from ever reaching Ready.

This is what the integration branch behind #452 was carrying. Pulling
it forward:

- Move keystore dir to /data/keystores (the image default), and pin
  the on-disk filename to keystore.json so the Secret volume
  projection no longer needs to thread the V3 UUID through; the V3
  document carries the address internally so the cosmetic filename
  doesn't matter.
- Add ensureCanonicalKeystoreKey migration helper: on reconcile of an
  existing Secret with the wallet annotation, if data is keyed under
  the old UUID-named JSON field, rewrite it as keystore.json
  in-place. Refuses ambiguous Secrets with multiple legacy JSON keys.
- Switch env scheme to upstream's SIGNER__SECTION__KEY hierarchy
  (SIGNER__SERVER__HOST, SIGNER__SERVER__PORT, SIGNER__KEYSTORE__DIR,
  SIGNER__KEYSTORE__PASSWORD, SIGNER__LOGGING__FORMAT/LEVEL). Matches
  the master agent's working config in hermes-obol-agent.
- Switch readiness and liveness probes from /health to /healthz.

Adds 8 unit tests covering fresh keystore creation, reuse, legacy key
migration, ambiguity rejection, malformed data, and the canonical
Secret/Deployment shape (single keystore.json projected, password
read via env, never mounted).
OisinKyne force-pushed the fix/serviceoffer-controller-remote-signer-config branch from fce4ca0 to 4862fd7 on May 11, 2026 12:22
OisinKyne enabled auto-merge (rebase) on May 11, 2026 12:23
OisinKyne merged commit a28ed8d into main on May 11, 2026
5 checks passed