Every agent's signer keystore is a Secret named identically — remote-signer-keystore — one per agent namespace, and the keypair is minted inside the serviceoffer-controller process. Because the controller's RBAC is a ClusterRole and resourceNames on a ClusterRole matches a name in any namespace, a single compromised controller can read (and delete) every agent's keystore + decryption password, and therefore derive every agent's spendable signer key.
This is defense-in-depth / blast-radius reduction, not a remotely-exploitable hole — the controller is trusted infra. It matters because the protected asset is spendable, multi-tenant signer keys sitting behind one shared identity.
This issue is the design follow-up that #570 explicitly defers. #570 is the immediate, minimal RBAC hardening (scope the verbs, drop dead update/patch, add the missing delete). It does not — and on its own cannot — isolate agent-A's keystore from agent-B's, because they share a name.
Supersedes draft PR #571 (which carried this as plans/per-agent-keystore-isolation.md). The design now lives here as a tracking issue so it can be discussed and scheduled independently of any code branch.
Current architecture — why the reach exists
flowchart TB
subgraph x402ns["namespace: x402"]
SOC["serviceoffer-controller<br/>ServiceAccount"]
GEN["GenerateKeystoreInMemory()<br/>mints privkey + password<br/><b>inside the controller process</b>"]
SOC --- GEN
end
subgraph cr["ClusterRole: serviceoffer-controller (after #570)"]
RULE["resourceNames: remote-signer-keystore<br/>verbs: get, delete"]
end
subgraph agentA["namespace: agent-alice"]
SA["Secret/remote-signer-keystore<br/>keystore.json + password"]
RSA["remote-signer pod"]
SA --> RSA
end
subgraph agentB["namespace: agent-bob"]
SB["Secret/remote-signer-keystore<br/>keystore.json + password"]
RSB["remote-signer pod"]
SB --> RSB
end
subgraph hermesns["namespace: hermes-obol-agent"]
SH["Secret/remote-signer-keystore<br/>keystore.json + password"]
RSH["remote-signer pod"]
SH --> RSH
end
GEN -->|create| SA
GEN -->|create| SB
GEN -->|create| SH
RULE -. "name matches in ANY ns" .-> SA
RULE -. "name matches in ANY ns" .-> SB
RULE -. "name matches in ANY ns" .-> SH
classDef danger fill:#ffe0e0,stroke:#d73a4a,color:#000;
class SA,SB,SH,GEN danger;
The dotted edges are the problem: one resourceNames: ["remote-signer-keystore"] rule on a ClusterRole fans out to every namespace that happens to use that name.
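The fan-out can be illustrated with a small model of how a rule is matched. This is a simplified sketch, not the real Kubernetes authorizer: the `Grant` type, the `Allowed` and `Reach` helpers, and the convention that an empty `Namespace` means "bound cluster-wide" are all assumptions made for illustration.

```go
package main

import "fmt"

// PolicyRule is a simplified stand-in for an RBAC rule on Secrets.
type PolicyRule struct {
	ResourceNames []string
	Verbs         []string
}

// Grant models where a rule applies (hypothetical type). An empty Namespace
// means the rule is bound cluster-wide, as with a ClusterRole bound through
// a ClusterRoleBinding.
type Grant struct {
	Namespace string
	Rule      PolicyRule
}

func contains(items []string, want string) bool {
	for _, it := range items {
		if it == want {
			return true
		}
	}
	return false
}

// Allowed reports whether the grant permits verb on Secret name in ns.
// resourceNames are compared exactly; an empty list matches every name.
func (g Grant) Allowed(ns, name, verb string) bool {
	if g.Namespace != "" && g.Namespace != ns {
		return false
	}
	nameOK := len(g.Rule.ResourceNames) == 0 || contains(g.Rule.ResourceNames, name)
	return nameOK && contains(g.Rule.Verbs, verb)
}

// Reach lists the namespaces in which the grant can GET the named Secret.
func Reach(g Grant, namespaces []string, name string) []string {
	var out []string
	for _, ns := range namespaces {
		if g.Allowed(ns, name, "get") {
			out = append(out, ns)
		}
	}
	return out
}

func main() {
	namespaces := []string{"agent-alice", "agent-bob", "hermes-obol-agent"}
	rule := PolicyRule{
		ResourceNames: []string{"remote-signer-keystore"},
		Verbs:         []string{"get", "delete"},
	}

	clusterWide := Grant{Rule: rule}                         // today's ClusterRole
	namespaced := Grant{Namespace: "agent-alice", Rule: rule} // a per-namespace Role

	fmt.Println(len(Reach(clusterWide, namespaces, "remote-signer-keystore"))) // 3
	fmt.Println(len(Reach(namespaced, namespaces, "remote-signer-keystore")))  // 1
}
```

The same name in three namespaces gives the cluster-wide grant a reach of three, while a namespaced grant stays at one. Because the name comparison is exact, renaming the Secret per agent would only change which names have to be enumerated in the rule.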
Two distinct sub-risks (both verified in code)
Standing cross-agent read. A GET on a Secret returns all keys. The controller SA can GET remote-signer-keystore in every agent namespace → keystore JSON + password → derive each agent's private key. Source: internal/serviceoffercontroller/agent_wallet.go::buildSignerKeystoreSecret writes both keystore.json and password into the one Secret.
In-process custody at mint. openclaw.GenerateKeystoreInMemory() runs inside the controller process (internal/openclaw/wallet.go). The controller generates and holds the private key + password at provisioning time, regardless of RBAC. RBAC scoping alone never removes this.
Note: in steady state the reuse path reads only the address annotation (obol.org/wallet-address) — never the key data after mint. But the capability to read key material is standing, which is exactly the blast radius an attacker uses.
Blast radius (attack path)
sequenceDiagram
autonumber
participant ATK as Attacker
participant SOC as Compromised controller pod
participant API as kube-apiserver
participant CH as Base / chain
ATK->>SOC: RCE in a reconcile path OR malicious controller image
Note over SOC: holds the controller ServiceAccount token
loop for every agent namespace
SOC->>API: GET secret/remote-signer-keystore (ns = agent-N)
API-->>SOC: keystore.json + password
Note over SOC: decrypt V3 keystore -> private key
end
SOC->>CH: sign + broadcast transfers
CH-->>ATK: every agent wallet drained
Threat model
In scope
A compromised/abused serviceoffer-controller — supply-chain on its image, RCE in a reconcile path, or a malicious ClusterRole edit — reading or deleting other agents' signer keys.
Out of scope
An attacker who already controls a specific agent's own pod/namespace. They already hold that agent's key by design.
The controller is trusted infra, so this is blast-radius reduction, not a remotely-exploitable bug. The asset (spendable signer keys for N tenants) is what makes it worth real isolation rather than "accept and document".
Options considered
flowchart LR
Q{"Isolate per-agent<br/>signer keys?"}
Q -->|"do nothing"| O0["Option 0<br/>Accept + document<br/>controller stays in keystore TCB"]
Q -->|"per-ns Role for<br/>controller SA"| OA["Option A<br/><b>REJECTED</b><br/>isolation theater:<br/>controller binds itself<br/>in every agent ns"]
Q -->|"unique names only"| OC["Option C<br/>insufficient alone<br/>collapses into A"]
Q -->|"agent self-mints"| OB["Option B<br/><b>RECOMMENDED</b><br/>controller leaves<br/>the custody path"]
classDef rec fill:#e0ffe0,stroke:#2da44e,color:#000;
classDef rej fill:#ffe0e0,stroke:#d73a4a,color:#000;
class OB rec;
class OA rej;
| Option | Idea | Verdict |
| --- | --- | --- |
| 0: Accept + document | Keep #570 as final; document the controller as part of the keystore TCB. | Honest fallback, but not the end state: the asset warrants real isolation. |
| A: Per-namespace Role for the controller SA | Controller mints a Role+RoleBinding for itself in each agent ns; drop keystore verbs from the ClusterRole. | Rejected: isolation theater. The controller manages all agents, so it binds itself in every namespace → same reach, more RBAC surface, plus a chicken-and-egg create bootstrap. |
| C: Unique per-agent keystore names | Name it `<agent>-remote-signer-keystore`. | Doesn't help a ClusterRole alone (resourceNames has no wildcards). Only useful as hygiene layered onto B. |
| B: Agent self-mints | Keypair generated inside the agent's own namespace/pod; controller never gains get/create/delete on the keystore. | Recommended. Removes both sub-risks: no shared-name reach, no in-process custody. |
Moving parts (Option B)
In-pod keystore generation: either (a) the remote-signer image self-generates a keystore on first boot when none is mounted, or (b) a tiny init container mints it (reuse openclaw.GenerateKeystoreInMemory logic, shipped as a minimal binary). Which of the two applies is the first open question below.
Namespaced write RBAC for the agent SA: the controller creates, once per agent namespace, a Role granting the agent's SA create/get on remote-signer-keystore in its own namespace + a RoleBinding. The agent SA can never reach another namespace → true isolation.
Address via a non-secret channel: the agent publishes its address (e.g. patches Agent.status.walletAddress, SA scoped to agents/status in its ns) so the controller learns it without a keystore GET.
Controller RBAC shrinks to: litellm-secrets get (fixed ns), hermes-api-server get/create/delete, and zero remote-signer-keystore access.
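The namespaced write RBAC described above could look roughly like the following. This is a sketch under stated assumptions: the Role name `agent-keystore-writer`, the ServiceAccount name `remote-signer`, and the `BuildAgentKeystoreRBAC` helper are hypothetical, and plain structs stand in for the Kubernetes API types so the example needs only the standard library. It splits `create` and `get` into two rules because Kubernetes documents that `create` requests cannot be restricted by `resourceNames` (the object name may not be known at authorization time).

```go
package main

import (
	"encoding/json"
	"fmt"
)

const keystoreSecret = "remote-signer-keystore"

type Meta struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type Rule struct {
	APIGroups     []string `json:"apiGroups"`
	Resources     []string `json:"resources"`
	ResourceNames []string `json:"resourceNames,omitempty"`
	Verbs         []string `json:"verbs"`
}

type Role struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Metadata   Meta   `json:"metadata"`
	Rules      []Rule `json:"rules"`
}

type Subject struct {
	Kind      string `json:"kind"`
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type RoleRef struct {
	APIGroup string `json:"apiGroup"`
	Kind     string `json:"kind"`
	Name     string `json:"name"`
}

type RoleBinding struct {
	APIVersion string    `json:"apiVersion"`
	Kind       string    `json:"kind"`
	Metadata   Meta      `json:"metadata"`
	Subjects   []Subject `json:"subjects"`
	RoleRef    RoleRef   `json:"roleRef"`
}

// BuildAgentKeystoreRBAC returns a Role and RoleBinding that live only in
// the agent's namespace. Object names are hypothetical.
func BuildAgentKeystoreRBAC(namespace, serviceAccount string) (Role, RoleBinding) {
	const name = "agent-keystore-writer" // hypothetical Role name
	role := Role{
		APIVersion: "rbac.authorization.k8s.io/v1",
		Kind:       "Role",
		Metadata:   Meta{Name: name, Namespace: namespace},
		Rules: []Rule{
			// create cannot be narrowed by resourceNames, so it is unnamed.
			{APIGroups: []string{""}, Resources: []string{"secrets"}, Verbs: []string{"create"}},
			// get is restricted to the keystore Secret only.
			{APIGroups: []string{""}, Resources: []string{"secrets"},
				ResourceNames: []string{keystoreSecret}, Verbs: []string{"get"}},
		},
	}
	binding := RoleBinding{
		APIVersion: "rbac.authorization.k8s.io/v1",
		Kind:       "RoleBinding",
		Metadata:   Meta{Name: name, Namespace: namespace},
		Subjects:   []Subject{{Kind: "ServiceAccount", Name: serviceAccount, Namespace: namespace}},
		RoleRef:    RoleRef{APIGroup: "rbac.authorization.k8s.io", Kind: "Role", Name: name},
	}
	return role, binding
}

func main() {
	role, binding := BuildAgentKeystoreRBAC("agent-alice", "remote-signer")
	for _, obj := range []interface{}{role, binding} {
		out, err := json.MarshalIndent(obj, "", "  ")
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```

One point worth checking in Phase 2: Kubernetes RBAC escalation prevention generally requires whoever creates a Role to already hold the permissions it grants, or to have the `escalate` verb on roles. How the controller is authorized to mint this Role without itself holding keystore verbs is therefore part of the wiring.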
If Phase 1 shows B is disproportionately expensive for the current milestone, fall back to Option 0 and revisit — but do not ship Option A as a substitute.
Open questions (resolve before Phase 1 code)
1. Does ghcr.io/obolnetwork/remote-signer:v0.3.0 generate a keystore on first boot when the keystore dir is empty? (Check the ObolNetwork remote-signer repo / chart.) If yes → no init container needed.
2. Is Agent.status.walletAddress the right address channel, and can the agent SA be granted patch on agents/status scoped to its own namespace?
3. Does anything besides the remote-signer pod consume the keystore Secret directly? (Grep across runtimes before removing controller access.)
Acceptance criteria
Controller ClusterRole has no verbs on remote-signer-keystore (guarded by extending TestServiceOfferControllerSecretRBAC_Scoped).
The agent SA's keystore write access is a namespaced Role, never a ClusterRole.
obol agent init still populates Agent.status.walletAddress; teardown still cleans up.
release-smoke sell → buy → teardown stays green.
Pre-production: greenfield, no keystore migration needed.
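The first criterion can be expressed as a check over the ClusterRole's rules. The sketch below is an assumption about what such a guard might compute, not the actual TestServiceOfferControllerSecretRBAC_Scoped: the `KeystoreVerbs` helper and the in-memory rule lists are hypothetical, whereas the real test would presumably read the rules from the x402.yaml template.

```go
package main

import (
	"fmt"
	"sort"
)

// PolicyRule is a simplified stand-in for an RBAC rule.
type PolicyRule struct {
	Resources     []string
	ResourceNames []string
	Verbs         []string
}

func has(items []string, want string) bool {
	for _, it := range items {
		if it == want {
			return true
		}
	}
	return false
}

// KeystoreVerbs returns the sorted set of verbs the rules grant on the named
// Secret. A rule with no resourceNames applies to every name, so it counts.
func KeystoreVerbs(rules []PolicyRule, secretName string) []string {
	seen := map[string]bool{}
	for _, r := range rules {
		if !has(r.Resources, "secrets") && !has(r.Resources, "*") {
			continue
		}
		if len(r.ResourceNames) > 0 && !has(r.ResourceNames, secretName) {
			continue
		}
		for _, v := range r.Verbs {
			seen[v] = true
		}
	}
	verbs := make([]string, 0, len(seen))
	for v := range seen {
		verbs = append(verbs, v)
	}
	sort.Strings(verbs)
	return verbs
}

func main() {
	// Rule shape after #570, as drawn in the architecture diagram.
	after570 := []PolicyRule{
		{Resources: []string{"secrets"}, ResourceNames: []string{"remote-signer-keystore"},
			Verbs: []string{"get", "delete"}},
	}
	// Target state for Option B, per the moving parts list.
	target := []PolicyRule{
		{Resources: []string{"secrets"}, ResourceNames: []string{"litellm-secrets"},
			Verbs: []string{"get"}},
		{Resources: []string{"secrets"}, ResourceNames: []string{"hermes-api-server"},
			Verbs: []string{"get", "create", "delete"}},
	}

	fmt.Println(KeystoreVerbs(after570, "remote-signer-keystore")) // [delete get]
	fmt.Println(KeystoreVerbs(target, "remote-signer-keystore"))   // []
}
```

A guard built this way fails on any rule that still reaches the keystore, including an unnamed rule on secrets, which is stricter than checking only for the literal name.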
Recommended target: Option B
flowchart TB
subgraph x402ns["namespace: x402"]
SOC["serviceoffer-controller<br/><b>no keystore verbs</b>"]
end
subgraph agentA["namespace: agent-alice"]
RoleA["Role + RoleBinding<br/>agent SA: create/get<br/>remote-signer-keystore<br/><b>this ns only</b>"]
INITA["init / first-boot mint<br/>in the agent pod"]
SAk["Secret/remote-signer-keystore"]
STA["Agent.status.walletAddress<br/>(non-secret)"]
INITA -->|create| SAk
INITA -->|publish addr| STA
RoleA -. scopes .-> SAk
end
SOC -->|"mint Role/RoleBinding once"| RoleA
SOC -->|"read address (non-secret)"| STA
SOC -. "cannot read keystore" .-x SAk
classDef safe fill:#e0ffe0,stroke:#2da44e,color:#000;
class SAk,STA,INITA safe;
Phased rollout
flowchart TB
P1["<b>Phase 1 — decision gate</b><br/>Confirm in-pod mint mechanism<br/>remote-signer self-mint? else init-container tool"]
P2["<b>Phase 2 — wiring</b><br/>controller mints namespaced Role/RoleBinding<br/>+ address-reporting channel<br/>ensureAgentWallet waits for agent-reported addr"]
P3["<b>Phase 3 — drop access</b><br/>remove remote-signer-keystore from ClusterRole<br/>guard test: no get/create/delete"]
P1 -->|"self-mint supported"| P2
P1 -->|"not supported, build init tool"| P2
P2 --> P3
P3 --> DONE["Controller holds <b>no</b><br/>agent signer material"]
classDef done fill:#e0ffe0,stroke:#2da44e,color:#000;
class DONE done;
References
security(controller): scope serviceoffer-controller Secret RBAC to named secrets (the immediate hardening this defers from).
internal/serviceoffercontroller/agent_wallet.go: buildSignerKeystoreSecret, ensureSignerKeystore (mint + custody).
internal/openclaw/wallet.go: GenerateKeystoreInMemory (in-process keypair).
internal/embed/infrastructure/base/templates/x402.yaml: serviceoffer-controller ClusterRole.
internal/embed/embed_crd_test.go: TestServiceOfferControllerSecretRBAC_Scoped (the guard to extend in Phase 3).