feat: PurchaseRequest CRD and controller for buy-side x402 (#329) #330

Closed
bussyjd wants to merge 22 commits into main from feat/purchase-request-crd

@bussyjd bussyjd commented Apr 8, 2026

Summary

Introduces PurchaseRequest CRD and extends the serviceoffer-controller to reconcile buy-side purchases. Replaces direct ConfigMap writes from buy.py with a controller-based pattern matching the sell-side.

Addresses #329.

Controller stages

  1. Probed — probe endpoint → 402, validate pricing
  2. AuthsSigned — sign ERC-3009 auths via remote-signer (cluster DNS)
  3. Configured — write buyer ConfigMaps with optimistic concurrency
  4. Ready — verify sidecar loaded via pod /status
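
The four stages read as an ordered list of status conditions, with the reconciler working on the first one that is not yet True. A minimal sketch of that idea in Python (the controller itself is Go; `next_stage` and the dict shape of the conditions are illustrative assumptions, not the code in purchase.go):

```python
# Ordered condition types, taken from the stage names in this PR.
STAGES = ["Probed", "AuthsSigned", "Configured", "Ready"]


def next_stage(conditions):
    """Return the first stage whose condition is not "True", or None when done.

    `conditions` mimics a Kubernetes status.conditions list, e.g.
    [{"type": "Probed", "status": "True"}]  (shape assumed for illustration).
    """
    done = {c["type"] for c in conditions if c.get("status") == "True"}
    for stage in STAGES:
        if stage not in done:
            return stage
    return None
```

A stage whose condition is present but "False" is retried before any later stage runs, which is what keeps the pipeline strictly ordered.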

Why controller instead of direct RBAC

  • No cross-namespace writes from agent (prompt injection risk)
  • Optimistic concurrency prevents ConfigMap corruption
  • Validation before write (pricing, balance, signatures)
  • Finalizer-based cleanup on delete
  • Status conditions for observability
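
To illustrate the optimistic-concurrency point, here is a small self-contained sketch of a read-modify-write loop that retries on conflict. `FakeStore` is a hypothetical in-memory stand-in for a ConfigMap with a resourceVersion; it is not the controller's code, which talks to the Kubernetes API.

```python
class Conflict(Exception):
    """Stands in for an HTTP 409 Conflict from the API server."""


class FakeStore:
    """In-memory stand-in for one ConfigMap with a resourceVersion."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = 1

    def get(self):
        return {"resourceVersion": self.version, "data": dict(self.data)}

    def put(self, obj):
        # Reject writes based on a stale read, as the API server would.
        if obj["resourceVersion"] != self.version:
            raise Conflict()
        self.data = dict(obj["data"])
        self.version += 1


def update_with_retry(store, mutate, attempts=5):
    """Read-modify-write that retries when another writer got in first."""
    for _ in range(attempts):
        obj = store.get()
        mutate(obj["data"])
        try:
            store.put(obj)
            return obj["data"]
        except Conflict:
            continue  # re-read the latest version and reapply the change
    raise Conflict("gave up after %d attempts" % attempts)
```

Because the mutation is reapplied to a fresh read after each conflict, a concurrent writer's changes are preserved rather than overwritten.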

Files

| File | Lines | Purpose |
|------|-------|---------|
| purchaserequest-crd.yaml | 115 | CRD definition |
| purchase.go | 270 | Reconciler (4 stages) |
| purchase_helpers.go | 230 | ConfigMap merge, signer, status check |
| types.go | +60 | PurchaseRequest Go types |
| x402.yaml | +10 | RBAC for PurchaseRequests + pods |

Test plan

  • go test ./... all pass
  • go build ./... compiles
  • Dual-stack integration test with flow-11-dual-stack.sh (next step: update buy.py to create CRs)

bussyjd added 5 commits April 9, 2026 00:48
Two fixes validated with real Base Sepolia x402 payments between
two DGX Spark nodes running Nemotron 120B inference.

1. **CA certificate bundle**: The x402-verifier runs in a distroless
   container with no CA store. TLS verification of the public
   facilitator (facilitator.x402.rs) fails with "x509: certificate
   signed by unknown authority". Fix: `obol sell pricing` now reads
   the host CA bundle and patches it into the `ca-certificates`
   ConfigMap mounted by the verifier.

2. **Missing Description field**: The facilitator rejects verify
   requests that lack a `description` field in PaymentRequirement
   with "invalid_format". Fix: populate Description from the route
   pattern when building the payment requirement.
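
A rough sketch of the second fix, in Python for brevity (the actual change is in the Go verifier). The field names follow the x402 payment-requirements shape as I understand it, and both `build_payment_requirement` and the description wording are assumptions for illustration:

```python
def build_payment_requirement(route_pattern, pay_to, amount, network, asset):
    """Build a payment requirement whose description is never empty."""
    return {
        "scheme": "exact",
        "network": network,
        "maxAmountRequired": str(amount),
        "resource": route_pattern,
        # The fix: derive a description from the route pattern so the
        # facilitator does not reject the request as "invalid_format".
        "description": f"Payment required for {route_pattern}",
        "payTo": pay_to,
        "asset": asset,
    }
```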

## Validated testnet flow

### Alice (seller)

```
obolup.sh                    # bootstrap dependencies
obol stack init && obol stack up
obol model setup custom --name nemotron-120b \
  --endpoint http://host.k3d.internal:8000/v1 \
  --model "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4"
obol sell pricing --wallet 0xC0De...97E --chain base-sepolia
obol sell http nemotron \
  --wallet 0xC0De...97E --chain base-sepolia \
  --per-request 0.001 --namespace llm \
  --upstream litellm --port 4000 \
  --health-path /health/readiness \
  --register --register-name "Nemotron 120B on DGX Spark"
obol tunnel restart
```

### Bob (buyer)

```
# 1. Discover
curl $TUNNEL/.well-known/agent-registration.json
# → name: "Nemotron 120B on DGX Spark", x402Support: true

# 2. Probe
curl -X POST $TUNNEL/services/nemotron/v1/chat/completions
# → 402: payTo=0xC0De...97E, amount=1000, network=base-sepolia

# 3. Sign EIP-712 TransferWithAuthorization + pay
python3 bob_buy.py
# → 200: "The meaning of life is to discover and pursue purpose"
```
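
Step 3 can be sketched as follows. This builds, but does not sign, the EIP-712 typed data for an ERC-3009 TransferWithAuthorization, and shows why the 402 response quotes amount=1000 for a 0.001 USDC price (USDC has 6 decimals). It is not the contents of bob_buy.py; the function names, the token domain name/version defaults, and the 300-second validity window are assumptions.

```python
import secrets
import time
from decimal import Decimal

USDC_DECIMALS = 6  # USDC uses 6 decimal places


def to_base_units(amount_usdc):
    """Convert a decimal USDC amount (given as a string) to integer base units."""
    return int(Decimal(amount_usdc) * 10 ** USDC_DECIMALS)


def transfer_authorization(payer, pay_to, value, token, chain_id, ttl=300,
                           token_name="USDC", token_version="2"):
    """Build EIP-712 typed data for TransferWithAuthorization (unsigned)."""
    return {
        "types": {
            "EIP712Domain": [
                {"name": "name", "type": "string"},
                {"name": "version", "type": "string"},
                {"name": "chainId", "type": "uint256"},
                {"name": "verifyingContract", "type": "address"},
            ],
            "TransferWithAuthorization": [
                {"name": "from", "type": "address"},
                {"name": "to", "type": "address"},
                {"name": "value", "type": "uint256"},
                {"name": "validAfter", "type": "uint256"},
                {"name": "validBefore", "type": "uint256"},
                {"name": "nonce", "type": "bytes32"},
            ],
        },
        "primaryType": "TransferWithAuthorization",
        # Domain name/version must match the token contract; the defaults
        # here are assumptions and should be checked against the deployment.
        "domain": {"name": token_name, "version": token_version,
                   "chainId": chain_id, "verifyingContract": token},
        "message": {
            "from": payer,
            "to": pay_to,
            "value": value,
            "validAfter": 0,
            "validBefore": int(time.time()) + ttl,
            "nonce": "0x" + secrets.token_hex(32),  # random 32-byte nonce
        },
    }
```

For the flow above, `to_base_units("0.001")` gives the 1000 seen in the probe response, and the chain ID for Base Sepolia is 84532.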

### On-chain receipts (Base Sepolia)

| Tx | Description |
|----|-------------|
| 0xd769953b...c231ec0 | x402 settlement: Bob→Alice 0.001 USDC via ERC-3009 |

Balance change: Alice +0.001 USDC, Bob -0.001 USDC.
Facilitator: https://facilitator.x402.rs (real public settlement).

Replace the third-party facilitator.x402.rs with the Obol-operated
facilitator at x402.gcp.obol.tech. This gives us control over
uptime, chain support, and monitoring (Grafana dashboards already
deployed in obol-infrastructure).

Introduces DefaultFacilitatorURL constant in internal/x402 and
updates all references: CLI flag default, config loader, standalone
inference gateway, and deployment store.

Companion PR in obol-infrastructure adds Base Sepolia (84532) to
the facilitator's chain config alongside Base Mainnet (8453).

Addresses #321 (LiteLLM reliability improvements):

1. Hot-add models via /model/new API instead of restarting the
   deployment. ConfigMap still patched for persistence. Restart
   only triggered when API keys change (Secret mount requires it).

2. Scale to 2 replicas with RollingUpdate (maxUnavailable: 0,
   maxSurge: 1) so a new pod is ready before any old pod terminates.

3. PodDisruptionBudget (minAvailable: 1) prevents both replicas
   from being down simultaneously during voluntary disruptions.

4. preStop hook (sleep 10) gives EndpointSlice time to deregister
   the terminating pod before SIGTERM — prevents in-flight request
   drops during rolling updates.

5. Reloader annotation on litellm-secrets — Stakater Reloader
   triggers rolling restart on API key rotation, no manual restart.

6. terminationGracePeriodSeconds: 60 — long inference requests
   (e.g. Nemotron 120B at 30s+) have time to complete.
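
For item 1, a hedged sketch of what a hot-add request to LiteLLM's /model/new endpoint looks like. It only builds the request and does not send it. The helper name is hypothetical, and the PR itself issues the call from Go via kubectl exec rather than from Python.

```python
import json
import urllib.request


def model_new_request(base_url, master_key, model_name, upstream_model, api_base):
    """Build (but do not send) a POST to LiteLLM's /model/new endpoint."""
    body = {
        "model_name": model_name,
        "litellm_params": {"model": upstream_model, "api_base": api_base},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/model/new",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {master_key}",
        },
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(req)` against the proxy on port 4000. The ConfigMap patch is still needed because a model added this way alone would not survive a pod restart.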
…issing

The prerequisite check blocked installation entirely when Node.js
was not available, even though Docker could extract the openclaw
binary from the published image. This prevented bootstrap on
minimal servers (e.g. DGX Spark nodes with only Docker + Python).

Changes:
- Prerequisites: only fail if BOTH npm AND docker are missing
- install_openclaw(): try npm first, fall back to Docker image
  extraction (docker create + docker cp) when npm unavailable
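
The new prerequisite logic, sketched in Python for illustration (the real installer is a shell script; `choose_install_method` is a hypothetical name):

```python
import shutil


def choose_install_method(which=shutil.which):
    """Pick how to install openclaw: npm first, Docker extraction as fallback."""
    if which("npm"):
        return "npm"
    if which("docker"):
        # Fallback path: docker create + docker cp from the published image.
        return "docker"
    # Only fail when BOTH tools are missing.
    raise RuntimeError("neither npm nor docker found; cannot install openclaw")
```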

Introduces PurchaseRequest CRD and extends the serviceoffer-controller
to reconcile buy-side purchases. This replaces direct ConfigMap writes
from buy.py with a controller-based pattern matching the sell-side.

## New resources

- **PurchaseRequest CRD** (`obol.org/v1alpha1`): declarative intent to
  buy inference from a remote x402-gated endpoint. Lives in the agent's
  namespace.

## Controller reconciliation (4 stages)

1. **Probed** — probe endpoint → 402, validate pricing matches spec
2. **AuthsSigned** — call remote-signer via cluster DNS to sign
   ERC-3009 TransferWithAuthorization vouchers
3. **Configured** — write buyer ConfigMaps in llm namespace with
   optimistic concurrency, restart LiteLLM
4. **Ready** — verify sidecar loaded auths via pod /status endpoint

## Security

- Agent only creates PurchaseRequest CRs (own namespace, no cross-NS)
- Controller has elevated RBAC for ConfigMaps in llm, pods/list
- Remote-signer accessed via cluster DNS (no port-forward)
- Finalizer handles cleanup on delete (remove upstream from config)
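
The finalizer behaviour in the last bullet can be sketched like this. It is a simplified Python model, not the Go reconciler; the finalizer string and the dict standing in for the buyer config are hypothetical.

```python
FINALIZER = "obol.org/purchase-cleanup"  # hypothetical finalizer name


def reconcile_delete(pr, buyer_config):
    """Handle a PurchaseRequest that may be marked for deletion.

    Returns True once the object is free to be deleted.
    """
    meta = pr["metadata"]
    finalizers = meta.setdefault("finalizers", [])
    if "deletionTimestamp" not in meta:
        # Live object: make sure cleanup will run before deletion.
        if FINALIZER not in finalizers:
            finalizers.append(FINALIZER)
        return False
    if FINALIZER in finalizers:
        # Remove this purchase's upstream, then release the object.
        buyer_config.pop(meta["name"], None)
        finalizers.remove(FINALIZER)
    return True
```

Using `pop(..., None)` keeps the cleanup idempotent, so a reconcile that is retried after a partial failure does not error on an already-removed upstream.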

## RBAC

- Added PurchaseRequest read/write to serviceoffer-controller ClusterRole
- Added pods/get/list for sidecar status checks

Addresses #329. Companion to the dual-stack integration test.

// POST /model/new via kubectl exec on a running litellm pod.
// Note: bodyJSON is interpolated into a single-quoted shell string,
// so it must not itself contain single quotes.
curlCmd := fmt.Sprintf(
	`wget -qO- --post-data='%s' --header='Content-Type: application/json' --header='Authorization: Bearer %s' http://localhost:4000/model/new`,
	string(bodyJSON), masterKey)

bussyjd added 13 commits April 9, 2026 01:12
…rites

Modifies buy.py cmd_buy to create a PurchaseRequest CR in the agent's
own namespace instead of writing ConfigMaps cross-namespace. The
serviceoffer-controller (PR #330) reconciles the CR: probes the
endpoint, signs auths via remote-signer, writes buyer ConfigMaps in
llm namespace, and verifies sidecar readiness.

Changes:
- buy.py: replace steps 5-6 (sign + write ConfigMaps) with
  _create_purchase_request() + _wait_for_purchase_ready()
- Agent RBAC: add PurchaseRequest CRUD to openclaw-monetize-write
  ClusterRole (agent's own namespace only, no cross-NS access)
- Keep steps 1-4 (probe, wallet, balance, count) for user feedback

The agent SA can now create PurchaseRequests but never writes to
ConfigMaps in the llm namespace. All ConfigMap operations are
serialized through the controller with optimistic concurrency.
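
A sketch of the two new steps, assuming nothing about the real helpers beyond their names. The API group and kind come from this PR; the `spec` field names, the function names, and the injected `fetch` callable are assumptions for illustration.

```python
import time


def purchase_request_manifest(name, namespace, endpoint, count):
    """Build a PurchaseRequest object; the spec field names are assumptions."""
    return {
        "apiVersion": "obol.org/v1alpha1",
        "kind": "PurchaseRequest",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"endpoint": endpoint, "count": count},
    }


def is_ready(obj):
    """True when status.conditions contains Ready=True."""
    conditions = obj.get("status", {}).get("conditions", [])
    return any(c.get("type") == "Ready" and c.get("status") == "True"
               for c in conditions)


def wait_for_ready(fetch, timeout=120.0, interval=2.0, sleep=time.sleep):
    """Poll fetch() (which returns the current CR) until Ready or timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_ready(fetch()):
            return True
        sleep(interval)
    return False
```

Passing `fetch` and `sleep` in as parameters keeps the polling loop testable without a cluster.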

Three fixes discovered during dual-stack testnet validation:

1. **eRPC URL**: `obol sell register` used `http://localhost/rpc` which
   gets 404 from Traefik (wrong Host header). Changed to
   `http://obol.stack/rpc` which matches the HTTPRoute hostname.

2. **--private-key-file ignored**: When OpenClaw agent is deployed, sell
   register always preferred the remote-signer path and silently ignored
   --private-key-file. Now honours user intent: explicit key file flag
   takes priority over remote-signer auto-detection.

3. **Flow script**: add --allow-writes for Base Sepolia eRPC (needed for
   on-chain tx submission), restart eRPC after config change.

Validated: `obol sell register --chain base-sepolia --private-key-file`
mints ERC-8004 NFT (Agent ID 3826) on Base Sepolia via eRPC.
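
The precedence rule from fix 2, as a small sketch (Python for illustration; the CLI is Go and `choose_signer` is a hypothetical name):

```python
def choose_signer(private_key_file=None, remote_signer_available=False):
    """Explicit --private-key-file wins over remote-signer auto-detection."""
    if private_key_file:
        return ("key-file", private_key_file)
    if remote_signer_available:
        return ("remote-signer", None)
    raise ValueError("no signer: pass --private-key-file or deploy a remote-signer")
```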

Update dual-stack test to verify PurchaseRequest CR exists after
the agent runs buy.py. The agent prompt stays the same — buy.py's
interface is unchanged, only the backend (CR instead of ConfigMap).

- Fix getSignerAddress to handle string array format from remote-signer
- Fix flow-11: polling for pod readiness, LISTEN port check, anchored
  sed patterns, auto-fund remote-signer wallet
- Auto-fund Bob's remote-signer with USDC from .env key (shortcut for #331)
- resourceVersion handling for PurchaseRequest 409 Conflict

Known issue: controller's signAuths sends typed-data in a format the
remote-signer doesn't accept (empty signature). Needs investigation
of the remote-signer's /api/v1/sign/<addr>/typed-data API format.
Workaround: buy.py signs locally, controller only needs to copy
auths to buyer ConfigMaps (architectural simplification planned).
…rets)

Architectural simplification: instead of the controller reading a Secret
cross-namespace (security risk), buy.py embeds the pre-signed auths
directly in the PurchaseRequest spec.preSignedAuths field.

Flow:
1. buy.py signs auths locally (remote-signer in same namespace)
2. buy.py creates PurchaseRequest CR with auths in spec
3. Controller reads auths from CR spec (same PurchaseRequest RBAC)
4. Controller writes to buyer ConfigMaps in llm namespace

No cross-namespace Secret read. No general secrets RBAC.
Controller only needs PurchaseRequest read + ConfigMap write in llm.

Validated: test PurchaseRequest with embedded auth →
  Probed=True, AuthsSigned=True (loaded from spec),
  Configured=True (wrote to buyer ConfigMaps).
  Ready pending sidecar reload (ConfigMap propagation delay).
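
Steps 3 and 4 of the flow can be sketched as below. `spec.preSignedAuths` is the field named in this commit; the ConfigMap layout (one JSON string per purchase name) and the function name are assumptions, not the controller's actual format.

```python
import json


def apply_presigned_auths(purchase_request, configmap_data):
    """Copy spec.preSignedAuths into a ConfigMap-style dict of strings.

    Returns the number of auths written (0 means nothing was changed).
    """
    name = purchase_request["metadata"]["name"]
    auths = purchase_request.get("spec", {}).get("preSignedAuths") or []
    if not auths:
        return 0
    # ConfigMap values must be strings, so the list is stored as JSON.
    configmap_data[name] = json.dumps(auths, sort_keys=True)
    return len(auths)
```

Because the auths travel inside the CR, the controller needs no access to Secrets in the agent's namespace.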

path = f"/api/v1/namespaces/{ns}/secrets"
try:
    # Create the Secret; this fails if it already exists.
    _kube_json("POST", path, token, ssl_ctx, secret)
    print(f" Stored {len(auths)} auths in Secret {ns}/{secret_name}")
except Exception:  # assumption: the handler line is missing from this excerpt
    # Update path: reuse the current resourceVersion so the PUT is accepted.
    existing = _kube_json("GET", f"{path}/{secret_name}", token, ssl_ctx)
    secret["metadata"]["resourceVersion"] = existing["metadata"]["resourceVersion"]
    _kube_json("PUT", f"{path}/{secret_name}", token, ssl_ctx, secret)
    print(f" Updated Secret {ns}/{secret_name} with {len(auths)} auths")

bussyjd added 4 commits April 9, 2026 02:20
The macOS CA bundle (~290KB) exceeds the 262KB annotation limit
that kubectl apply requires. The previous implementation used
kubectl patch --type=merge which hits the same limit.

Switch to "kubectl create --dry-run=client -o yaml | kubectl replace"
which bypasses the annotation entirely. Add PipeCommands helper to
the kubectl package for this pattern.

Tested: obol sell pricing now populates the ca-certificates ConfigMap
automatically on both macOS (290KB /etc/ssl/cert.pem) and Linux
(220KB /etc/ssl/certs/ca-certificates.crt).
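
A sketch of the size check and the piped command pair. The 262144-byte figure is the Kubernetes cap on total annotation size; the ConfigMap key name, the namespace in the test values, and both function names are assumptions, and this is not the PipeCommands helper itself.

```python
ANNOTATION_LIMIT = 262144  # bytes; Kubernetes cap on total annotation size


def fits_in_annotation(bundle_bytes):
    """Rough check: the last-applied annotation embeds the whole object,
    so a bundle at or above the limit can never fit."""
    return len(bundle_bytes) < ANNOTATION_LIMIT


def replace_commands(name, namespace, bundle_path):
    """Return the two kubectl invocations to pipe together (create | replace)."""
    create = [
        "kubectl", "create", "configmap", name, "-n", namespace,
        f"--from-file=ca-certificates.crt={bundle_path}",  # key name assumed
        "--dry-run=client", "-o", "yaml",
    ]
    replace = ["kubectl", "replace", "-f", "-"]
    return create, replace
```

The two lists can be chained with `subprocess.Popen`, feeding the stdout of the first into the stdin of the second.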

The CA ConfigMap is mounted as a volume. Kubernetes may take 60-120s
to propagate changes to running pods. The verifier needs TLS to work
immediately for the facilitator connection, so trigger a rollout
restart right after populating the CA bundle.

Validated: fresh stack → obol sell pricing → CA auto-populated
(339KB on macOS) → verifier restarted → zero TLS errors.

bussyjd commented Apr 9, 2026

Superseded by the validated integration branch \ and the \ prerelease cut from it. The release-candidate branch now carries the tested sell → discover → buy → settle path, updated docs/skills, and the final x402/buy-side fixes.

@bussyjd bussyjd closed this Apr 9, 2026