feat(x402): chain label, last-settlement gauge, PurchaseAutoRefill sync + verifier PodMonitor + obol-frontend RBAC for purchaserequests #513
…oRefill
Unblocks per-chain earnings/spend aggregation in the frontend's My Listings
and My Purchases pages.
- obol_x402_buyer_* metrics now carry a `chain` label sourced from
UpstreamConfig.Network (already in payload, just wasn't on the labels).
- obol_x402_verifier_* metrics now carry a `chain` label sourced from
RouteRule.Network. Existing verifier metric tests updated to assert the
new label (empty string when no Network is set on the rule).
- internal/monetizeapi/types.go PurchaseAutoRefill struct now mirrors the
CRD spec (purchaserequest-crd.yaml lines 93-96) by including MaxTotal +
MaxSpendPerDay. The CRD already accepted these; the Go types just
weren't reading them.
Together this means the frontend can soon switch the EarningsStrip /
WalletStrip from zeroed placeholders to real PromQL aggregates such as:
sum by (chain) (increase(obol_x402_buyer_payment_success_total[7d]))
Closes the data loop for the frontend My Listings EarningsStrip + the
"Last settlement" timestamp the design canvas wants.
- New gauge obol_x402_verifier_last_payment_success_seconds, labeled
by (route, offer_namespace, offer_name, chain). Stamped via
SetToCurrentTime() in both ForwardAuth and proxy-mode paths
whenever a paid request reaches the seller successfully.
- helmfile.yaml grows an x402-verifier PodMonitor (previously only
litellm-x402-buyer was scraped in this namespace). It carries the same
`release: monitoring` label so kube-prometheus-stack picks it up.
The frontend already has matching consumers
(chargedSalesByOfferAndChain, chargedRequests24hByOffer,
lastSettlementByOffer in PrometheusClient) — without this scrape the
metrics never reach the dashboard.
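The "stamp only on success" behaviour described above can be sketched without the Prometheus client. This is a minimal illustration, not the verifier's code: `lastSuccess`, `outcome`, and `recordOutcome` are hypothetical names, and the map is a stdlib stand-in for the real gauge vec, whose `SetToCurrentTime()` call is what `stamp` imitates.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// offerKey mirrors the label tuple named in the commit message.
type offerKey struct {
	Route, OfferNamespace, OfferName, Chain string
}

// outcome is a hypothetical classification of a request's payment result.
type outcome int

const (
	outcomeUnpaid   outcome = iota // 402 challenge issued, no payment attached
	outcomeRejected                // payment attached but not accepted
	outcomeSettled                 // paid request reached the seller successfully
)

// lastSuccess is a stand-in for a gauge vec: one Unix timestamp per label tuple.
type lastSuccess struct {
	mu     sync.Mutex
	values map[offerKey]float64
}

func newLastSuccess() *lastSuccess {
	return &lastSuccess{values: make(map[offerKey]float64)}
}

// stamp plays the role of SetToCurrentTime() on the real gauge.
func (g *lastSuccess) stamp(k offerKey, now time.Time) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.values[k] = float64(now.Unix())
}

// get reports the stored timestamp and whether the series exists at all.
func (g *lastSuccess) get(k offerKey) (float64, bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	v, ok := g.values[k]
	return v, ok
}

// recordOutcome stamps the gauge only for settled payments; unpaid 402s and
// rejected payments leave it untouched.
func recordOutcome(g *lastSuccess, k offerKey, o outcome) {
	if o == outcomeSettled {
		g.stamp(k, time.Now())
	}
}

func main() {
	g := newLastSuccess()
	k := offerKey{Route: "/demo", OfferNamespace: "demo", OfferName: "demo-hello", Chain: "base-sepolia"}

	recordOutcome(g, k, outcomeUnpaid)
	_, ok := g.get(k)
	fmt.Println("stamped after unpaid 402:", ok)

	recordOutcome(g, k, outcomeSettled)
	ts, ok := g.get(k)
	fresh := time.Since(time.Unix(int64(ts), 0)) < 5*time.Second
	fmt.Println("stamped after settlement:", ok, "within 5s:", fresh)
}
```

Keeping the stamp behind a single success check is one way to make both the ForwardAuth and proxy-mode paths share the same rule.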
…Refill
Closes the test gap left open by the recent chain-label + last-settlement
gauge work. 14 new subtests across three packages plus four pre-existing
buyer-proxy assertions updated to carry the new chain label.
New tests:
- internal/x402/verifier_test.go
TestVerifier_LastPaymentSuccessGauge (3 subtests):
successful payment stamps gauge within ±5s of time.Now(),
unpaid 402 leaves it untouched, rejected payment leaves it untouched.
findVerifierMetricValue helper for time-window assertions.
- internal/x402/buyer/metrics_test.go
TestPrometheusLabels_ChainPropagation (3 subtests):
base-sepolia / base mainnet / empty chain.
TestMetrics_ChainLabelScrapeRoundtrip (2 subtests):
scrape /metrics through the registry, assert every counter +
gauge series carries the expected chain label.
- internal/monetizeapi/types_test.go
TestPurchaseAutoRefill_JSONRoundTrip (5 subtests):
full population, only new caps, all-zero omitempty, single fields.
TestPurchaseAutoRefill_UnmarshalAcceptsCRDForm:
catches json-tag drift between the Go struct and CRD spec.
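For readers who want to see what the round-trip and CRD-form checks amount to, here is a minimal sketch. It shows only the two caps named in this PR; the real struct has other fields that are not reproduced here. The JSON keys follow the CRD (`maxTotal`, `maxSpendPerDay`), while the `omitempty` tags are an assumption inferred from the "all-zero omitempty" subtest, and `roundTrip` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PurchaseAutoRefill is trimmed to the two caps this PR adds.
// The omitempty tags are an assumption, not copied from the repository.
type PurchaseAutoRefill struct {
	MaxTotal       int    `json:"maxTotal,omitempty"`
	MaxSpendPerDay string `json:"maxSpendPerDay,omitempty"`
}

// roundTrip marshals the value, unmarshals it again, and returns both the
// decoded copy and the intermediate JSON.
func roundTrip(in PurchaseAutoRefill) (PurchaseAutoRefill, string, error) {
	raw, err := json.Marshal(in)
	if err != nil {
		return PurchaseAutoRefill{}, "", err
	}
	var out PurchaseAutoRefill
	if err := json.Unmarshal(raw, &out); err != nil {
		return PurchaseAutoRefill{}, "", err
	}
	return out, string(raw), nil
}

func main() {
	out, raw, err := roundTrip(PurchaseAutoRefill{MaxTotal: 100, MaxSpendPerDay: "5.00"})
	fmt.Println(out, raw, err)

	// CRD-form input: keys spelled exactly as the CRD declares them.
	var fromCRD PurchaseAutoRefill
	err = json.Unmarshal([]byte(`{"maxTotal":3,"maxSpendPerDay":"1.50"}`), &fromCRD)
	fmt.Println(fromCRD, err)
}
```

If a JSON tag drifted from the CRD key, the CRD-form unmarshal would silently leave the field at its zero value, which is the failure mode the drift test is meant to catch.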
Pre-existing fix:
- internal/x402/buyer/proxy_test.go — four TestProxy_* assertions had
label maps without `chain`; tests use Network "base-sepolia" so the
expected chain is now spelled out alongside upstream + remote_model.
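The label map those assertions expect can be pictured roughly as follows. This is a sketch under stated assumptions: `upstreamConfig` is a hypothetical, trimmed stand-in for the buyer's UpstreamConfig, and the signature of `prometheusLabels` is guessed; only the label names (`upstream`, `remote_model`, `chain`) and the Network source come from the commit text.

```go
package main

import "fmt"

// upstreamConfig is a hypothetical stand-in; only Network is named in the PR.
type upstreamConfig struct {
	Name        string
	RemoteModel string
	Network     string
}

// prometheusLabels builds the label set for one upstream. The chain label is
// taken straight from Network, so it is the empty string when none is set.
func prometheusLabels(cfg upstreamConfig) map[string]string {
	return map[string]string{
		"upstream":     cfg.Name,
		"remote_model": cfg.RemoteModel,
		"chain":        cfg.Network,
	}
}

func main() {
	labels := prometheusLabels(upstreamConfig{
		Name:        "seller-a",
		RemoteModel: "demo-model",
		Network:     "base-sepolia",
	})
	fmt.Println(labels["chain"])
}
```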
RBAC:
- helmfile.yaml: obol-frontend ClusterRole grows read access for
purchaserequests + purchaserequests/status (frontend My Purchases
needs list; agent buy.py + controller remain the only writers).
Live-patched into the running cluster too.
…ording rules
Phase 1 + Phase 2 hardening on top of the chain-label/last-settlement work,
incorporating findings from the 4-agent K8s architecture review. Skips the
auth-on-mutating-endpoints item per operator clarification: the obol-stack
frontend is local-only behind the obol.stack hostname restriction, so it's
not the primary trust boundary.
RBAC trims:
- Drop `secrets get/list` from obol-frontend-openclaw-discovery
ClusterRole; pre-existing dangling grant, no code reads them.
- Drop /status subresource from purchaserequests rule; frontend never
writes status (only the controller does).
Monitoring + RBAC co-location (kills 3 bedag/raw helmfile releases):
- x402-verifier: PodMonitor -> ServiceMonitor in base/templates/x402.yaml.
Verifier has a stable Service on port http:8080; ServiceMonitor scrapes
the endpoint cleanly across replicas.
- litellm-x402-buyer: PodMonitor moved into base/templates/llm.yaml.
Stays a PodMonitor because the sidecar's port 8402 is per-pod, not
fronted by a Service.
- obol-frontend RBAC moved into base/templates/obol-frontend-rbac.yaml
next to the workload it grants.
Label cardinality:
- Drop `route` label from verifier metrics. (offer_namespace, offer_name,
chain) already uniquely scopes a paid route; `route` (= rule.Pattern)
was redundant, and its cardinality was unbounded because patterns can
contain arbitrary path fragments.
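The cardinality argument can be made concrete by counting distinct label tuples with and without `route`. This is an illustrative sketch only: `sample` and `seriesCount` are hypothetical, and the route patterns are invented values, assuming several patterns can map to the same offer.

```go
package main

import "fmt"

// sample is one observed label combination (hypothetical values below).
type sample struct {
	Route, OfferNamespace, OfferName, Chain string
}

// seriesCount returns how many distinct time series the samples would create,
// optionally including the route label in the series identity.
func seriesCount(samples []sample, withRoute bool) int {
	seen := make(map[[4]string]struct{})
	for _, s := range samples {
		key := [4]string{s.OfferNamespace, s.OfferName, s.Chain, ""}
		if withRoute {
			key[3] = s.Route
		}
		seen[key] = struct{}{}
	}
	return len(seen)
}

func main() {
	samples := []sample{
		{"/services/demo-hello/*", "demo", "demo-hello", "base-sepolia"},
		{"/services/demo-hello/v1/*", "demo", "demo-hello", "base-sepolia"},
		{"/services/demo-hello/v2/*", "demo", "demo-hello", "base-sepolia"},
	}
	fmt.Println("with route:", seriesCount(samples, true))
	fmt.Println("without route:", seriesCount(samples, false))
}
```

With the route label the series count grows with every new pattern; without it, the count is bounded by the number of (offer, chain) pairs.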
PrometheusRule (new base/templates/x402-prometheus-rules.yaml):
- Recording: x402:revenue:24h_by_offer_chain,
x402:revenue:7d_by_offer_chain, x402:revenue:lifetime_by_offer,
x402:settlement_rate:1h_by_offer_chain. The frontend's PrometheusClient
reads these so renaming raw metrics no longer breaks the UI, and the
`increase()` 2-sample minimum no longer leaves cold offers at "0" for
the first 30s of traffic.
- Alerting: X402PaymentFailureRateHigh (>10% over 1h),
X402NoSettlementsAfterChallenge (402s issued, no charges).
Deferred (out of scope for this hardening pass):
- Frontend-egress NetworkPolicy: on k3s + Flannel the apiserver Service
endpoints point at the host process, outside the cluster pod/service
CIDRs. A clean allowlist policy can't target the apiserver portably
without an install-specific ipBlock; revisit when obol-stack ships a
non-k3s deployment surface.
- obol-marketplace-api aggregator service: overkill for the local
single-operator context.
- Three-deployment-paths consolidation (helmfile + bedag/raw + Go
`EnsureVerifier`): larger refactor; tracked as separate workstream.
Live validation:
- 2 paid requests against demo-hello survive both the RBAC trims and
the ServiceMonitor swap. `x402:revenue:7d_by_offer_chain` returns
1.0076 for chain=eip155:84532 (matches the underlying
obol_x402_verifier_charged_requests_total counter at value 2 over
2 samples).
- /api/marketplace/purchases still returns 200 after dropping the
/status grant.
- /api/agents/wallets returns the agent wallet via the new batched
listAllWalletMetadata path (1 ConfigMap list vs N+1 per-instance).
|
Phase 1+2 hardening pushed (commit
Picked up findings from a 4-agent K8s architecture review (observability, RBAC, helmfile, frontend boundary).
What landed in this commit:
RBAC trims (defense-in-depth: the obol-stack frontend is local-only behind the
Monitoring + RBAC co-location (kills 3
Label cardinality:
PrometheusRule (new
Live-validated: 2 paid requests against
Explicitly deferred (out of scope, documented inline):
Pairs with the frontend hardening commit on https://github.com/ObolNetwork/obol-stack-front-end/pull/330 |
The verifier's per-offer counters and the last_payment_success_seconds
gauge were created on first use and never removed. Deleting an offer
(via `obol sell delete`, ServiceOffer CR deletion, or pricing config
edit) left stale series in the registry forever, which:
* pollutes My Listings / dashboards with rows for offers that no
longer exist,
* lets X402NoSettlementsAfterChallenge keep referencing dead labels,
* silently inflates the "last successful charge" gauge with timestamps
from offers the operator already retired.
Verifier.load() now diffs the incoming route set against the live label
tuples in the registry and calls DeletePartialMatch on each vec for
every (offer_namespace, offer_name, chain) triple that is no longer
served. Both reload paths (file config watcher and the kube
ServiceOffer informer via ConfigAccumulator) funnel through load(), so
one hook covers everything.
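The diff step described above reduces to a set difference over label triples. The sketch below is a stdlib-only illustration, not the code in `Verifier.load()`: `offerKey` and `staleKeys` are hypothetical names, and the comment on the deletion call reflects only what the commit message says about `DeletePartialMatch`.

```go
package main

import (
	"fmt"
	"sort"
)

// offerKey is the (offer_namespace, offer_name, chain) triple.
type offerKey struct {
	Namespace, Name, Chain string
}

// staleKeys returns every key present in live but absent from incoming,
// deduplicated and sorted so the result is deterministic.
func staleKeys(live, incoming []offerKey) []offerKey {
	keep := make(map[offerKey]struct{}, len(incoming))
	for _, k := range incoming {
		keep[k] = struct{}{}
	}
	seen := make(map[offerKey]struct{})
	var stale []offerKey
	for _, k := range live {
		if _, served := keep[k]; served {
			continue
		}
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		stale = append(stale, k)
	}
	sort.Slice(stale, func(i, j int) bool {
		a, b := stale[i], stale[j]
		if a.Namespace != b.Namespace {
			return a.Namespace < b.Namespace
		}
		if a.Name != b.Name {
			return a.Name < b.Name
		}
		return a.Chain < b.Chain
	})
	return stale
}

func main() {
	live := []offerKey{
		{"demo", "demo-hello", "base-sepolia"},
		{"demo", "retired-offer", "base-sepolia"},
	}
	incoming := []offerKey{{"demo", "demo-hello", "base-sepolia"}}

	// In the verifier, each stale triple would be handed to
	// DeletePartialMatch on every metric vec; here we only print it.
	for _, k := range staleKeys(live, incoming) {
		fmt.Printf("prune %s/%s chain=%s\n", k.Namespace, k.Name, k.Chain)
	}
}
```

Computing the stale set separately from the deletion keeps the diff easy to unit-test without a metrics registry.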
Also fixes a guard test from the prior hardening commit that was still
asserting the old "no ServiceMonitor here" invariant after we
intentionally relocated the ServiceMonitor into this manifest. Flipped
to assert presence so a future cleanup can't silently drop it.
Test:
TestVerifier_Reload_PrunesDeletedOfferSeries stamps two offers' worth
of metrics, reloads with one removed, and asserts the removed offer
is gone from all six vecs while the kept offer survives.
|
Follow-up: GC for stale verifier metric series (commit
Closes the last deferred item from the hardening pass.
Problem: deleting an offer left its per-label series in the verifier registry forever, most notably
Fix:
Test:
Also: fixed a guard test ( |
|
One accounting nit before merge: routes that inherit the global chain still emit
In this PR,
That means the new per-chain recordings can produce a blank-chain bucket for the normal/default route case instead of
Suggested shape: thread the resolved |
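The identifiers in this review comment were lost, so the following is only a guess at the shape being suggested: resolve the chain once, preferring the rule's own network and falling back to the global one, and use that resolved value for the label. `resolveChain` and both parameter names are hypothetical.

```go
package main

import "fmt"

// resolveChain returns the rule-level network when it is set and otherwise
// falls back to the global default, so inherited routes do not end up with
// an empty chain label.
func resolveChain(ruleNetwork, globalNetwork string) string {
	if ruleNetwork != "" {
		return ruleNetwork
	}
	return globalNetwork
}

func main() {
	fmt.Println(resolveChain("", "base-sepolia"))     // inherited from the global setting
	fmt.Println(resolveChain("base", "base-sepolia")) // rule-level override wins
}
```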
Summary
Powers the new My Listings → EarningsStrip and My Purchases → expansion drawer surfaces in obol-stack-front-end#parity/results-bar by closing three data-pipeline gaps:

- `chain` label is now on every `obol_x402_buyer_*` and `obol_x402_verifier_*` series: required for per-chain aggregation in the EarningsStrip / WalletStrip.
- `obol_x402_verifier_last_payment_success_seconds` gauge stamps the most recent settled payment per route: powers the "Last settlement · 2m ago" cell.
- `PurchaseAutoRefill` Go-type sync with the CRD spec: adds `MaxTotal` + `MaxSpendPerDay` fields that already existed in `purchaserequest-crd.yaml` but were unread by the controller / frontend.

Plus the x402-verifier `PodMonitor` the namespace was missing (only `litellm-x402-buyer` was scraped), and `obol-frontend` RBAC grows read access to `purchaserequests` so the new `/api/marketplace/purchases` route can list them.

What changed

| File | Change |
|---|---|
| `internal/x402/buyer/metrics.go` | `chain` to all 9 counter/gauge label sets |
| `internal/x402/buyer/proxy.go` | `cfg.Network` through `prometheusLabels()` at all 3 callsites |
| `internal/x402/metrics.go` | lastPaymentSuccessGaugeVec + register it |
| `internal/x402/verifier.go` | `chain` label in `prometheusLabels(rule)`; `SetToCurrentTime()` on success in both ForwardAuth + proxy-mode paths |
| `internal/monetizeapi/types.go` | `PurchaseAutoRefill.MaxTotal int` + `MaxSpendPerDay string` |
| `internal/x402/verifier_test.go` | `TestVerifier_LastPaymentSuccessGauge` (3 subtests) |
| `internal/x402/buyer/proxy_test.go` | `TestProxy_*` label-map assertions for the new chain label |
| `internal/x402/buyer/metrics_test.go` (new) | `TestPrometheusLabels_ChainPropagation` (3) + `TestMetrics_ChainLabelScrapeRoundtrip` (2) |
| `internal/monetizeapi/types_test.go` (new) | `TestPurchaseAutoRefill_JSONRoundTrip` (5) + `TestPurchaseAutoRefill_UnmarshalAcceptsCRDForm` |
| `internal/embed/infrastructure/helmfile.yaml` | `x402-verifier` PodMonitor (release: monitoring) + grant `obol-frontend` ClusterRole `get/list/watch` on `purchaserequests` |

Live validation (base-sepolia)

Funded the default `obol-agent` wallet (`0xBb0a70F713334401063c9A8519014F6F026E1c5e`) with 0.005 ETH and 3 USDC from the smoke-test seller key, then drove two paid requests through `buy.py pay` against `demo-hello`.

The verifier-side new image was rebuilt locally (`localhost:54103/x402-verifier:dev`, sha256:af11911f…), imported into the k3d cluster, and the `PodMonitor` confirmed `up{job="x402/x402-verifier"} = 1` for both replicas before traffic flowed.

CRD-cleanliness notes

- `metadata.annotations["obol.org/paused"]` mechanism (already honored by the controller at `internal/serviceoffercontroller/controller.go:458–463`). The frontend's new PATCH route writes the annotation; the controller does the rest.
- `purchaserequest-crd.yaml` already declared `maxTotal` (integer) and `maxSpendPerDay` (string); the Go types just hadn't been updated. This commit closes the drift, which `TestPurchaseAutoRefill_UnmarshalAcceptsCRDForm` now guards against re-occurring.
- `obol_x402_verifier_last_payment_success_seconds` gauge is the cleanest place for the one timestamp the design canvas asks for.

Test plan

- `go test ./internal/x402/... ./internal/monetizeapi/...`: green (14 new subtests, 4 pre-existing buyer-proxy assertions updated, no skips)
- `go vet ./internal/x402/... ./internal/monetizeapi/...`: clean
- `go build ./...`: clean
- `buy.py pay` against `demo-hello` on base-sepolia: verifier metrics scraped, frontend `/api/sell/list` returns non-zero `chargedByChain` + `lastSettlementUnix`
- `flow-08` should now pass label assertions without manual edits

Related

Pairs with frontend PR https://github.com/ObolNetwork/obol-stack-front-end/pull/new/parity/results-bar which consumes these labels via three new `PrometheusClient` methods (`chargedSalesByOfferAndChain`, `chargedRequests24hByOffer`, `lastSettlementByOffer`).