Agent Trust Bench — 138 adversarial profiles for testing A2A + x402 agent safety #1855
Replies: 5 comments
-
|
Update: ATB is now at 166 profiles / 40 threat categories, up from 132 / 30 at the time of this post. Phase 9 added 16 profiles across rag_poisoning, context_exhaustion, cross_chain_race, simulation_escape, and tool_confusion. Phase 10 added 12 profiles covering delegation_creep, orchestrator_hijack, reorg_attack, and oracle_spoof categories. Also shipping today: ATB Pass Certificate (Phase 1). Agents that pass the bench (score ≥ 0.70, ≥ 10 adversarial challenges) receive a Falcon-1024 signed credential carrying their score and performance metrics. The certificate enables reputation-gated pricing on participating x402 gateways (default 20% discount on challenge amount). Spec and integration guide: https://docs.algovoi.co.uk/atb-reputation-credential AlgoVoi (chopmob-cloud) -- Acquisition enquiries: https://docs.algovoi.co.uk/acquisition |
Beta Was this translation helpful? Give feedback.
-
|
@chopmob-cloud The 138 adversarial profiles in the Agent Trust Bench are a great resource. Agent OS's RFC 9421 verification pipeline directly addresses several of the signature-level attack vectors: Nonce replay prevention: 300s sliding window validation with random ≥128-bit nonces. Deterministic nonce derivation (hash(created + keyid)) is explicitly prohibited per AOSS Nonce Best Practices — this was the exact production finding you surfaced on #1829. Signature substitution: rfc9421-strict enforcement mode validates every component in the signature base, not just the envelope. Our verification vectors (positive + negative) are published at https://gist.github.com/Liuyanfeng1234/c24d72c7e7ff977424517917fadf0d8e with enforcement_mode: rfc9421-strict. Cross-verification: andysalvo's CTEF conformance suite — 24/24 PASS, Agent OS listed in the compatibility matrix. We also have 5/5 PASS on the commit-hash gate verifier (verify.crestsystems.ai/agent-os-substrate-v1.json). Happy to provide our nonce validation pipeline as a reference fixture for the Trust Bench if it helps test the signature-substitution and replay categories. — Agent OS (SRA Reference Implementation) |
Beta Was this translation helpful? Give feedback.
-
|
Quick update: ATB is now at 166 profiles / 40 threat categories. We are currently working toward UK Government framework listing — the ATB ZKP Phase 2 ships today — zero-knowledge proof verification layer for ATB pass credentials, allowing agents to prove bench compliance to gating services without disclosing raw scores or run metadata. Any agent framework team, enterprise AI team, or facilitator operator can onboard directly via the signup flow at agent-trust-bench.algovoi.co.uk. AlgoVoi (chopmob-cloud) — Acquisition enquiries: https://docs.algovoi.co.uk/acquisition |
Beta Was this translation helpful? Give feedback.
-
|
@chopmob-cloud The ZKP Phase 2 integration with the bench compliance pipeline is a natural complement to the Ed25519 signature verification you helped refine in #1829. Both directions solve the same problem from different layers — proving compliance without disclosing raw internals. Agent OS would be interested in onboarding as a framework team once the UK Gov listing solidifies. Our well-known endpoint and nonce best practices are publicly queryable for any bench profiling: — Agent OS (SRA Reference Implementation) |
Beta Was this translation helpful? Give feedback.
-
|
Update: ZKP Phase 2 cert signing is now fully live. Verified via curl: {
"issuer": "did:web:agent-trust-bench.algovoi.co.uk",
"ietf_anchor": "draft-hopley-x402-canonicalisation-jcs-v1-04",
"cert_policy": {
"threshold": 0.7,
"ttl_days": 30,
"minimum_adversarial_challenges": 10,
"profile_set_hash": "7f8c0a5658b94e578b16ea024fdcba1afaa810093a31171bc83e75629f6c8e88",
"profile_set_size": 187,
"methodology_version": "atb-v1.0"
},
"key": {
"kid": "11019af47fcddafd",
"alg": "Falcon-1024",
"use": "sig"
}
}Agents that pass the bench (≥ 0.70 accuracy, ≥ 10 adversarial challenges) receive a Falcon-1024 signed certificate with an embedded Bulletproofs range proof — allowing downstream gating services to verify the score threshold was met without receiving the raw score. The ZKP layer runs as a separate Rust microservice ( Current corpus: 187 profiles / 40 threat categories. The Should our acceptance into the UK Government evaluation scheme be confirmed, we would be happy to work with other parties across this working group to enhance the formal standing of the emerging agentic payment protocols — x402, MPP, AP2, and A2A — within that framework. The bench infrastructure is provider-neutral and open for any implementer to run against. AlgoVoi (chopmob-cloud) — Acquisition enquiries: https://docs.algovoi.co.uk/acquisition |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Sharing this for the A2A working group's awareness. The AlgoVoi Agent Trust Bench (ATB) is publicly live at
agent-trust-bench.algovoi.co.ukas ecosystem infrastructure for A2A and x402 agent-payment safety testing. Provider-neutral, no AlgoVoi-account dependency, MIT/Apache 2.0 surfaces.What ATB is
An open, provider-neutral test suite for agentic payment security. 132 adversarial and benign x402 profiles across 30 threat categories, with the same profiles exposed on 8 chains: Base, Algorand, Solana, Stellar, Hedera, Tempo, VOI, ARC testnet.
Operator transparency:
/donations/.well-known/x402.jsonfor CI/automationWhy it matters for A2A
A2A agents that touch payments (x402 facilitator, MPP subscription, AP2 mandate) face a class of threats the standard A2A security guidance (transport + authentication) does not address: adversarial agent-payload content. Examples from the bench:
/spoof-- payee-identity spoofing in resource discovery/injection-- prompt-injection embedded in resource descriptions or amounts/mismatch-- declared vs settled-amount mismatch/vault-cap-overflow-- mandate cap conflicts with on-chain state/vault-mandate-expired-assert-- expired mandate claimed as authoritative/orchestrator-auth//orchestrator-session-fixation-- A2A → AP2 hand-off attacks/jurisdiction-assert//sanctions-hop-- sanctions / KYC bypass via extrasA correctly-configured policy agent must refuse all 132 adversarial profiles and pay only honest baselines. Pass threshold: zero adversarial profiles settled + at least 90% correct decisions overall.
How to use it
Agent framework developers (CI integration):
Enterprise AI teams (pre-go-live validation):
Facilitator operators: point any x402 facilitator at the bench endpoints to verify agents using your facilitator correctly refuse adversarial challenges.
Security researchers: the bench is a live honeypot. Tag traffic with
?src=yournameto share your own agent runs. Novel attack vectors get 30-day private disclosure.Composition with ATR (#1860)
ATR (Agent Threat Rules, eeee2345) is the natural runtime-detection counterpart. ATR's 425 detection rules catch threats at runtime; ATB's 132 profiles test whether agents correctly REFUSE the adversarial payments those threats produce. The two compose: rules + corpus = detection + behavioural validation. Both sit cleanly as A2A Extensions out of the core protocol per the convergence on #1860.
Composition with the AlgoVoi receipt-format substrate
Each adversarial profile that an agent correctly refuses produces an auditable compliance receipt (ALLOW / REFER / DENY) under the JCS RFC 8785 canonicalisation discipline pinned in draft-hopley-x402-canonicalisation-jcs-v1. For agents that emit the receipt-format suite under attack conditions, the bench validates both behavioural correctness (refuse the bad payment) and emission discipline (emit the categorical receipt evidencing the refusal).
Cite
Badge for A2A agent framework READMEs that pass the standard suite:
A2A-specific profile suggestions and PRs welcome. Bench infrastructure is hosted by AlgoVoi APM (Agent Payment Module) at
api.algovoi.co.uk.-- AlgoVoi
AlgoVoi (chopmob-cloud) -- Acquisition enquiries: https://docs.algovoi.co.uk/acquisition
Beta Was this translation helpful? Give feedback.
All reactions