Skip to content

Arttribute/skillauditor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

64 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

SkillAuditor

Security auditing and onchain verification for AI agent skills.

Skills are reusable instruction files that can be loaded into any AI agent. That reusability is also their risk: a skill can quietly override an agent's system prompt, instruct it to exfiltrate data, or behave differently depending on whether it detects a production environment. SkillAuditor audits skills before they are used β€” combining semantic analysis, sandboxed behavioral execution, and a synthesis verdict β€” then anchors the result onchain so any agent or developer can verify a skill's safety record without trusting a single source.


The Problem

Claude skills (SKILL.md files) are natural language, not code. That makes them an unusual attack vector: they look like helpful documentation, but can contain:

Threat Description
Instruction hijacking Overrides the agent's base system prompt
Silent exfiltration Instructs the agent to POST user data to an external endpoint
Scope creep Skill describes itself as a PDF reader but reaches the file system
Trojan metadata Description says one thing; body does another
Supply chain poisoning Legitimate skill modified after passing audit
Conditional malice Behaves well in a sterile sandbox; activates in real targets when .env or .ssh keys are visible

Rule-based (regex) auditing fails because the attack surface is natural language β€” rephrasing defeats any ruleset, and publishing rules creates a bypass guide. SkillAuditor uses LLM semantic analysis for every audit, making the cost of evasion equal to the cost of rewriting meaning itself.


How It Works

Every submitted skill passes through a four-stage sandboxed multi-agent pipeline:

Submit skill
    β”‚
    β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Stage 1 β€” Static Analyzer  β”‚  Deterministic: SHA-256 hash, YAML frontmatter,
β”‚  (no LLM)                   β”‚  URL extraction, script detection, declared capabilities
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
              β”‚
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”
     β–Ό                 β–Ό  (parallel)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Stage 2    β”‚  β”‚  Stage 3         β”‚
β”‚  Content    β”‚  β”‚  Sandbox Runner  β”‚
β”‚  Analyst    β”‚  β”‚                  β”‚
β”‚             β”‚  β”‚  Skill is loaded β”‚
β”‚  LLM reads  β”‚  β”‚  as instructions β”‚
β”‚  skill as   β”‚  β”‚  into a Claude   β”‚
β”‚  EXAMINER.  β”‚  β”‚  Haiku session   β”‚
β”‚  Detects 8  β”‚  β”‚  inside a mock   β”‚
β”‚  threat     β”‚  β”‚  workstation.    β”‚
β”‚  categories β”‚  β”‚                  β”‚
β”‚  w/ quoted  β”‚  β”‚  All tool calls  β”‚
β”‚  evidence   β”‚  β”‚  intercepted.    β”‚
β”‚             β”‚  β”‚  Honeypot creds  β”‚
β”‚             β”‚  β”‚  present. 3      β”‚
β”‚             β”‚  β”‚  graduated tasks β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Stage 4 β€” Verdict Agent      β”‚  Reads only the three upstream reports.
β”‚  (never sees raw skill)       β”‚  Never reads raw skill content.
β”‚                               β”‚  Produces: verdict, 0-100 score,
β”‚                               β”‚  5 dimensions, findings, recommendation
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β”‚
                β–Ό
      Onchain stamp on Base
      ENS subname: {hash8}.skills.skillauditor.eth

Stage isolation

The architecture is designed so no stage can be hijacked by a malicious skill:

  • Static Analyzer reads only bytes β€” no LLM, deterministic output
  • Content Analyst is framed as an EXAMINER. Any instruction inside the skill telling it to "ignore previous instructions" is itself logged as a finding, not obeyed. Output is schema-validated JSON only
  • Sandbox Runner treats skill content as data to simulate, not instructions to follow. The mock workstation includes realistic honeypot credentials (~/.env, .ssh/id_rsa, .aws/credentials) to trigger conditional malice
  • Verdict Agent never sees the raw skill β€” it synthesises the three upstream reports only
  • All inter-agent communication uses schema-validated JSON via forced tool calls

Verdict dimensions

The final verdict scores five independent dimensions (0–100 each):

Dimension What it measures
intentClarity Stated purpose vs observed behavior alignment
scopeAdherence Stays within declared capabilities
exfiltrationRisk Likelihood of outbound data leakage
injectionRisk Attempts to override agent instructions
consistencyScore Behavioral consistency across sandbox runs

A skill that says it will exfiltrate in Stage 2 and tries to POST in Stage 3 produces a convergence signal β€” high-confidence unsafe verdict.


Onchain Verification

Passing audits receive a permanent onchain stamp on Base and an ENS subname on Ethereum.

SkillRegistry (Base Sepolia)

0x87C3E6C452585806Ef603a9501eb74Ce740Cafcc

Records skillHash β†’ (verdict, score, auditor, timestamp). Any agent or developer can call getStamp(hash) permissionlessly to verify a skill without trusting SkillAuditor's API.

ENS subnames (Ethereum Sepolia)

{hash8}.skills.skillauditor.eth

Every audited skill gets a human-readable, resolvable name. An agent can resolve the name, read metadata and audit records, and verify that the hash matches the content it is about to load. The name also ties the skill to its author β€” pdf-reader.marcuschen.eth makes both the artifact and publisher visible, giving agents a verifiable author track record over time.

ERC-7730 Clear Signing

contracts/erc7730/SkillRegistry.json describes the recordStamp call in structured human-readable form. Ledger hardware wallets display the exact fields being signed β€” skill hash, verdict, score β€” before a user approves an onchain stamp.


Submission & Access Control

World ID 4.0 (human gating)

Skills can only be submitted by verified humans. Developers submit directly with a World ID proof; agents submit on a developer's behalf with cryptographic proof of delegation via World AgentKit. Payment alone is not enough β€” a bot without a human-backed identity cannot submit, making large-scale anonymous skill poisoning infeasible.

World AgentKit (agent-to-agent)

The /v1/agent/submit endpoint uses World AgentKit SIWE (Sign-In with Ethereum) sessions. Agents authenticate with their EVM wallet, which must be registered in World's AgentBook. The endpoint also enforces an x402 payment gate for Pro audits β€” machine-native USDC micropayments on Base with no human in the loop.

Tiers

Tier What you get
Free Full 4-stage LLM audit + findings report
Pro Audit + onchain stamp on Base + ENS subname + IPFS report pin

Architecture

skillauditor/
β”œβ”€β”€ apps/
β”‚   β”œβ”€β”€ skillauditor-api/        # Hono REST API (Node.js, port 3001)
β”‚   β”‚   └── src/
β”‚   β”‚       β”œβ”€β”€ services/
β”‚   β”‚       β”‚   β”œβ”€β”€ audit-pipeline.ts     # Orchestrator
β”‚   β”‚       β”‚   β”œβ”€β”€ static-analyzer.ts    # Stage 1: deterministic
β”‚   β”‚       β”‚   β”œβ”€β”€ content-analyst.ts    # Stage 2: LLM semantic
β”‚   β”‚       β”‚   β”œβ”€β”€ sandbox-runner.ts     # Stage 3: behavioral
β”‚   β”‚       β”‚   β”œβ”€β”€ verdict-agent.ts      # Stage 4: synthesis
β”‚   β”‚       β”‚   β”œβ”€β”€ onchain-registry.ts   # Base viem integration
β”‚   β”‚       β”‚   β”œβ”€β”€ ens-registry.ts       # ENS subname writes
β”‚   β”‚       β”‚   β”œβ”€β”€ agentkit-session.ts   # World AgentKit SIWE
β”‚   β”‚       β”‚   └── world-id.ts           # World ID 4.0 verification
β”‚   β”‚       β”œβ”€β”€ routes/v1/               # REST endpoints
β”‚   β”‚       └── db/                      # MongoDB/Mongoose models
β”‚   β”‚
β”‚   └── skillauditor-app/        # Next.js 15 frontend (App Router)
β”‚       β”œβ”€β”€ app/
β”‚       β”‚   β”œβ”€β”€ page.tsx                 # Landing page
β”‚       β”‚   β”œβ”€β”€ dashboard/               # Auth-gated dashboard + submit
β”‚       β”‚   β”œβ”€β”€ audits/[auditId]/        # Live audit result with polling
β”‚       β”‚   β”œβ”€β”€ skills/[hash]/           # Skill detail + onchain stamp
β”‚       β”‚   └── explore/                 # Public skill browser
β”‚       └── components/
β”‚           β”œβ”€β”€ world-id/                # WorldIDVerifier widget
β”‚           β”œβ”€β”€ ledger/                  # Ledger DMK connect + approve modal
β”‚           └── ens/                     # ENS name display
β”‚
β”œβ”€β”€ packages/
β”‚   β”œβ”€β”€ skillauditor-client/     # Agent SDK (@skillauditor/client)
β”‚   β”œβ”€β”€ skill-registry/          # viem write helpers
β”‚   β”œβ”€β”€ skill-ens/               # ENS subname registration
β”‚   └── skill-types/             # Shared TypeScript types
β”‚
└── contracts/
    β”œβ”€β”€ src/
    β”‚   β”œβ”€β”€ SkillRegistry.sol            # Base Sepolia
    β”‚   └── SkillSubnameRegistrar.sol    # Ethereum Sepolia
    └── erc7730/
        └── SkillRegistry.json           # Ledger Clear Signing descriptor

Tech stack

Layer Technologies
Backend Node.js, Hono, MongoDB/Mongoose, Anthropic Claude API
Frontend Next.js 15 (App Router), React 19, Tailwind CSS 4, Privy
Contracts Solidity, Foundry, Base L2 (viem)
Auth Privy (email / wallet / Google) + API key headers
Onchain Base Sepolia (SkillRegistry), Ethereum Sepolia (ENS)
Packages pnpm 10 monorepo

Agent SDK

Any agent β€” Claude Code, a custom LLM pipeline, a CI step β€” can verify a skill with a single call:

import { SkillAuditorClient } from '@skillauditor/client'

const client = new SkillAuditorClient({
  privateKey: process.env.AGENT_PRIVATE_KEY, // World AgentKit wallet
  tier: 'pro',
  paymentHandler: (req) => getPaymentHeader(req, wallet), // x402 USDC
})

const result = await client.verifySkill(skillContent)

if (!result.safe) {
  throw new Error(`Skill rejected: ${result.verdict.recommendation}`)
}

In dev mode, pass privateKey: 'dev' β€” no AgentBook registration, no payment required.

The client handles World AgentKit SIWE signing, x402 payment negotiation, result polling, and terminal logging. The caller sees only { safe, verdict, auditId }.


Deployed Contracts

Network Contract Address
Base Sepolia SkillRegistry 0x87C3E6C452585806Ef603a9501eb74Ce740Cafcc
Ethereum Sepolia SkillSubnameRegistrar 0x83466a77A8EeE107083876a311EC0700c3cC8453

ENS name: skillauditor.eth β€” subnames issued as {hash8}.skills.skillauditor.eth


API Reference

Method Endpoint Description
POST /v1/submit Submit skill (World ID verified human)
POST /v1/agent/submit Submit skill (World AgentKit agent + x402)
GET /v1/audits/:id Poll audit status
GET /v1/audits/:id/logs Incremental log stream
POST /v1/verify Verify skill by content or hash
GET /v1/skills Browse audited skills (paginated)
GET /v1/skills/:hash Skill detail + onchain stamp
POST /v1/ledger/propose Propose Ledger hardware approval
POST /v1/ledger/approve/:id Store Ledger signature

Running Locally

Prerequisites: Node.js 20+, pnpm 10, MongoDB

# Install
pnpm install

# API β€” copy and fill env vars
cp apps/skillauditor-api/.env.example apps/skillauditor-api/.env
# Required: ANTHROPIC_API_KEY, MONGODB_URI, PRIVY_APP_SECRET

# Start API (port 3001)
pnpm --filter skillauditor-api dev

# Start frontend (port 3000)
pnpm --filter skillauditor-app dev

Minimum env vars for local dev (World ID bypassed, no onchain stamps):

ANTHROPIC_API_KEY=sk-ant-...
MONGODB_URI=mongodb://localhost:27017/skillauditor
PRIVY_APP_ID=...
PRIVY_APP_SECRET=...

To activate onchain stamps and World ID in production:

Feature Env var(s)
World ID verification WORLD_RP_ID, WORLD_RP_SIGNING_KEY
AgentKit wallet CDP_API_KEY_NAME, CDP_API_KEY_PRIVATE_KEY
x402 payment gate SKILLAUDITOR_TREASURY_ADDRESS
IPFS report pinning PINATA_JWT
ENS subname writes AUDITOR_AGENT_PRIVATE_KEY

Built At

ETH Cannes hackathon β€” targeting World AgentKit, World ID 4.0, ENS AI Agents, and Ledger AIΓ—Ledger bounties.

Releases

No releases published

Packages

 
 
 

Contributors