A vulnerability harness for Claude Code: a set of security skills plus a Dockerized static-analysis backend that turn the agent into a security auditor.
Two skills and one service:
- security-audit runs a fleet of parallel sub-agents through a six-phase pipeline (recon, hunting, validation, reporting, structured output, independent verification) to find exploitable vulnerabilities with real impact. It is a from-scratch rebuild of Cloudflare's Build your own vulnerability harness / security-audit-skill, mapped onto Claude Code's own primitives (the Agent tool, Explore for research/verification roles, general-purpose for hunters that spawn sub-agents).
- joern drives Joern and its Code Property Graph for structural and data-flow queries (CPGQL): taint tracking, source-to-sink reachability, and the queries.joern.io patterns.
- joern-gateway is a standalone Joern server (server mode) plus a Python WebSocket gateway, running in Docker, that the joern skill talks to through the joernctl CLI.
The two skills compose: security-audit's hunting agents can use the joern
skill as a query engine to confirm taint and reachability.
The security-audit skill runs a structured audit in six phases:
- Recon: parallel Explore agents map the application's architecture, trust boundaries, and input surfaces. Produces architecture.md.
- Hunt: parallel general-purpose agents attack the codebase from different angles (injection, access control, business logic, cryptography, feature abuse, chained attacks, wildcard, and obvious-things). Each hunter can spawn Explore sub-agents to dig deeper.
- Validate: separate Explore agents try to disprove each finding. The agent that validates is never the agent that found it. This kills false positives.
- Report: writes REPORT.md (human-readable) and FINDINGS-DETAIL.md (detailed traces for MEDIUM+ findings).
- Structured output: writes findings.json conforming to report-schema.json, checked by validate-findings.cjs.
- Independent verification: fresh Explore agents verify every factual claim in the structured output against the actual source code.
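The Validate rule (the validator is never the finder) can be expressed as a small assignment check. This is only an illustrative sketch, not the skill's own orchestration code: the Finding record, its field names, and the agent names are hypothetical, and the real finding schema lives in report-schema.json.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical record for illustration; not the report-schema.json layout.
    title: str
    found_by: str  # name of the agent that reported the finding


def assign_validators(findings, validators):
    """Pair each finding with a validator that is not the agent that found it."""
    assignments = {}
    for i, finding in enumerate(findings):
        # Exclude the finder, then rotate through the remaining pool.
        candidates = [v for v in validators if v != finding.found_by]
        if not candidates:
            raise ValueError(f"no independent validator for {finding.title!r}")
        assignments[finding.title] = candidates[i % len(candidates)]
    return assignments
```

If the only available validator is the finder itself, the sketch raises instead of silently letting an agent grade its own work.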
Multiple runs against the same repo are additive. Each run explores different code paths; the skill reads prior findings.json files to skip known issues and target gaps.
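A rough sketch of how prior runs could be folded into a "known issues" set follows. It assumes each earlier run left a findings.json under a run-<N> directory, and it assumes a top-level "findings" list whose entries carry "file" and "title" keys. Those field names are guesses made for illustration; report-schema.json is the authority on the real layout.

```python
import json
from pathlib import Path


def load_known_fingerprints(audit_root):
    """Collect (file, title) fingerprints from earlier runs under audit_root."""
    known = set()
    for path in sorted(Path(audit_root).glob("run-*/findings.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumed layout: {"findings": [{"file": ..., "title": ...}, ...]}
        for finding in data.get("findings", []):
            known.add((finding.get("file"), finding.get("title")))
    return known


def is_new(finding, known):
    """True if this finding's fingerprint was not seen in any earlier run."""
    return (finding.get("file"), finding.get("title")) not in known
```

A fingerprint on file plus title is deliberately coarse; a real deduplication step would likely need something more robust to renamed files or reworded titles.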
The joern skill plugs into Hunt and Validate as a query engine for taint and reachability.
poki/
├── .claude-plugin/
│   └── plugin.json                      # Claude Code plugin manifest
├── skills/
│   ├── security-audit/
│   │   ├── SKILL.md                     # setup, principles, platform mapping, workflow, anti-patterns
│   │   ├── RECONNAISSANCE.md            # Phase 1 recon prompts and synthesis
│   │   ├── HUNTING.md                   # Phase 2 orchestration and hunting methodology
│   │   ├── ATTACK-CLASSES.md            # attack-class agent prompts (core + wildcard + obvious)
│   │   ├── VALIDATION-AND-REPORTING.md  # Phases 3-6
│   │   ├── report-schema.json           # JSON schema for findings.json
│   │   └── validate-findings.cjs        # zero-dependency schema validator
│   └── joern/
│       ├── SKILL.md                     # how to drive Joern via the gateway; workflow
│       ├── CPGQL-REFERENCE.md           # CPGQL language cheatsheet
│       └── QUERY-DATABASE.md            # curated vuln queries (source→sink patterns)
└── joern-gateway/                       # standalone Joern server + Python WS gateway (Docker)
    ├── Dockerfile                       # Temurin JDK + joern-cli + gateway venv
    ├── docker-compose.yml               # publishes gateway on 127.0.0.1:8099
    ├── entrypoint.sh                    # starts joern --server, then the gateway
    ├── gateway.py                       # WebSocket bridge -> synchronous HTTP for the agent
    ├── joernctl                         # host CLI client (stdlib only), the agent runs this
    └── workspace/                       # mounted at /work; drop target code here
The agent runs joernctl on the host. It talks HTTP to gateway.py in the
container, which drives a standalone Joern server over its WebSocket protocol. Only
the gateway port (127.0.0.1:8099) is exposed; Joern stays inside the container.
cd joern-gateway
docker compose up -d --build # first build pulls the JDK + Joern (~1GB+)
./joernctl health # {"ok": true, "joern_connected": true, ...}
cp -r /path/to/target ./workspace/target
./joernctl import /work/target
./joernctl query 'cpg.call.name("system").reachableByFlows(cpg.parameter).p'

See joern-gateway/README.md for the full protocol and options.
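For scripted use, the joernctl commands above can be wrapped from Python. The sketch below is an assumption-laden convenience layer, not part of the repository: it uses only the subcommands shown above (health, import, query), assumes joernctl signals failure with a non-zero exit code, and assumes a JSON-escaped string is acceptable as a CPGQL string literal.

```python
import json
import subprocess


def joernctl_argv(subcommand, *args, joernctl="./joernctl"):
    """Build the argv list for one joernctl call (no shell, so no quoting pitfalls)."""
    return [joernctl, subcommand, *args]


def run_joernctl(subcommand, *args, joernctl="./joernctl", timeout=600):
    """Run joernctl and return its stdout.

    Raises subprocess.CalledProcessError on a non-zero exit code
    (assumed here to be how joernctl reports failure).
    """
    result = subprocess.run(
        joernctl_argv(subcommand, *args, joernctl=joernctl),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout


def taint_query(sink_call):
    """CPGQL for flows from any parameter into calls named sink_call."""
    # json.dumps produces a double-quoted, escaped string literal.
    return f"cpg.call.name({json.dumps(sink_call)}).reachableByFlows(cpg.parameter).p"
```

With the gateway running, `run_joernctl("import", "/work/target")` followed by `run_joernctl("query", taint_query("system"))` would mirror the last two commands above.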
Point Claude Code at the codebase you want to audit and ask for a security review:
security audit this codebase
find security vulnerabilities in ./src
do a security review, output to ~/audits/my-project
The security-audit skill activates on requests like "security audit", "find vulnerabilities", or "pen-test the code". It asks for an output directory if you don't specify one, defaulting to ~/security-audit/<repo-name>/run-<N>.
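One plausible reading of the default path is that N is the next unused run number for that repository. The sketch below implements that reading only; the numbering rule (start at 1, take one past the highest existing run) is an assumption, and next_run_dir is a hypothetical helper rather than something the skill ships.

```python
import re
from pathlib import Path


def next_run_dir(repo_name, base=None):
    """Return <base>/<repo-name>/run-<N>, with N one past the highest existing run."""
    base = Path(base) if base is not None else Path.home() / "security-audit"
    repo_dir = base / repo_name
    highest = 0
    if repo_dir.is_dir():
        for child in repo_dir.iterdir():
            # Only directories named exactly run-<digits> count as prior runs.
            match = re.fullmatch(r"run-(\d+)", child.name)
            if match and child.is_dir():
                highest = max(highest, int(match.group(1)))
    return repo_dir / f"run-{highest + 1}"
```

The function only computes the path; creating the directory is left to the caller.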
- Claude Code with a model that supports tool use and parallel sub-agents (the Agent tool)
- Node.js, for validate-findings.cjs schema validation in Phase 5
- Docker + Docker Compose, for the joern skill's Joern backend
- Only report what you can exploit. Every finding needs a concrete attack scenario, not "an attacker could theoretically...".
- Adversarial validation. The agent that checks a finding is never the agent that found it.
- Severity requires impact. Likelihood × impact, not deviation from a checklist.
- Defense-in-depth gaps are not vulnerabilities. If Layer A prevents the attack, the absence of Layer B is a hardening note.
- Multiple runs improve coverage. A single run finds roughly half the vulnerabilities discovered across multiple runs.
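The severity and defense-in-depth principles above can be illustrated with a toy scoring function. Everything numeric here is invented for the example: the three-level scale, the score thresholds, and the HARDENING-NOTE label are assumptions, not the rubric the skill actually applies.

```python
# Hypothetical three-level scale; the skill's real rubric may differ.
LEVELS = {"low": 1, "medium": 2, "high": 3}


def severity(likelihood, impact, blocked_by_other_layer=False):
    """Rate a finding by likelihood x impact, not by checklist deviation."""
    if blocked_by_other_layer:
        # Another layer already prevents the attack: a hardening note, not a vulnerability.
        return "HARDENING-NOTE"
    score = LEVELS[likelihood] * LEVELS[impact]
    if score >= 6:
        return "HIGH"
    if score >= 3:
        return "MEDIUM"
    return "LOW"
```

Under these made-up thresholds, an unlikely but high-impact issue lands at MEDIUM, while the same issue blocked by an existing control is downgraded to a hardening note.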
The security-audit methodology and schema are a faithful reimplementation of Cloudflare's open-source security-audit-skill (MIT). See their blog post Build your own vulnerability harness for the background.
The joern skill builds on Joern (Apache-2.0) and its query database / queries.joern.io. The gateway speaks the scala-repl-pp server protocol that Joern's --server mode exposes.
MIT, see LICENSE.

