Autonomous web application security testing agent powered by Claude.
Wreck-It Ralph orchestrates Claude CLI with browser automation (Playwright MCP) to methodically test web applications for security vulnerabilities. It runs in iterations — each one a full Claude session that picks up where the last left off — with hook-based enforcement of scope, rate limits, and safety controls.
flowchart TD
A["@targets.md + SECURITY_BRIEF.md"] --> B["Wreck-It Ralph Orchestrator"]
B --> C["Claude CLI + Playwright Browser"]
C --> D{"Testing Phase"}
D --> E["Reconnaissance"]
D --> F["Auth Testing"]
D --> G["Input Validation"]
D --> H["Access Control"]
D --> I["Business Logic"]
D --> J["API Security"]
E & F & G & H & I & J --> K["WRECK_STATUS + WRECK_FINDING + WRECK_LEARNED"]
K --> L{"More phases?"}
L -- Yes --> M["Next Iteration"]
M --> C
L -- No --> N["HTML + Markdown Reports"]
subgraph Hooks ["Safety Hooks (enforce on every action)"]
direction LR
S1["Scope Enforcer"]
S2["Rate Limiter"]
S3["Payload Validator"]
S4["Stop Validator"]
end
C -. "every tool call" .-> Hooks
Hooks -. "block or allow" .-> C
subgraph Memory ["Persisted Across Iterations"]
direction LR
M1["Learned Skills"]
M2["Findings"]
M3["Checkpoints"]
M4["Scope Learning"]
end
K --> Memory
Memory --> B
- Phase-based testing — Reconnaissance, Authentication, Input Validation, Access Control, Business Logic, API Security
- Iteration continuity — Context injected at each iteration start so Claude knows what was done, what's left, and what failed
- Checkpoint recovery — Crash mid-run? Resume from the last completed iteration
- Empty iteration detection — Exponential backoff when Claude gets stuck, auto-stops after prolonged stalling
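The empty-iteration handling above can be pictured with a small sketch. This is not the orchestrator's actual code (which is .NET); the constants `MAX_DELAY_SECONDS` and `MAX_EMPTY_ITERATIONS` and the function `nextAction` are hypothetical names chosen for illustration, and only the 5-second base delay comes from the documented `--delay` default.

```javascript
// Hypothetical thresholds: the README does not state the real values.
const BASE_DELAY_SECONDS = 5;   // matches the documented default for --delay
const MAX_DELAY_SECONDS = 300;  // assumed upper bound on the backoff
const MAX_EMPTY_ITERATIONS = 5; // assumed stall limit before auto-stop

// Decide what to do after `consecutiveEmpty` iterations in a row
// produced no new findings or progress.
function nextAction(consecutiveEmpty) {
  if (consecutiveEmpty >= MAX_EMPTY_ITERATIONS) {
    return { stop: true, delaySeconds: 0 };
  }
  // Double the delay for every empty iteration, up to the cap.
  const delaySeconds = Math.min(
    BASE_DELAY_SECONDS * 2 ** consecutiveEmpty,
    MAX_DELAY_SECONDS
  );
  return { stop: false, delaySeconds };
}
```

With these assumed values, the delay grows 5, 10, 20, 40, 80 seconds, and the run stops on the fifth consecutive empty iteration.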
- Define multiple related targets (e.g., frontend + API) in one `@targets.md`
- Each target has its own scope, auth config, and type (`WebApplication`, `Api`, `SinglePageApp`, `MobileBackend`)
- Targets can declare dependencies (`DependsOn`) for cross-target testing (CORS, token leakage)
- Scope patterns are combined across all targets for the enforcer hooks
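As a rough illustration of how scope patterns from several targets might be merged and checked, here is a minimal sketch. The target object shape (`inScope`, `outOfScope` arrays), the helper names, and the rule that out-of-scope patterns take precedence are all assumptions made for this example, not the documented behavior of `scope-enforcer.mjs`.

```javascript
// Convert a glob such as "https://app.example.com/**" into an anchored RegExp.
// "**" matches anything; a single "*" stays within one path segment.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // placeholder so "**" survives the next step
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${pattern}$`);
}

// Merge the patterns of every target into one scope object (assumed shape).
function buildScope(targets) {
  return {
    inScope: targets.flatMap((t) => t.inScope ?? []).map(globToRegExp),
    outOfScope: targets.flatMap((t) => t.outOfScope ?? []).map(globToRegExp),
  };
}

// Assumption: an out-of-scope match always wins over an in-scope match.
function isAllowed(url, scope) {
  if (scope.outOfScope.some((re) => re.test(url))) return false;
  return scope.inScope.some((re) => re.test(url));
}
```

Checking exclusions first is the conservative choice: a URL matched by both lists is blocked.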
Hooks are Node.js scripts that block Claude's actions until requirements are met. They are not prompt instructions — they are enforcement mechanisms.
| Hook | What It Does |
|---|---|
| `scope-enforcer.mjs` | Blocks navigation to out-of-scope URLs |
| `rate-limiter.mjs` | Enforces requests-per-minute limit |
| `payload-validator.mjs` | Blocks destructive payloads (DROP TABLE, rm -rf, etc.) |
| `stop-validator.mjs` | Blocks output unless WRECK_STATUS block is present and valid |
| `file-validator.mjs` | Prevents writes to wrong files |
| `session-start.mjs` | Injects iteration context, skills, and blocked ops history |
| `activity-tracker.mjs` | Logs all tool use for audit trail |
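To make the requests-per-minute idea concrete, the following is a minimal sliding-window sketch. It is not the contents of `rate-limiter.mjs`; `createRateLimiter` is a hypothetical name, and the sketch keeps its state in memory, whereas a real hook that runs as a separate process per tool call would presumably need to persist timestamps somewhere (an assumption, not something this README states).

```javascript
// Sliding-window limiter: allow at most `limitPerMinute` calls per window.
function createRateLimiter(limitPerMinute, windowMs = 60_000) {
  const timestamps = []; // times of recently allowed requests, oldest first

  // Returns true if the request may proceed, false if it should be blocked.
  // `now` is injectable so the behavior can be tested deterministically.
  return function tryAcquire(now = Date.now()) {
    // Drop timestamps that have fallen out of the window.
    while (timestamps.length > 0 && now - timestamps[0] >= windowMs) {
      timestamps.shift();
    }
    if (timestamps.length >= limitPerMinute) return false;
    timestamps.push(now);
    return true;
  };
}
```

Usage would look like `const allow = createRateLimiter(30);` followed by `allow()` before each request, with 30 matching the documented `--rate-limit` default.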
Claude accumulates knowledge across iterations:
- Claude-reported skills: Claude emits `WRECK_LEARNED` blocks when it discovers target-specific patterns (WAF behavior, auth quirks, API conventions)
- Auto-generated failure skills: Repeated blocked operations automatically become skills so Claude stops retrying the same mistakes
- Confidence decay: Unused skills fade over time; frequently referenced skills get boosted
- Deduplication: Existing skills are shown to Claude with content previews to prevent redundant reports
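One way confidence decay and boosting could work is sketched below. The actual skills manager lives in the .NET code and its formula is not described here, so the decay factor, boost amount, pruning threshold, and the `updateSkills` function are all hypothetical.

```javascript
// Assumed tuning values, for illustration only.
const DECAY_PER_ITERATION = 0.9; // unused skills keep 90% of their confidence
const BOOST_ON_USE = 0.1;        // referenced skills gain 0.1, capped at 1
const MIN_CONFIDENCE = 0.2;      // skills below this are dropped

// Apply one iteration's worth of decay or boost to every skill.
// `skills` is an array of { name, confidence }; `usedNames` lists the
// skills that were referenced during the iteration.
function updateSkills(skills, usedNames) {
  return skills
    .map((skill) => {
      const used = usedNames.includes(skill.name);
      const confidence = used
        ? Math.min(1, skill.confidence + BOOST_ON_USE)
        : skill.confidence * DECAY_PER_ITERATION;
      return { ...skill, confidence };
    })
    .filter((skill) => skill.confidence >= MIN_CONFIDENCE);
}
```

Multiplicative decay means a skill never referenced again fades gradually instead of disappearing after a fixed number of iterations.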
- Deduplication — Hash-based (URL + param + category + payload) and normalized title matching
- Verification — Optional re-test of high-severity findings for confirmation
- Evidence capture — HTTP request/response pairs and screenshots stored per finding
- OWASP/CWE/WSTG mapping — Findings tagged with industry-standard identifiers
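The two deduplication strategies listed above can be sketched as follows. The field names (`url`, `parameter`, `category`, `payload`, `title`) follow the `WRECK_FINDING` format, but the hash function (a 32-bit FNV-1a, used only to keep the example dependency-free), the normalization rules, and the function names are assumptions, not the tool's real implementation.

```javascript
// 32-bit FNV-1a over UTF-16 code units; a stand-in for whatever hash is used.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hash of URL + parameter + category + payload, joined with a separator
// that is unlikely to appear in any of the fields.
function findingKey(finding) {
  const parts = [finding.url, finding.parameter, finding.category, finding.payload];
  return fnv1a(parts.map((v) => String(v ?? "").trim()).join("\u001f"));
}

// Lowercase the title and collapse punctuation and whitespace.
function normalizeTitle(title) {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// A candidate is a duplicate if either its key or its normalized title
// matches an existing finding.
function isDuplicate(candidate, existing) {
  const key = findingKey(candidate);
  const title = normalizeTitle(candidate.title);
  return existing.some(
    (f) => findingKey(f) === key || normalizeTitle(f.title) === title
  );
}
```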
- Tracks repeatedly blocked hosts and suggests scope additions
- Classifies blocked URLs by type (API endpoints, CDN, third-party services)
- Saves suggestions to `logs/scope-learning/scope-suggestions.md`
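A simplified picture of scope learning: count how often each host is blocked and surface the ones that cross a threshold. The threshold of 3, the host-name heuristics in `classifyHost`, and both function names are hypothetical; the real classification rules are not documented here.

```javascript
// Very rough host classification (assumed heuristics).
function classifyHost(host) {
  if (/^api\./.test(host)) return "api";
  if (/cdn/.test(host)) return "cdn";
  return "third-party";
}

// Return hosts blocked at least `threshold` times, most frequent first.
function suggestScopeAdditions(blockedUrls, threshold = 3) {
  const counts = new Map();
  for (const raw of blockedUrls) {
    let host;
    try {
      host = new URL(raw).host;
    } catch {
      continue; // ignore entries that are not valid absolute URLs
    }
    counts.set(host, (counts.get(host) ?? 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count >= threshold)
    .sort((a, b) => b[1] - a[1])
    .map(([host, count]) => ({ host, count, type: classifyHost(host) }));
}
```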
- HTML report — Styled, self-contained report with finding details, severity breakdown, and evidence
- Markdown report — Same content in plain text for version control or further processing
- Generated automatically at session end (even on Ctrl+C)
- Interactive setup — Run with no arguments for a guided configuration wizard
- System tray icon — Shows progress, current phase, finding count (Windows)
- Audio notifications — Sounds for startup, iteration complete, finding discovered, errors
- Toast notifications — Windows notifications for completion and errors
- Headless mode — Run Playwright without a visible browser window
- Temp email accounts — Auto-create test accounts via temporary email services for authenticated testing
- Reconnaissance artifacts — Network captures, page snapshots, and screenshots preserved for review
# Build
dotnet build
# Run with no arguments for interactive setup
dotnet run --project src/WreckItRalph
# Or specify options directly
dotnet run --project src/WreckItRalph -- --targets @targets.md --brief SECURITY_BRIEF.md
# Validate configuration without running
dotnet run --project src/WreckItRalph -- --dry-run
# Publish self-contained binary
dotnet publish -c Release -r win-x64

wreck [options]
Options:
  -t, --targets <file>        Targets file (default: @targets.md)
  -b, --brief <file>          Security brief (default: SECURITY_BRIEF.md)
  -m, --max-iterations <n>    Max iterations (default: 50)
  -d, --delay <seconds>       Delay between iterations (default: 5)
      --timeout <minutes>     Timeout per iteration (default: 30)
      --rate-limit <rpm>      Requests per minute (default: 30)
      --no-verify             Skip finding verification
      --report-dir <dir>      Report output directory (default: reports)
  -c, --config <file>         Config file (default: wreck.json)
  -s, --safe-mode             Use cmd.exe without streaming output
      --model <name>          Claude model to use
      --api-key <key>         API key for the model provider
  -v, --verbose               Show detailed output
      --no-hooks              Disable safety hooks
      --headless              Run browser in headless mode
      --dry-run               Validate config only
The targets file (`@targets.md` by default) defines testing scope, authentication, and phases.
Single target:
# Security Testing Scope
## Target
- Name: My Application
- Base URL: https://app.example.com
- Type: WebApplication
## Authentication
- Type: FormLogin
- Login URL: /login
## In-Scope
- https://app.example.com/**
## Out-of-Scope
- https://app.example.com/admin/**
## Testing Phases
- [ ] Reconnaissance
- [ ] Authentication Testing
- [ ] Input Validation (XSS, SQLi)
- [ ] Access Control (IDOR)
- [ ] Business Logic
- [ ] API Security

Multi-target:
# Security Testing Scope
## Target
- Name: Frontend
- Base URL: https://app.example.com
- Type: SinglePageApp
- Primary: true
## Target
- Name: API
- Base URL: https://api.example.com
- Type: Api
- DependsOn: Frontend
## In-Scope
- https://app.example.com/**
- https://api.example.com/**
## Testing Phases
- [ ] Reconnaissance
- [ ] Authentication Testing
- [ ] Input Validation (XSS, SQLi)
- [ ] Access Control (IDOR)
- [ ] Cross-Origin Testing

The security brief (`SECURITY_BRIEF.md` by default) holds testing instructions and methodology for Claude. It describes the target application, known features, areas of concern, and any special testing requirements.

The config file (`wreck.json` by default) is a JSON configuration file for hook settings and other options:
{
  "hooksConfig": {
    "scopeEnforcement": true,
    "rateLimiting": true,
    "blockDestructive": true,
    "activityTracking": true,
    "contextInjection": true
  }
}

Claude reports status at the end of each iteration:
---WRECK_STATUS---
{"phase":"RECONNAISSANCE","status":"IN_PROGRESS","newFindings":0,"highestSeverity":"NONE","endpointsTested":5,"endpointsDiscovered":10,"exitSignal":false,"recommendation":"Continue scanning"}
---END_WRECK_STATUS---
Findings are reported inline:
---WRECK_FINDING---
{"title":"Reflected XSS in Search","severity":"HIGH","category":"XSS","url":"https://target.com/search","parameter":"q","payload":"<script>alert(1)</script>","description":"User input reflected without encoding","evidence":"Response contains unescaped payload","reproduction":"Navigate to /search, enter payload","recommendation":"HTML-encode output","cwe":"CWE-79","owasp":"A03:2021","wstg":"WSTG-INPV-01","confidence":0.9}
---END_WRECK_FINDING---
Learned skills are reported when Claude discovers reusable target-specific knowledge:
---WRECK_LEARNED---
{"skillName":"waf-blocks-inline-scripts","skillDescription":"WAF blocks script tags but allows event handlers","skillContent":"Use onerror/onload event handlers instead of <script> tags for XSS testing"}
---END_WRECK_LEARNED---
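The three block types share one delimiter convention, so a single extractor can handle all of them. The sketch below shows one possible way to pull the JSON out of Claude's output; `extractBlocks` and `isValidStatus` are hypothetical helpers, and the set of fields treated as required is an assumption, since the checks performed by `stop-validator.mjs` are not spelled out here.

```javascript
// Return the parsed JSON of every ---NAME--- ... ---END_NAME--- block.
// `name` is one of "WRECK_STATUS", "WRECK_FINDING", "WRECK_LEARNED".
function extractBlocks(output, name) {
  const pattern = new RegExp(
    `---${name}---\\s*([\\s\\S]*?)\\s*---END_${name}---`,
    "g"
  );
  const blocks = [];
  for (const match of output.matchAll(pattern)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Malformed JSON is skipped rather than aborting the whole parse.
    }
  }
  return blocks;
}

// Assumed minimal validity check for a status block.
function isValidStatus(status) {
  return (
    typeof status === "object" &&
    status !== null &&
    typeof status.phase === "string" &&
    typeof status.status === "string" &&
    typeof status.exitSignal === "boolean"
  );
}
```

Skipping malformed blocks keeps one bad finding from discarding the rest of an iteration's output; a stricter validator could reject the output instead.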
When running, the tool creates:
- `wreck-hooks/`: Generated Node.js hook scripts
- `.claude/settings.local.json`: Hook configuration for Claude CLI
- `logs/`: Iteration logs, `context-input.json`, blocked operations, learned skills
- `reports/`: Generated HTML and Markdown security reports
- `evidence/`: HTTP evidence and screenshots for findings
- `recon/`: Reconnaissance artifacts (network captures, snapshots)
- `attack-surface.md`: Created by Claude during reconnaissance
- .NET 10.0 SDK
- Claude CLI (claude.ai/code)
- Node.js (for hook scripts and Playwright MCP server)
This tool is for authorized security testing only. You must have explicit written permission to test any target application. Unauthorized security testing is illegal in most jurisdictions.
Uses --dangerously-skip-permissions. Wreck-It Ralph runs Claude CLI with this flag to enable autonomous operation. This gives Claude unrestricted tool access within the session. The safety hooks provide guardrails, but they are not a security boundary — they are best-effort enforcement.
Scope enforcement is not airtight. Hooks validate URL patterns and payload regex, but edge cases exist. This tool assists authorized testing; it does not guarantee confinement.
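To see why regex-based payload checks leave edge cases, consider this small sketch of a deny-list. The patterns and the `isDestructive` function are hypothetical and are not taken from `payload-validator.mjs`; they only illustrate the general limitation.

```javascript
// Hypothetical deny-list covering the two examples named in the hooks table.
const DESTRUCTIVE_PATTERNS = [/\bDROP\s+TABLE\b/i, /\brm\s+-rf\b/i];

// True if any deny-list pattern matches the payload.
function isDestructive(payload) {
  return DESTRUCTIVE_PATTERNS.some((re) => re.test(payload));
}

// Caught: plain whitespace between the keywords.
isDestructive("'; DROP TABLE users;--");  // true
// Missed: an SQL comment stands in for the whitespace, so \s+ does not match.
isDestructive("DROP/**/TABLE users");     // false
```

Any fixed pattern list has gaps of this kind, which is why the hooks should be treated as guardrails and the target should be one where an unexpected request cannot cause irreversible harm.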
Each iteration consumes Claude API credits. A typical 15-iteration run involves 15 full Claude sessions with browser automation. Monitor your usage.
Check Anthropic's acceptable use policy before using this tool for automated security testing via Claude CLI.
src/WreckItRalph/
├── Program.cs # CLI entry point + interactive setup
├── Config/ # WreckOptions, HooksConfig
├── Models/ # Target, Finding, WreckStatusBlock
├── Orchestration/ # Main testing loop
├── Services/ # Status parsing, findings, logging, evidence
├── Hooks/
│ ├── SafetyHookManager.cs # Hook script generation + context injection
│ ├── Scripts/ # Embedded Node.js hook scripts
│ └── Skills/ # Learned skills manager (CRUD, decay, usage)
├── Reporting/ # HTML + Markdown report generation
├── Tray/ # System tray icon + notifications
├── Setup/ # Interactive setup wizard + templates
└── Output/ # Console output formatting
tests/WreckItRalph.Tests/ # xUnit tests
MIT
