Install the plugin. Run AnyHarness. Let it learn your project and generate a project-specific engineering harness for AI-assisted development.
AnyHarness v3 is a skill-first adaptive harness. It is not an npx-first CLI and it is not a static checklist library. The normal user path is:
Install plugin → run AnyHarness → scan repo → confirm domain details → generate project harness
The public surface is intentionally small:
Use AnyHarness for this repository.
In Claude Code, the installed plugin may expose the namespaced skill as:
/anyharness:run
In Codex, you can use natural language:
Use AnyHarness to adopt this repository.
The original problem is not only that AI-generated code needs a generic review checklist. The harder problem is that every project has different domain risks:
- a low-latency C++ market-data or trading service
- an Electron desktop client
- a Java e-commerce backend
- an AI agent platform
- a payment system
- an internal admin tool
Generic guardrails are useful, but they are not enough. AnyHarness v3 derives a Project Harness Profile from repository evidence and user confirmation.
Skills reason.
Scripts assist.
Optional hooks enforce.
What makes AnyHarness more than a generator is its feedback loop. A static CLAUDE.md rots; AnyHarness is structured as three connected loops:
┌─────────┐    ┌────────┐    ┌────────┐
│  INIT   │ ─► │ REVIEW │ ─► │ EVOLVE │ ──┐
│ (adopt) │    │ (diff) │    │        │   │
└─────────┘    └────────┘    └────────┘   │
                   ▲                      │
                   └──────────────────────┘
                     profile gets sharper
- Init bootstraps a project harness profile from repository evidence and user answers.
- Review uses the profile to find Blockers, Needs Changes, and Learning Candidates.
- Evolve turns confirmed Learning Candidates into permanent invariants, with provenance and a learningHistory ledger.
Without the evolve loop, AnyHarness is a snapshot. With it, the harness becomes a learning system — every review can make the next one sharper.
scan-project.mjs only sees filenames and directories — it cannot see the
architecture. The deep analysis path is what real harness engineering looks
like.
node analyze.mjs --stack auto --path <project-dir>
--stack auto detects the technology stack from project files and takes the right
path automatically. Add --save to write a full JSON report to
.anyharness/reports/.
Under the hood, it runs:
analyze.mjs → extract-architecture.mjs → derive-risk-topology.mjs → propose-evolution.mjs
(entry)       (parse source)             (derive risk boundaries)   (write to profile)
| Path | When | What runs |
|---|---|---|
| A (deterministic) | Stack is java-spring, rust-tauri, csharp-avalonia, or cpp-sdk | Built-in extractor + topology rules → exact file:line findings |
| B (LLM analysis) | Any other stack | File sampler picks high-signal files → you read them + apply llm-extractor.md |
| C (user config) | .anyharness/stack-config.json present | Your regex patterns + universal topology rules → deterministic |
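The table can be read as a small decision function. The sketch below is illustrative only: the name choosePath, its arguments, and the assumption that a user config takes precedence over a built-in extractor are hypothetical and are not taken from analyze.mjs.

```javascript
// Hypothetical sketch of the Path A / B / C decision described in the table.
// Stack names come from the table; everything else is assumed for illustration.
const DETERMINISTIC_STACKS = new Set([
  "java-spring",
  "rust-tauri",
  "csharp-avalonia",
  "cpp-sdk",
]);

function choosePath({ stack, hasStackConfig }) {
  // Assumption: a user-supplied stack-config.json wins over built-in extractors.
  if (hasStackConfig) return "C";
  if (DETERMINISTIC_STACKS.has(stack)) return "A";
  return "B";
}

console.log(choosePath({ stack: "java-spring", hasStackConfig: false })); // A
console.log(choosePath({ stack: "python-fastapi", hasStackConfig: false })); // B
console.log(choosePath({ stack: "python-fastapi", hasStackConfig: true })); // C
```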
B→C upgrade: after Path B analysis, run suggest-stack-config.mjs to generate
a starter stack-config.json for your language. Edit it, save it, and next time
analyze.mjs runs deterministically without LLM file reading.
node suggest-stack-config.mjs --path <dir> --save # draft to .anyharness/drafts/
node suggest-stack-config.mjs --path <dir> --confirm # activate Path C
AnyHarness ships a complete pipeline (extractor + topology rules + knowledge pack) for 4 technology stacks:
For java-spring, the extractor parses .java sources to find controllers, services, repositories, @Transactional
methods, Kafka send/listener bindings, external HTTP calls, self-invocations, and @Query
mutations. It detects:
- state-mutation-safety: dual-write (DB + Kafka in the same @Transactional); this.foo() bypasses the Spring proxy; Kafka at-least-once without idempotency
- missing-modifying: @Query UPDATE/DELETE without @Modifying → silent runtime failure (blocker)
- resource-lifetime: REQUIRES_NEW under load can exhaust the connection pool
- external-interaction: HTTP calls without visible timeout / retry / circuit-breaker
- trust-boundary: HTTP endpoints accepting input without @Valid
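To make the missing-modifying check concrete, here is a minimal regex sketch in the spirit of the regex-based extractors. It is not the shipped extractor: the function name, the two-line look-around window, and the sample repository are assumptions for illustration.

```javascript
// Illustrative check: flag @Query UPDATE/DELETE statements that have no
// @Modifying annotation within two lines above or below them.
function findMissingModifying(source, file) {
  const lines = source.split("\n");
  const findings = [];
  lines.forEach((line, i) => {
    const isMutatingQuery =
      /@Query\s*\(\s*(?:value\s*=\s*)?"\s*(UPDATE|DELETE)\b/i.test(line);
    if (!isMutatingQuery) return;
    // Assumed window: annotations usually sit directly next to @Query.
    const neighbours = lines.slice(Math.max(0, i - 2), i + 3).join("\n");
    if (!/@Modifying\b/.test(neighbours)) {
      findings.push({ file, line: i + 1, category: "missing-modifying", severity: "blocker" });
    }
  });
  return findings;
}

// Made-up sample source, for illustration only.
const javaSample = [
  "public interface OrderRepository {",
  '  @Query("UPDATE Order o SET o.status = :s WHERE o.id = :id")',
  "  int updateStatus(Long id, String s);",
  "",
  "",
  "  @Modifying",
  '  @Query("DELETE FROM Order o WHERE o.id = :id")',
  "  int remove(Long id);",
  "}",
].join("\n");

// Only the UPDATE on line 2 is reported; the DELETE carries @Modifying.
console.log(findMissingModifying(javaSample, "OrderRepository.java"));
```

A line-window regex like this will miss annotations spread over many lines, which is one reason the extractors are described as replaceable with a real parser.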
For rust-tauri, the extractor parses .rs sources to find #[tauri::command] functions, generate_handler![]
registrations, unsafe {} blocks, std::fs/std::process calls, and tokio::spawn.
It detects:
- trust-boundary (blocker): unsafe block inside a registered Tauri command, so renderer JS can trigger arbitrary native memory access
- trust-boundary: unregistered #[tauri::command] (dead code or plugin route); fs::read_to_string with renderer-supplied path (path traversal)
- external-interaction: Command::new("sh").arg("-c").arg(&user_input) command injection
- resource-lifetime: tokio::spawn capturing a raw handle; async command with no cancellation path
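The unregistered-command check amounts to a set difference between declared commands and the names passed to generate_handler![]. The sketch below shows that idea with simplified regexes. The function name is hypothetical, and attribute arguments such as #[tauri::command(...)] are deliberately not handled.

```javascript
// Illustrative sketch: list #[tauri::command] functions that never appear
// in a generate_handler![...] registration. Regexes are simplified.
function findUnregisteredCommands(source) {
  const declared = [];
  const commandRe =
    /#\[tauri::command\]\s*(?:pub\s+)?(?:async\s+)?fn\s+([A-Za-z_]\w*)/g;
  for (const m of source.matchAll(commandRe)) declared.push(m[1]);

  const registered = new Set();
  const handlerRe = /generate_handler!\s*\[([^\]]*)\]/g;
  for (const m of source.matchAll(handlerRe)) {
    for (const name of m[1].split(",")) {
      const trimmed = name.trim();
      // Keep only the last path segment, e.g. commands::read_file -> read_file.
      if (trimmed) registered.add(trimmed.split("::").pop());
    }
  }
  return declared.filter((name) => !registered.has(name));
}

// Made-up sample source, for illustration only.
const rustSample = `
#[tauri::command]
fn read_file(path: String) -> String { String::new() }

#[tauri::command]
pub async fn export_report() {}

fn main() {
    tauri::Builder::default()
        .invoke_handler(tauri::generate_handler![read_file])
        .run(tauri::generate_context!())
        .unwrap();
}
`;

console.log(findUnregisteredCommands(rustSample)); // [ 'export_report' ]
```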
For csharp-avalonia, the extractor parses .cs sources to find async void, ObservableCollection cross-thread writes,
HttpClient creation patterns, Process.Start, [DllImport]/[LibraryImport],
unsafe blocks, and IDisposable fields.
It detects:
- error-propagation: async void, where exceptions propagate to SynchronizationContext and crash the app
- threading-discipline: ObservableCollection.Add/Clear inside Task.Run, where cross-thread mutation crashes the UI
- resource-lifetime: new HttpClient() per call (socket exhaustion); IDisposable field in a non-IDisposable class
- trust-boundary: Process.Start(UseShellExecute=true, fileName=userInput) → OS picks handler; P/Invoke with unchecked marshaling
For cpp-sdk, the extractor parses .h/.cpp sources to find public API signatures (raw pointer+length, void* ctx,
char* returns), memcpy/sprintf/strcpy, new/delete, std::thread
create/detach/join, and global mutable state.
It detects:
- trust-boundary (blocker): memcpy(out, data, len) with no len <= out_len check → heap buffer overflow
- trust-boundary: sprintf instead of snprintf; public API accepting raw pointer+size
- api-stability: char* return with ambiguous ownership (who frees?); void* ctx callback with no lifetime contract
- resource-lifetime: std::thread::detach() → orphan thread, use-after-free on captured pointer; raw new/delete instead of RAII
- threading-discipline: data race on shared flag (no std::atomic); global state with undefined lock ordering
Every finding includes a file:line citation, severity (blocker/high/medium/low),
and a pre-formatted Learning Candidate ready to feed into the evolve loop.
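As a rough picture of what such a finding might look like in code, the sketch below models one record and formats it for a report. Only the file:line citation, the four severity levels, and the attached Learning Candidate come from the text. The property names, helper functions, and sample values are assumptions.

```javascript
// Hypothetical finding shape; property names are illustrative only.
const SEVERITIES = ["blocker", "high", "medium", "low"];

function formatFinding(finding) {
  if (!SEVERITIES.includes(finding.severity)) {
    throw new Error(`unknown severity: ${finding.severity}`);
  }
  return `[${finding.severity}] ${finding.file}:${finding.line} ${finding.message}`;
}

// Sort most severe first so blockers lead the report.
function sortBySeverity(findings) {
  return [...findings].sort(
    (a, b) => SEVERITIES.indexOf(a.severity) - SEVERITIES.indexOf(b.severity)
  );
}

// Made-up example data.
const exampleFindings = [
  {
    file: "src/session.cpp",
    line: 40,
    severity: "medium",
    message: "std::thread::detach() leaves an orphan thread",
  },
  {
    file: "src/api.cpp",
    line: 88,
    severity: "blocker",
    message: "memcpy without len <= out_len check",
    learningCandidate: {
      type: "new-invariant",
      proposed: "Public API copies must validate len against the output buffer size.",
    },
  },
];

for (const f of sortBySeverity(exampleFindings)) console.log(formatFinding(f));
// [blocker] src/api.cpp:88 memcpy without len <= out_len check
// [medium] src/session.cpp:40 std::thread::detach() leaves an orphan thread
```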
This is what separates AnyHarness from a generic review checklist: structured source extraction + stack-specific knowledge produces findings about this kind of project's real failure modes, not generic style issues.
Extractors are regex-based (PoC quality) and replaceable with tree-sitter / Roslyn / libclang without changing the downstream contract. Adding a new stack requires one extractor module + one topology module + one knowledge pack. See references/probe-architecture.md for the contract.
AnyHarness works on any stack — not just the four listed above.
Path B (LLM analysis): analyze.mjs --stack auto detects 15+ stacks. For
unsupported ones, it samples the most relevant source files and prints them with
guidance from references/llm-extractor.md. You read the files and apply the
7 universal failure modes to produce Risk[] findings.
Path C (deterministic config): drop .anyharness/stack-config.json in your
project root — no code required. Define regex patterns for your stack's:
- trustBoundaryMarkers: route decorators / annotations
- externalCallPatterns: subprocess, HTTP, and file I/O calls
- unsafePatterns: dangerous operations
- asyncPatterns: async function forms
- errorSwallowPatterns: silent error swallowing
analyze.mjs --stack auto picks up the config automatically. See
references/stack-config-schema.md for the full schema and example configs for
Python/FastAPI, Go/Gin, and Node/Express.
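A rough sketch of how such a config could drive deterministic matching is shown below. The five key names come from the list above. The assumption that each key holds an array of regex source strings, the sample patterns, and the matchConfig helper are all hypothetical. The authoritative shape is whatever references/stack-config-schema.md defines.

```javascript
// Assumed shape: each key maps to an array of regex source strings.
// The patterns target a Python/FastAPI-style codebase, for illustration.
const stackConfig = {
  trustBoundaryMarkers: ["@app\\.(get|post|put|delete)\\("],
  externalCallPatterns: ["subprocess\\.(run|Popen)\\(", "requests\\.(get|post)\\("],
  unsafePatterns: ["\\beval\\(", "pickle\\.loads\\("],
  asyncPatterns: ["^\\s*async\\s+def\\s"],
  errorSwallowPatterns: ["except\\s*(Exception)?\\s*:\\s*pass"],
};

// Report every (pattern kind, line) pair that matches in the source text.
function matchConfig(config, source, file) {
  const hits = [];
  source.split("\n").forEach((text, i) => {
    for (const [kind, patterns] of Object.entries(config)) {
      if (patterns.some((p) => new RegExp(p).test(text))) {
        hits.push({ kind, file, line: i + 1 });
      }
    }
  });
  return hits;
}

// Made-up sample source.
const pySample = [
  '@app.post("/run")',
  "async def run(cmd: str):",
  "    return subprocess.run(cmd, shell=True)",
].join("\n");

console.log(matchConfig(stackConfig, pySample, "app/main.py").map((h) => `${h.kind}@${h.line}`));
// [ 'trustBoundaryMarkers@1', 'asyncPatterns@2', 'externalCallPatterns@3' ]
```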
See references/probe-architecture.md, references/universal-failure-modes.md, and
references/stacks/<stack>.md.
AnyHarness uses the LLM where it is strongest:
- reading project context
- discovering domain signals
- asking focused questions
- synthesizing project-specific rules
- creating expert review roles
- designing gates and test oracles
- generating cross-model review packets
Optional skill scripts handle deterministic support tasks:
- analyze.mjs: unified architecture analysis pipeline (extract + topology + report)
- suggest-stack-config.mjs: generate a starter stack-config.json for Path C
- scan-project.mjs: repository file scan
- collect-diff.mjs: git diff collection
- extract-architecture.mjs: per-stack source extraction
- derive-risk-topology.mjs: risk topology from extraction output
- sample-for-llm.mjs: ranked file sampling for unsupported stacks (Path B)
- write-native-prompts.mjs: generate CLAUDE.md / AGENTS.md / Cursor rules
- write-profile.mjs: write or draft the project harness profile
- validate-profile.mjs: validate profile JSON
- generate-review-packet.mjs: cross-model review packet
- propose-evolution.mjs: merge learning candidates into profile
- install-local-hooks.mjs: optional Git hooks and CI workflow
No global CLI is required for normal usage.
By default, AnyHarness writes only native AI prompt surfaces after confirmation:
CLAUDE.md # Claude Code project instructions
AGENTS.md # Codex and agent instructions
.cursor/rules/anyharness.mdc # optional Cursor rule
If you enable Project Harness mode, it also writes:
.anyharness/
  profile.json   # machine-readable project harness profile
                 # (carries learningHistory ledger after each evolution)
  profile.md     # human-readable project harness profile
  drafts/        # safe drafts before --confirm writes
  gates/         # gate artifacts
  packets/       # cross-model review packets
  evidence/      # test/review evidence, if generated
If you enable hard enforcement, it can generate repo-local files:
.anyharness/scripts/check.mjs
.githooks/pre-commit
.githooks/commit-msg
.github/workflows/anyharness.yml
These are generated only after explicit confirmation.
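The three tiers of output can be summarised as a function from enabled options to planned paths. The paths below are the ones listed above. The option names and the function itself are hypothetical, and the sketch only lists the two profile files, not the gates, packets, or evidence directories.

```javascript
// Illustrative mapping from enabled options to generated paths.
function plannedOutputs({ projectHarness = false, enforcement = false, cursor = false } = {}) {
  // Default: native AI prompt surfaces only.
  const files = ["CLAUDE.md", "AGENTS.md"];
  if (cursor) files.push(".cursor/rules/anyharness.mdc");
  if (projectHarness) {
    files.push(".anyharness/profile.json", ".anyharness/profile.md");
  }
  if (enforcement) {
    files.push(
      ".anyharness/scripts/check.mjs",
      ".githooks/pre-commit",
      ".githooks/commit-msg",
      ".github/workflows/anyharness.yml"
    );
  }
  return files;
}

console.log(plannedOutputs()); // [ 'CLAUDE.md', 'AGENTS.md' ]
console.log(plannedOutputs({ projectHarness: true }).length); // 4
```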
- Claude Code installed (CLI or desktop app)
- Node.js 18 or later
- This repository cloned locally:
git clone https://github.com/doublnt/ai-harness-workflow.git ~/anyharness
# or wherever you prefer
Open (or create) your Claude Code user settings file:
# macOS / Linux
~/.claude/settings.json
# Windows
%APPDATA%\Claude\settings.json
Add the plugins section (merge with existing content if the file already exists):
{
  "plugins": {
    "marketplaces": [
      {
        "url": "file:///Users/yourname/anyharness/.claude-plugin/marketplace.json"
      }
    ]
  }
}
Replace /Users/yourname/anyharness with the actual path where you cloned this repo. You can verify the path with pwd inside the cloned directory.
In Claude Code, run:
/plugins install anyharness
Or open the plugin marketplace UI, find anyharness, and click Install.
/anyharness:run
You should see AnyHarness respond and ask what you'd like to do.
Open (or cd into) the project you want to analyze. Then:
Adopt an existing project:
/anyharness:run adopt this repository safely
Initialize a new project:
/anyharness:run initialize this new project
Review staged changes:
/anyharness:run review the current staged diff
Generate a cross-model review packet:
/anyharness:run create a security review packet for the staged diff
AnyHarness will guide you through the rest interactively — scanning, asking questions, and confirming before writing any files.
- Codex with plugin support enabled
- This repository cloned or accessible locally
In your Codex configuration, add:
{
  "plugins": {
    "marketplaces": [
      {
        "url": "file:///Users/yourname/anyharness/.agents/plugins/marketplace.json"
      }
    ]
  }
}
Install the anyharness plugin.
Use AnyHarness to adopt this repository safely.
Then continue the conversation:
Use AnyHarness to generate project-specific expert review roles.
Use AnyHarness to review this diff against the project harness.
Use AnyHarness to create a cross-model review packet.
When you ask AnyHarness to adopt or initialize a project, it follows this sequence without writing anything until you confirm:
- Scan — reads your project files, detects stack and AI workflow files
- Hypothesize — proposes domain signals with evidence and confidence levels
- Ask — poses 5–12 focused questions about your project's specific rules
- Propose — shows what files it would create (CLAUDE.md, profile.json, etc.)
- Write — only writes after you confirm each step
Example first-run output:
Scan complete. 847 files scanned.
Stack: Node.js, React, PostgreSQL
AI workflow: CLAUDE.md detected
Domain hypotheses:
- ecommerce/payment: medium confidence
Evidence: src/payment/, src/orders/, docs/checkout.md
Unknowns:
- Whether payment callbacks can repeat
- How inventory reservation works
Questions:
1. Can payment callbacks be delivered more than once?
2. Is order price frozen at checkout or at payment time?
3. Is inventory reserved immediately or only after payment?
(Reply to answer. I won't write anything until you confirm.)
Ask:
Use AnyHarness to initialize this new project.
AnyHarness will:
- perform a read-only scan
- detect AI workflow files such as CLAUDE.md, AGENTS.md, .cursor/rules
- detect stack signals such as Java, C++, Rust, TypeScript, Electron, React, Spring, etc.
- detect domain hypotheses from code, docs, routes, schema, tests, and names
- ask focused questions
- generate native prompt surfaces
- generate a project-specific harness profile
- offer optional local enforcement
Ask:
/anyharness:run onboard this existing repository
This runs onboard.mjs — a single command that combines project scan +
architecture analysis, then writes the profile seeded with real risk findings
in one confirmation step.
What it does:
- Scan + analyze together: reads directory structure, detects stacks and domain signals, then immediately runs deep architecture extraction on the source code.
- Combined presentation: shows domain hypotheses and architecture risk findings (with file:line citations) side by side, rather than as separate steps.
- Fewer questions: the architecture analysis already answers some domain questions; AnyHarness only asks what it couldn't infer from the code.
- Single write: after you confirm, writes CLAUDE.md, AGENTS.md, and .anyharness/profile.json seeded with both domain invariants and risk-derived invariants in one shot.
Safety rules (unchanged):
- read-only scan and analysis first — nothing written until you confirm
- existing CLAUDE.md / AGENTS.md are never overwritten; drafts are generated
- hooks are not installed unless you explicitly request them
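The "never overwrite, draft instead" rule can be sketched as a small path resolver. The drafts directory is the one named in the output layout. The function name and the exact naming of draft files are assumptions, not the behavior of the shipped scripts.

```javascript
// Sketch: redirect a write to the drafts directory when the target exists.
// existingFiles is a Set of repo-relative paths found by a read-only scan.
function resolveWriteTarget(relativePath, existingFiles) {
  if (!existingFiles.has(relativePath)) {
    return { path: relativePath, draft: false };
  }
  // Assumed layout: drafts mirror the original relative path.
  return { path: `.anyharness/drafts/${relativePath}`, draft: true };
}

const existing = new Set(["CLAUDE.md"]);
console.log(resolveWriteTarget("CLAUDE.md", existing));
// { path: '.anyharness/drafts/CLAUDE.md', draft: true }
console.log(resolveWriteTarget("AGENTS.md", existing));
// { path: 'AGENTS.md', draft: false }
```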
AnyHarness does not ship authoritative domain packs. Instead, it produces domain hypotheses.
Example output:
Domain hypotheses:
- ecommerce/payment: confidence medium
- inventory consistency: confidence medium
Evidence:
- src/payment/PaymentCallbackController.java
- src/order/OrderService.java
- migrations/create_inventory_reservations.sql
- docs/checkout.md
Unknowns:
- whether payment callbacks can repeat
- whether inventory is reserved or deducted immediately
- where order state transitions are defined
Then it asks focused questions:
1. Can payment callbacks be delivered more than once?
2. Is order final price frozen at order creation?
3. Is inventory reserved at checkout or deducted at payment success?
4. Does fulfillment happen immediately after payment success?
Only after user confirmation does it synthesize project rules.
AnyHarness creates project-specific roles from the project harness profile.
Examples:
Payment Idempotency Reviewer
Inventory Consistency Reviewer
Electron IPC Boundary Reviewer
Low-Latency C++ Reviewer
Order State Machine Reviewer
Architecture Trade-off Reviewer
Performance and Memory Reviewer
Release Readiness Reviewer
The roles are not just labels. Each one includes:
- scope
- required context
- project-specific invariants
- blocker criteria
- required evidence
- output schema
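One way to picture a role is as a record with those six parts plus a completeness check. The property names, sample values, and the missingRoleFields helper below are illustrative assumptions. The real role format is defined by the skill, not by this sketch.

```javascript
// Hypothetical role record; the six parts mirror the list above.
const REQUIRED_ROLE_FIELDS = [
  "scope",
  "requiredContext",
  "invariants",
  "blockerCriteria",
  "requiredEvidence",
  "outputSchema",
];

// Return the names of parts that are absent or empty.
function missingRoleFields(role) {
  return REQUIRED_ROLE_FIELDS.filter((field) => {
    const value = role[field];
    return value === undefined || value === null || value.length === 0;
  });
}

// Made-up example role with one part left out on purpose.
const paymentRole = {
  name: "Payment Idempotency Reviewer",
  scope: "src/payment/**",
  requiredContext: ["docs/checkout.md"],
  invariants: ["Payment callbacks may be delivered more than once."],
  blockerCriteria: ["Side effect before idempotency key lookup"],
  requiredEvidence: ["file:line citation for each blocker"],
};

console.log(missingRoleFields(paymentRole)); // [ 'outputSchema' ]
```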
A review packet solves the common problem: another model reviews code without enough context.
Ask:
Use AnyHarness to create a security review packet for the staged diff.
Generated packet:
.anyharness/packets/<id>/
PROMPT.md
PROJECT_PROFILE.md
DIFF.patch
CHANGED_FILES.txt
RELEVANT_FILES.md
GATE_REQUIREMENTS.md
DOMAIN_INVARIANTS.md
UNKNOWN.md
You can give that packet to another model and ask it to perform one expert role only.
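Before handing a packet to another model, it can be useful to confirm that it is complete. The check below is a minimal sketch: the file names are the ones listed above, while the function and the idea of validating a directory listing are assumptions.

```javascript
// File names taken from the packet layout above.
const PACKET_FILES = [
  "PROMPT.md",
  "PROJECT_PROFILE.md",
  "DIFF.patch",
  "CHANGED_FILES.txt",
  "RELEVANT_FILES.md",
  "GATE_REQUIREMENTS.md",
  "DOMAIN_INVARIANTS.md",
  "UNKNOWN.md",
];

// Given the names found in a packet directory, list what is missing.
function missingPacketFiles(present) {
  const have = new Set(present);
  return PACKET_FILES.filter((name) => !have.has(name));
}

console.log(missingPacketFiles(["PROMPT.md", "DIFF.patch"]).length); // 6
console.log(missingPacketFiles(PACKET_FILES)); // []
```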
Every review ends with a Learning Candidates section: structured proposals to update the project harness based on what the review found. This is the mechanism that keeps the profile alive.
Verdict: Blocked
Learning Candidates:
- type: new-invariant
proposed: Webhook handlers under src/webhooks/ must look up the idempotency key
in payment_events before any side effect.
evidence: src/webhooks/PaymentWebhook.java:42, src/webhooks/RefundWebhook.java:31
rationale: Two handlers already exhibit the missing check.
Apply any of these to the profile?
Candidate types:
- new-invariant: a rule the project should always follow
- refined-invariant: sharpen the wording or scope of an existing invariant
- retired-invariant: remove an invariant that no longer applies
- new-unknown: a question the reviewer couldn't answer without more context
- new-gate: a check that should run automatically on every change
When you confirm, AnyHarness merges accepted candidates into
.anyharness/profile.json and appends a timestamped entry to learningHistory,
including the trigger, what was added, refined, retired, or asked. The merge is
idempotent — re-running with the same findings is a no-op.
The filter for what counts as a learning candidate is strict: a single-file
bug fix is not an invariant; a rule that would prevent a class of future bugs
is. See references/harness-evolution.md.
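The idempotent merge can be sketched as follows. This is not propose-evolution.mjs: the profile shape (invariants, unknowns, gates, learningHistory as arrays), the replaces field on refined candidates, and the history entry layout are all assumptions made for illustration.

```javascript
// Sketch of an idempotent merge of accepted Learning Candidates.
function applyCandidates(profile, candidates, trigger, now = new Date()) {
  const invariants = [...(profile.invariants ?? [])];
  const unknowns = [...(profile.unknowns ?? [])];
  const gates = [...(profile.gates ?? [])];
  const entry = { at: now.toISOString(), trigger, added: [], refined: [], retired: [], asked: [] };

  const addUnique = (list, value, log) => {
    if (!list.includes(value)) {
      list.push(value);
      log.push(value);
    }
  };

  for (const c of candidates) {
    if (c.type === "new-invariant") addUnique(invariants, c.proposed, entry.added);
    else if (c.type === "new-gate") addUnique(gates, c.proposed, entry.added);
    else if (c.type === "new-unknown") addUnique(unknowns, c.proposed, entry.asked);
    else if (c.type === "refined-invariant") {
      // Assumed field: c.replaces holds the old wording being sharpened.
      const i = invariants.indexOf(c.replaces);
      if (i !== -1) {
        invariants[i] = c.proposed;
        entry.refined.push(c.proposed);
      }
    } else if (c.type === "retired-invariant") {
      const i = invariants.indexOf(c.proposed);
      if (i !== -1) {
        invariants.splice(i, 1);
        entry.retired.push(c.proposed);
      }
    }
  }

  const changed =
    entry.added.length + entry.refined.length + entry.retired.length + entry.asked.length;
  if (changed === 0) return profile; // same findings again: nothing to record

  return {
    ...profile,
    invariants,
    unknowns,
    gates,
    learningHistory: [...(profile.learningHistory ?? []), entry],
  };
}

const before = { invariants: [], unknowns: [], gates: [], learningHistory: [] };
const accepted = [
  { type: "new-invariant", proposed: "Webhook handlers must check the idempotency key first." },
];
const once = applyCandidates(before, accepted, "review");
const twice = applyCandidates(once, accepted, "review");
console.log(once.learningHistory.length, twice === once); // 1 true
```

Returning the same profile object when nothing changes is one simple way to make a re-run a no-op, since no new learningHistory entry is appended.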
| Mode | What it does | Best for |
|---|---|---|
| Skill-only | LLM interaction, domain discovery, prompt surfaces, review packets | solo developers, exploration |
| Project Harness | Adds .anyharness/profile.json and gates | serious personal projects, small teams |
| Learning Harness | Project Harness + evolution loop (Learning Candidates → profile.json + learningHistory) | teams that want the harness to compound over time |
| Enforcement | Adds local scripts, Git hooks, CI workflow | teams and production repositories |
AnyHarness follows ten rules. See plugins/claude/anyharness/skills/anyharness/references/safety.md for the full rationale.
- Installation does not modify a repo.
- Start with read-only analysis.
- Existing prompt files are not overwritten; drafts are generated.
- Domain examples are not authoritative rules.
- Domain-sensitive conclusions must include evidence and confidence.
- Ask focused questions before finalizing invariants.
- Keep the first user experience simple.
- Do not install hooks without explicit confirmation.
- Generated enforcement scripts must be repo-local and reviewable.
- Do not read secrets or credentials files.
.claude-plugin/marketplace.json # Anthropic plugin marketplace entry
.agents/plugins/marketplace.json # Codex plugin marketplace entry
plugins/
claude/anyharness/
.claude-plugin/plugin.json # Anthropic plugin manifest (skills array format)
skills/anyharness/
SKILL.md # Claude skill (standard version)
SKILL.codex.md # Codex overlay source (lighter, tool-calling focus)
references/ # 14 reference files (single source of truth)
scripts/ # 13 deterministic helper scripts
codex/anyharness/
.codex-plugin/plugin.json # Codex plugin manifest (includes tools array)
skills/anyharness/
SKILL.md # ← generated from SKILL.codex.md by sync script
references/ # ← synced from claude source
scripts/ # ← synced from claude source
standalone/
skills/anyharness/
SKILL.md # ← synced from claude source
references/ # ← synced from claude source
scripts/ # ← synced from claude source
scripts/
validate.mjs # structural validation
sync-distributions.mjs # single-source sync (with stale file cleanup)
test/
run.mjs
fixtures/
profile.valid.json
profile.invalid.json
The plugins/claude/anyharness/skills/anyharness/ directory is the single source of truth.
All changes must be made there; run node scripts/sync-distributions.mjs to propagate to the
other two distributions. The Codex distribution automatically receives SKILL.codex.md as its
SKILL.md (if present), giving it a lighter tool-calling–focused skill file.
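The overlay rule for the Codex distribution can be sketched as a copy plan. This is not sync-distributions.mjs: the function name and the distribution labels are hypothetical, and the sketch assumes that SKILL.codex.md is not copied into the standalone distribution.

```javascript
// Sketch of the single-source sync rule for one distribution.
// sourceFiles are paths relative to the source skill directory.
function planSync(sourceFiles, distribution) {
  const hasCodexOverlay = sourceFiles.includes("SKILL.codex.md");
  const plan = [];
  for (const file of sourceFiles) {
    if (file === "SKILL.codex.md") {
      // The overlay becomes SKILL.md, but only in the Codex distribution.
      if (distribution === "codex") plan.push({ from: file, to: "SKILL.md" });
      continue;
    }
    if (file === "SKILL.md" && distribution === "codex" && hasCodexOverlay) {
      continue; // replaced by the overlay
    }
    plan.push({ from: file, to: file });
  }
  return plan;
}

const sources = ["SKILL.md", "SKILL.codex.md", "references/safety.md"];
console.log(planSync(sources, "codex"));
// [ { from: 'SKILL.codex.md', to: 'SKILL.md' },
//   { from: 'references/safety.md', to: 'references/safety.md' } ]
console.log(planSync(sources, "standalone").map((p) => p.from));
// [ 'SKILL.md', 'references/safety.md' ]
```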
npm run check
This validates: required files, JSON structure, skill frontmatter, Codex tools schema, plugin.json formats, distribution drift, and all behavioral tests. This is not the user installation path.