AnyHarness v3

Install the plugin. Run AnyHarness. Let it learn your project and generate a project-specific engineering harness for AI-assisted development.

AnyHarness v3 is a skill-first adaptive harness. It is not an npx-first CLI and it is not a static checklist library. The normal user path is:

Install plugin → run AnyHarness → scan repo → confirm domain details → generate project harness

The public surface is intentionally small:

Use AnyHarness for this repository.

In Claude Code, the installed plugin may expose the namespaced skill as:

/anyharness:run

In Codex, you can use natural language:

Use AnyHarness to adopt this repository.

Why v3 exists

A generic review checklist for AI-generated code solves only part of the problem. The harder part is that every project has different domain risks:

a low-latency C++ market-data or trading service
an Electron desktop client
a Java e-commerce backend
an AI agent platform
a payment system
an internal admin tool

Generic guardrails are useful, but they are not enough. AnyHarness v3 derives a Project Harness Profile from repository evidence and user confirmation.

Core idea: the three-loop harness

Skills reason.
Scripts assist.
Optional hooks enforce.

What makes AnyHarness more than a generator is its feedback loop. A static CLAUDE.md rots; AnyHarness is structured as three connected loops:

┌─────────┐    ┌────────┐    ┌────────┐
│ INIT    │ ─► │ REVIEW │ ─► │ EVOLVE │ ──┐
│ (adopt) │    │ (diff) │    │        │   │
└─────────┘    └────────┘    └────────┘   │
                   ▲                       │
                   └───────────────────────┘
                     profile gets sharper

Init bootstraps a project harness profile from repository evidence and user answers.
Review uses the profile to find Blockers, Needs Changes, and Learning Candidates.
Evolve turns confirmed Learning Candidates into permanent invariants, with provenance and a learningHistory ledger.

Without the evolve loop, AnyHarness is a snapshot. With it, the harness becomes a learning system — every review can make the next one sharper.

Deep architecture analysis

scan-project.mjs sees only filenames and directories, so it cannot see the architecture. The deep analysis path reads the source itself to derive risk boundaries.

One command to run it

node analyze.mjs --stack auto --path <project-dir>

--stack auto detects the technology stack from project files and takes the right path automatically. Add --save to write a full JSON report to .anyharness/reports/.

Under the hood, it runs:

analyze.mjs  →  extract-architecture.mjs  →  derive-risk-topology.mjs  →  propose-evolution.mjs
  (entry)           (parse source)              (derive risk boundaries)      (write to profile)

Three analysis paths

Path	When	What runs
A — deterministic	Stack is java-spring, rust-tauri, csharp-avalonia, or cpp-sdk	Built-in extractor + topology rules → exact `file:line` findings
B — LLM analysis	Any other stack	File sampler picks high-signal files → you read them + apply `llm-extractor.md`
C — user config	`.anyharness/stack-config.json` present	Your regex patterns + universal topology rules → deterministic
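
The choice between the three paths in the table above can be sketched as a small function. This is an illustrative sketch, not the logic in analyze.mjs: the name choosePath, its parameters, and the stack name python-fastapi are hypothetical, and the README does not say which path wins when a built-in stack also has a user config, so the precedence below is an assumption.

```javascript
// Hypothetical sketch of the Path A / B / C choice; not taken from analyze.mjs.
const BUILT_IN_STACKS = ["java-spring", "rust-tauri", "csharp-avalonia", "cpp-sdk"];

function choosePath(detectedStack, hasStackConfig) {
  // Assumption: a user-supplied .anyharness/stack-config.json takes precedence.
  if (hasStackConfig) return "C";
  // Stacks with a built-in extractor and topology rules run deterministically.
  if (BUILT_IN_STACKS.includes(detectedStack)) return "A";
  // Everything else falls back to LLM-assisted analysis.
  return "B";
}

console.log(choosePath("java-spring", false)); // A
console.log(choosePath("python-fastapi", false)); // B
console.log(choosePath("python-fastapi", true)); // C
```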

B→C upgrade: after Path B analysis, run suggest-stack-config.mjs to generate a starter stack-config.json for your language. Edit it and save it; the next run of analyze.mjs is then deterministic and needs no LLM file reading.

node suggest-stack-config.mjs --path <dir> --save   # draft to .anyharness/drafts/
node suggest-stack-config.mjs --path <dir> --confirm # activate Path C

AnyHarness ships a complete pipeline (extractor + topology rules + knowledge pack) for four technology stacks:

java-spring

Parses .java sources to find controllers, services, repositories, @Transactional methods, Kafka send/listener bindings, external HTTP calls, self-invocations, @Query mutations. Detects:

state-mutation-safety: dual-write (DB + Kafka in same @Transactional); this.foo() bypasses Spring proxy; Kafka at-least-once without idempotency
missing-modifying: @Query UPDATE/DELETE without @Modifying → silent runtime failure (blocker)
resource-lifetime: REQUIRES_NEW under load can exhaust the connection pool
external-interaction: HTTP calls without visible timeout / retry / circuit-breaker
trust-boundary: HTTP endpoints accepting input without @Valid
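
To make the missing-modifying check concrete, here is a minimal regex-based sketch of how such a detector might work. It is not the shipped extractor: the function name findMissingModifying, the finding fields, and the two-line annotation window are assumptions made for illustration.

```javascript
// Hypothetical regex-based check; the real extractor's patterns are not shown in this README.
function findMissingModifying(fileName, source) {
  const lines = source.split("\n");
  const findings = [];
  // Matches @Query("UPDATE ...") or @Query(value = "DELETE ...").
  const mutatingQuery = /@Query\s*\(\s*(?:value\s*=\s*)?"\s*(?:UPDATE|DELETE)\b/i;
  lines.forEach((line, i) => {
    if (!mutatingQuery.test(line)) return;
    // Assumption: annotations for one method sit within two lines of @Query.
    const nearby = lines.slice(Math.max(0, i - 2), i + 3).join("\n");
    if (!/@Modifying\b/.test(nearby)) {
      findings.push({ file: fileName, line: i + 1, risk: "missing-modifying", severity: "blocker" });
    }
  });
  return findings;
}

const sampleRepo = [
  "public interface UserRepo {",
  "  @Modifying",
  '  @Query("UPDATE User u SET u.active = false WHERE u.id = :id")',
  "  int deactivate(long id);",
  "",
  '  @Query("DELETE FROM User u WHERE u.active = false")',
  "  int purge();",
  "}",
].join("\n");

const repoFindings = findMissingModifying("UserRepo.java", sampleRepo);
console.log(repoFindings.map((f) => `${f.file}:${f.line}`)); // [ 'UserRepo.java:6' ]
```

A fixed line window is the kind of shortcut a regex extractor has to take; a parser-based extractor could attach annotations to their method exactly.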

rust-tauri

Parses .rs sources to find #[tauri::command] functions, generate_handler![] registrations, unsafe {} blocks, std::fs/std::process calls, tokio::spawn. Detects:

trust-boundary (blocker): unsafe block inside a registered Tauri command — renderer JS can trigger arbitrary native memory access
trust-boundary: unregistered #[tauri::command] (dead code or plugin route); fs::read_to_string with renderer-supplied path (path traversal)
external-interaction: Command::new("sh").arg("-c").arg(&user_input) command injection
resource-lifetime: tokio::spawn capturing a raw handle; async command with no cancellation path

csharp-avalonia

Parses .cs sources to find async void, ObservableCollection cross-thread writes, HttpClient creation patterns, Process.Start, [DllImport]/[LibraryImport], unsafe blocks, IDisposable fields. Detects:

error-propagation: async void — exceptions propagate to SynchronizationContext and crash the app
threading-discipline: ObservableCollection.Add/Clear inside Task.Run — cross-thread mutation crashes UI
resource-lifetime: new HttpClient() per call (socket exhaustion); IDisposable field in non-IDisposable class
trust-boundary: Process.Start(UseShellExecute=true, fileName=userInput) → OS picks handler; P/Invoke with unchecked marshaling

cpp-sdk

Parses .h/.cpp sources to find public API signatures (raw pointer+length, void* ctx, char* returns), memcpy/sprintf/strcpy, new/delete, std::thread create/detach/join, global mutable state. Detects:

trust-boundary (blocker): memcpy(out, data, len) with no len <= out_len check → heap buffer overflow
trust-boundary: sprintf used where snprintf would bound the write; public API accepting raw pointer+size
api-stability: char* return with ambiguous ownership (who frees?); void* ctx callback with no lifetime contract
resource-lifetime: std::thread::detach() → orphan thread, use-after-free on captured pointer; raw new/delete instead of RAII
threading-discipline: data race on shared flag (no std::atomic); global state with undefined lock ordering

Every finding includes a file:line citation, severity (blocker/high/medium/low), and a pre-formatted Learning Candidate ready to feed into the evolve loop.
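
As a rough illustration of how a finding could be turned into a Learning Candidate, consider the sketch below. The field names (suggestedRule, category, and so on), the function toLearningCandidate, and the example file path are hypothetical; only the four severity levels and the type/proposed/evidence/rationale layout come from this README.

```javascript
// Hypothetical shapes: the field names below are illustrative, not the tool's schema.
const SEVERITIES = ["blocker", "high", "medium", "low"];

function toLearningCandidate(finding) {
  if (!SEVERITIES.includes(finding.severity)) {
    throw new Error(`unknown severity: ${finding.severity}`);
  }
  return {
    type: "new-invariant",
    proposed: finding.suggestedRule,
    evidence: `${finding.file}:${finding.line}`, // file:line citation
    rationale: `${finding.severity} ${finding.category} finding`,
  };
}

const modifyingCandidate = toLearningCandidate({
  file: "src/repo/UserRepo.java",
  line: 6,
  severity: "blocker",
  category: "missing-modifying",
  suggestedRule: "Every @Query that runs UPDATE or DELETE must also carry @Modifying.",
});
console.log(modifyingCandidate.evidence); // src/repo/UserRepo.java:6
```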

This is what separates AnyHarness from a generic review checklist: structured source extraction + stack-specific knowledge produces findings about this kind of project's real failure modes, not generic style issues.

Extractors are regex-based (PoC quality) and replaceable with tree-sitter / Roslyn / libclang without changing the downstream contract. Adding a new stack requires one extractor module + one topology module + one knowledge pack. See references/probe-architecture.md for the contract.

Any stack (Path B + Path C)

AnyHarness works on any stack — not just the four listed above.

Path B (LLM analysis): analyze.mjs --stack auto detects 15+ stacks. For stacks without a built-in pipeline, it samples the most relevant source files and prints them with guidance from references/llm-extractor.md. You read the files and apply the 7 universal failure modes to produce Risk[] findings.

Path C (deterministic config): drop .anyharness/stack-config.json in your project root — no code required. Define regex patterns for your stack's:

trustBoundaryMarkers — route decorators / annotations
externalCallPatterns — subprocess, HTTP, and file I/O calls
unsafePatterns — dangerous operations
asyncPatterns — async function forms
errorSwallowPatterns — silent error swallowing

analyze.mjs --stack auto picks up the config automatically. See references/stack-config-schema.md for the full schema and example configs for Python/FastAPI, Go/Gin, and Node/Express.
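
For orientation, a Path C config for a Python/FastAPI project might look roughly like the object below. Only the five key names come from this README; the value shape (arrays of regex strings), the patterns themselves, and the compilePatterns helper are assumptions, so treat references/stack-config-schema.md as the authority.

```javascript
// Hypothetical contents for .anyharness/stack-config.json, written as a JS object.
// Only the five key names come from this README; the value shape is assumed.
const stackConfig = {
  trustBoundaryMarkers: ["@(app|router)\\.(get|post|put|delete)\\("],
  externalCallPatterns: ["subprocess\\.(run|Popen)\\(", "requests\\.(get|post)\\(", "\\bopen\\("],
  unsafePatterns: ["\\beval\\(", "pickle\\.loads\\("],
  asyncPatterns: ["\\basync\\s+def\\b"],
  errorSwallowPatterns: ["except\\s*(Exception)?\\s*:\\s*pass"],
};

// Compile every pattern up front so an invalid regex fails immediately.
function compilePatterns(config) {
  const compiled = {};
  for (const [key, patterns] of Object.entries(config)) {
    compiled[key] = patterns.map((p) => new RegExp(p));
  }
  return compiled;
}

const compiledConfig = compilePatterns(stackConfig);
console.log(compiledConfig.trustBoundaryMarkers[0].test('@app.post("/orders")')); // true
```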

See references/probe-architecture.md, references/universal-failure-modes.md, and references/stacks/<stack>.md.

AnyHarness uses the LLM where it is strongest:

reading project context
discovering domain signals
asking focused questions
synthesizing project-specific rules
creating expert review roles
designing gates and test oracles
generating cross-model review packets

Optional skill scripts handle deterministic support tasks:

analyze.mjs — unified architecture analysis pipeline (extract + topology + report)
suggest-stack-config.mjs — generate a starter stack-config.json for Path C
scan-project.mjs — repository file scan
collect-diff.mjs — git diff collection
extract-architecture.mjs — per-stack source extraction
derive-risk-topology.mjs — risk topology from extraction output
sample-for-llm.mjs — ranked file sampling for unsupported stacks (Path B)
write-native-prompts.mjs — generate CLAUDE.md / AGENTS.md / Cursor rules
write-profile.mjs — write or draft the project harness profile
validate-profile.mjs — validate profile JSON
generate-review-packet.mjs — cross-model review packet
propose-evolution.mjs — merge learning candidates into profile
install-local-hooks.mjs — optional Git hooks and CI workflow

No global CLI is required for normal usage.

What AnyHarness generates

By default, AnyHarness writes only native AI prompt surfaces after confirmation:

CLAUDE.md      # Claude Code project instructions
AGENTS.md      # Codex and agent instructions
.cursor/rules/anyharness.mdc  # optional Cursor rule

If you enable Project Harness mode, it also writes:

.anyharness/
  profile.json       # machine-readable project harness profile
                     # (carries learningHistory ledger after each evolution)
  profile.md         # human-readable project harness profile
  drafts/            # safe drafts before --confirm writes
  gates/             # gate artifacts
  packets/           # cross-model review packets
  evidence/          # test/review evidence, if generated

If you enable hard enforcement, it can generate repo-local files:

.anyharness/scripts/check.mjs
.githooks/pre-commit
.githooks/commit-msg
.github/workflows/anyharness.yml

These are generated only after explicit confirmation.

Getting started: Claude Code

Prerequisites

Claude Code installed (CLI or desktop app)
Node.js 18 or later
This repository cloned locally:

git clone https://github.com/doublnt/ai-harness-workflow.git ~/anyharness
# or wherever you prefer

Step 1 — Register as a local plugin marketplace

Open (or create) your Claude Code user settings file:

# macOS / Linux
~/.claude/settings.json

# Windows
%APPDATA%\Claude\settings.json

Add the plugins section (merge with existing content if the file already exists):

{
  "plugins": {
    "marketplaces": [
      {
        "url": "file:///Users/yourname/anyharness/.claude-plugin/marketplace.json"
      }
    ]
  }
}

Replace /Users/yourname/anyharness with the actual path where you cloned this repo. You can verify the path with pwd inside the cloned directory.

Step 2 — Install the plugin

In Claude Code, run:

/plugins install anyharness

Or open the plugin marketplace UI, find anyharness, and click Install.

Step 3 — Verify installation

/anyharness:run

You should see AnyHarness respond and ask what you'd like to do.

Step 4 — Use it on your project

Open (or cd into) the project you want to analyze. Then:

Adopt an existing project:

/anyharness:run adopt this repository safely

Initialize a new project:

/anyharness:run initialize this new project

Review staged changes:

/anyharness:run review the current staged diff

Generate a cross-model review packet:

/anyharness:run create a security review packet for the staged diff

AnyHarness will guide you through the rest interactively — scanning, asking questions, and confirming before writing any files.

Getting started: Codex

Prerequisites

Codex with plugin support enabled
This repository cloned or accessible locally

Step 1 — Register as a local plugin marketplace

In your Codex configuration, add:

{
  "plugins": {
    "marketplaces": [
      {
        "url": "file:///Users/yourname/anyharness/.agents/plugins/marketplace.json"
      }
    ]
  }
}

Step 2 — Install the plugin

Install the anyharness plugin.

Step 3 — Use natural language

Use AnyHarness to adopt this repository safely.

Then continue the conversation:

Use AnyHarness to generate project-specific expert review roles.
Use AnyHarness to review this diff against the project harness.
Use AnyHarness to create a cross-model review packet.

What happens during first run

When you ask AnyHarness to adopt or initialize a project, it follows this sequence without writing anything until you confirm:

Scan — reads your project files, detects stack and AI workflow files
Hypothesize — proposes domain signals with evidence and confidence levels
Ask — poses 5–12 focused questions about your project's specific rules
Propose — shows what files it would create (CLAUDE.md, profile.json, etc.)
Write — only writes after you confirm each step

Example first-run output:

Scan complete. 847 files scanned.

Stack: Node.js, React, PostgreSQL
AI workflow: CLAUDE.md detected

Domain hypotheses:
- ecommerce/payment: medium confidence
  Evidence: src/payment/, src/orders/, docs/checkout.md

Unknowns:
- Whether payment callbacks can repeat
- How inventory reservation works

Questions:
1. Can payment callbacks be delivered more than once?
2. Is order price frozen at checkout or at payment time?
3. Is inventory reserved immediately or only after payment?

(Reply to answer. I won't write anything until you confirm.)

New project workflow

Ask:

Use AnyHarness to initialize this new project.

AnyHarness will:

perform a read-only scan
detect AI workflow files such as CLAUDE.md, AGENTS.md, .cursor/rules
detect stack signals such as Java, C++, Rust, TypeScript, Electron, React, Spring, etc.
detect domain hypotheses from code, docs, routes, schema, tests, and names
ask focused questions
generate native prompt surfaces
generate a project-specific harness profile
offer optional local enforcement

Existing project workflow

Ask:

/anyharness:run onboard this existing repository

This runs onboard.mjs — a single command that combines project scan + architecture analysis, then writes the profile seeded with real risk findings in one confirmation step.

What it does:

Scan + analyze together — reads directory structure, detects stacks and domain signals, then immediately runs deep architecture extraction on the source code.
Combined presentation — shows domain hypotheses and architecture risk findings (with file:line citations) side by side, rather than as separate steps.
Fewer questions — the architecture analysis already answers some domain questions; AnyHarness only asks what it couldn't infer from the code.
Single write — after you confirm, writes CLAUDE.md, AGENTS.md, and .anyharness/profile.json seeded with both domain invariants and risk-derived invariants in one shot.

Safety rules:

read-only scan and analysis first — nothing written until you confirm
existing CLAUDE.md / AGENTS.md are never overwritten; drafts are generated
hooks are not installed unless you explicitly request them
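
The "never overwrite, draft instead" rule above can be sketched as a pure planning step that runs before anything touches disk. The function planWrites and the assumption that drafts of prompt files land under .anyharness/drafts/ are hypothetical.

```javascript
// Hypothetical sketch of the "never overwrite, draft instead" rule.
// Assumption: drafts of prompt files land under .anyharness/drafts/.
function planWrites(targets, existingFiles) {
  const existing = new Set(existingFiles);
  return targets.map((path) =>
    existing.has(path)
      ? { path: `.anyharness/drafts/${path}`, mode: "draft" }
      : { path, mode: "create" }
  );
}

const writePlan = planWrites(["CLAUDE.md", "AGENTS.md"], ["CLAUDE.md", "README.md"]);
console.log(writePlan.map((p) => `${p.mode}:${p.path}`).join(" "));
// draft:.anyharness/drafts/CLAUDE.md create:AGENTS.md
```

Keeping the plan separate from the write makes it easy to show the user exactly what would change before they confirm.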

Domain discovery workflow

AnyHarness does not ship authoritative domain packs. Instead, it produces domain hypotheses.

Example output:

Domain hypotheses:
- ecommerce/payment: confidence medium
- inventory consistency: confidence medium

Evidence:
- src/payment/PaymentCallbackController.java
- src/order/OrderService.java
- migrations/create_inventory_reservations.sql
- docs/checkout.md

Unknowns:
- whether payment callbacks can repeat
- whether inventory is reserved or deducted immediately
- where order state transitions are defined

Then it asks focused questions:

1. Can payment callbacks be delivered more than once?
2. Is order final price frozen at order creation?
3. Is inventory reserved at checkout or deducted at payment success?
4. Does fulfillment happen immediately after payment success?

Only after user confirmation does it synthesize project rules.

Expert review roles

AnyHarness creates project-specific roles from the project harness profile.

Examples:

Payment Idempotency Reviewer
Inventory Consistency Reviewer
Electron IPC Boundary Reviewer
Low-Latency C++ Reviewer
Order State Machine Reviewer
Architecture Trade-off Reviewer
Performance and Memory Reviewer
Release Readiness Reviewer

The roles are not just labels. Each one includes:

scope
required context
project-specific invariants
blocker criteria
required evidence
output schema
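
One way to picture a role is as a record with the six parts listed above, plus a check that none is missing. The field names, the missingRoleFields helper, and the example values are hypothetical and are not the tool's actual schema.

```javascript
// Hypothetical role shape; field names mirror the list above but are not the tool's schema.
const ROLE_FIELDS = [
  "scope",
  "requiredContext",
  "invariants",
  "blockerCriteria",
  "requiredEvidence",
  "outputSchema",
];

function missingRoleFields(role) {
  return ROLE_FIELDS.filter((field) => role[field] === undefined);
}

// Example values are invented for illustration.
const paymentRole = {
  name: "Payment Idempotency Reviewer",
  scope: ["src/payment/**"],
  requiredContext: ["docs/checkout.md"],
  invariants: ["Payment callbacks may repeat, so handlers must be idempotent."],
  blockerCriteria: ["A handler performs a side effect before checking for a duplicate."],
  requiredEvidence: ["file:line citation for each finding"],
};
console.log(missingRoleFields(paymentRole)); // [ 'outputSchema' ]
```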

Review packets

A review packet solves a common problem: a second model is asked to review code without enough project context.

Ask:

Use AnyHarness to create a security review packet for the staged diff.

Generated packet:

.anyharness/packets/<id>/
  PROMPT.md
  PROJECT_PROFILE.md
  DIFF.patch
  CHANGED_FILES.txt
  RELEVANT_FILES.md
  GATE_REQUIREMENTS.md
  DOMAIN_INVARIANTS.md
  UNKNOWN.md

You can give that packet to another model and ask it to perform one expert role only.

Harness evolution loop

Every review ends with a Learning Candidates section: structured proposals to update the project harness based on what the review found. This is the mechanism that keeps the profile alive.

Verdict: Blocked

Learning Candidates:
- type: new-invariant
  proposed: Webhook handlers under src/webhooks/ must look up the idempotency key
            in payment_events before any side effect.
  evidence: src/webhooks/PaymentWebhook.java:42, src/webhooks/RefundWebhook.java:31
  rationale: Two handlers already exhibit the missing check.

Apply any of these to the profile?

Candidate types:

new-invariant — a rule the project should always follow
refined-invariant — sharpen the wording or scope of an existing invariant
retired-invariant — remove an invariant that no longer applies
new-unknown — a question the reviewer couldn't answer without more context
new-gate — a check that should run automatically on every change

When you confirm, AnyHarness merges accepted candidates into .anyharness/profile.json and appends a timestamped entry to learningHistory that records the trigger and what was added, refined, retired, or asked. The merge is idempotent: re-running with the same findings is a no-op.
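
The idempotent merge described above might be sketched like this. It is a simplified illustration rather than the code in propose-evolution.mjs: the function mergeCandidates, the invariant and history field names, and the handling of only new-invariant candidates are all assumptions.

```javascript
// Hypothetical sketch of an idempotent merge; the real propose-evolution.mjs may differ.
function mergeCandidates(profile, accepted, trigger, now) {
  const invariants = [...(profile.invariants ?? [])];
  const added = [];
  for (const c of accepted) {
    if (c.type !== "new-invariant") continue; // other candidate types omitted for brevity
    if (invariants.some((inv) => inv.text === c.proposed)) continue; // already present
    invariants.push({ text: c.proposed, provenance: c.evidence });
    added.push(c.proposed);
  }
  // No-op when nothing changed: return the same object and skip the history entry.
  if (added.length === 0) return profile;
  const entry = { timestamp: now, trigger, added };
  return {
    ...profile,
    invariants,
    learningHistory: [...(profile.learningHistory ?? []), entry],
  };
}

const webhookCandidate = {
  type: "new-invariant",
  proposed: "Webhook handlers must look up the idempotency key before any side effect.",
  evidence: "src/webhooks/PaymentWebhook.java:42",
};
const mergedOnce = mergeCandidates({ invariants: [] }, [webhookCandidate], "review", "2025-01-01T00:00:00Z");
const mergedTwice = mergeCandidates(mergedOnce, [webhookCandidate], "review", "2025-01-02T00:00:00Z");
console.log(mergedOnce.learningHistory.length, mergedTwice === mergedOnce); // 1 true
```

Comparing on the proposed text is the simplest possible duplicate test; a real ledger would likely key on a stable identifier instead.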

The filter for what counts as a learning candidate is strict: a single-file bug fix is not an invariant; a rule that would prevent a class of future bugs is. See references/harness-evolution.md.

Modes

Mode	What it does	Best for
Skill-only	LLM interaction, domain discovery, prompt surfaces, review packets	solo developers, exploration
Project Harness	Adds `.anyharness/profile.json` and gates	serious personal projects, small teams
Learning Harness	Project Harness + evolution loop (Learning Candidates → profile.json + learningHistory)	teams that want the harness to compound over time
Enforcement	Adds local scripts, Git hooks, CI workflow	teams and production repositories

Safety model

AnyHarness follows ten rules. See plugins/claude/anyharness/skills/anyharness/references/safety.md for the full rationale.

Installation does not modify a repo.
Start with read-only analysis.
Existing prompt files are not overwritten; drafts are generated.
Domain examples are not authoritative rules.
Domain-sensitive conclusions must include evidence and confidence.
Ask focused questions before finalizing invariants.
Keep the first user experience simple.
Do not install hooks without explicit confirmation.
Generated enforcement scripts must be repo-local and reviewable.
Do not read secrets or credentials files.

Repository layout

.claude-plugin/marketplace.json       # Anthropic plugin marketplace entry
.agents/plugins/marketplace.json      # Codex plugin marketplace entry
plugins/
  claude/anyharness/
    .claude-plugin/plugin.json        # Anthropic plugin manifest (skills array format)
    skills/anyharness/
      SKILL.md                        # Claude skill (standard version)
      SKILL.codex.md                  # Codex overlay source (lighter, tool-calling focus)
      references/                     # 14 reference files (single source of truth)
      scripts/                        # 13 deterministic helper scripts
  codex/anyharness/
    .codex-plugin/plugin.json         # Codex plugin manifest (includes tools array)
    skills/anyharness/
      SKILL.md                        # ← generated from SKILL.codex.md by sync script
      references/                     # ← synced from claude source
      scripts/                        # ← synced from claude source
standalone/
  skills/anyharness/
    SKILL.md                          # ← synced from claude source
    references/                       # ← synced from claude source
    scripts/                          # ← synced from claude source
scripts/
  validate.mjs                        # structural validation
  sync-distributions.mjs              # single-source sync (with stale file cleanup)
test/
  run.mjs
  fixtures/
    profile.valid.json
    profile.invalid.json

The plugins/claude/anyharness/skills/anyharness/ directory is the single source of truth. All changes must be made there; run node scripts/sync-distributions.mjs to propagate to the other two distributions. The Codex distribution automatically receives SKILL.codex.md as its SKILL.md (if present), giving it a lighter tool-calling–focused skill file.

Development validation

npm run check

This validates: required files, JSON structure, skill frontmatter, Codex tools schema, plugin.json formats, distribution drift, and all behavioral tests. This is not the user installation path.
