doublnt/anyharness

AnyHarness v3

Install the plugin. Run AnyHarness. Let it learn your project and generate a project-specific engineering harness for AI-assisted development.

AnyHarness v3 is a skill-first adaptive harness. It is not an npx-first CLI and it is not a static checklist library. The normal user path is:

Install plugin → run AnyHarness → scan repo → confirm domain details → generate project harness

The public surface is intentionally small:

Use AnyHarness for this repository.

In Claude Code, the installed plugin may expose the namespaced skill as:

/anyharness:run

In Codex, you can use natural language:

Use AnyHarness to adopt this repository.

Why v3 exists

The original problem is not only that AI-generated code needs a generic review checklist. The harder problem is that every project has different domain risks:

  • a low-latency C++ market-data or trading service
  • an Electron desktop client
  • a Java e-commerce backend
  • an AI agent platform
  • a payment system
  • an internal admin tool

Generic guardrails are useful, but they are not enough. AnyHarness v3 derives a Project Harness Profile from repository evidence and user confirmation.

Core idea: the three-loop harness

Skills reason.
Scripts assist.
Optional hooks enforce.

What makes AnyHarness more than a generator is its feedback loop. A static CLAUDE.md rots; AnyHarness is structured as three connected loops:

┌─────────┐    ┌────────┐    ┌────────┐
│ INIT    │ ─► │ REVIEW │ ─► │ EVOLVE │ ──┐
│ (adopt) │    │ (diff) │    │        │   │
└─────────┘    └────────┘    └────────┘   │
                   ▲                       │
                   └───────────────────────┘
                     profile gets sharper

  • Init bootstraps a project harness profile from repository evidence and user answers.
  • Review uses the profile to find Blockers, Needs Changes, and Learning Candidates.
  • Evolve turns confirmed Learning Candidates into permanent invariants, with provenance and a learningHistory ledger.

Without the evolve loop, AnyHarness is a snapshot. With it, the harness becomes a learning system — every review can make the next one sharper.

Deep architecture analysis

scan-project.mjs sees only file and directory names; it cannot see the architecture. The deep analysis path parses the source code itself and derives risk boundaries from what it finds.

One command to run it

node analyze.mjs --stack auto --path <project-dir>

--stack auto detects the technology stack from project files and takes the right path automatically. Add --save to write a full JSON report to .anyharness/reports/.

Under the hood, it runs:

analyze.mjs  →  extract-architecture.mjs  →  derive-risk-topology.mjs  →  propose-evolution.mjs
  (entry)           (parse source)              (derive risk boundaries)      (write to profile)

Three analysis paths

| Path | When | What runs |
| --- | --- | --- |
| A (deterministic) | Stack is java-spring, rust-tauri, csharp-avalonia, or cpp-sdk | Built-in extractor + topology rules → exact file:line findings |
| B (LLM analysis) | Any other stack | File sampler picks high-signal files → you read them + apply llm-extractor.md |
| C (user config) | .anyharness/stack-config.json present | Your regex patterns + universal topology rules → deterministic |
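
The table above can be read as a small decision function. The sketch below is only an illustration: `selectAnalysisPath` is a hypothetical name, and the precedence between a built-in extractor and a user config is an assumption, since this README does not say which wins when both apply.

```javascript
// Stacks with a built-in extractor, as listed in the table above.
const BUILT_IN_STACKS = new Set([
  "java-spring",
  "rust-tauri",
  "csharp-avalonia",
  "cpp-sdk",
]);

// Hypothetical helper: returns "A", "B", or "C" for a detected stack.
// Assumption: a built-in extractor is preferred over a user config.
function selectAnalysisPath(stack, hasStackConfig) {
  if (BUILT_IN_STACKS.has(stack)) return "A"; // deterministic, built-in
  if (hasStackConfig) return "C"; // deterministic, user regex config
  return "B"; // LLM reads sampled files
}

console.log(selectAnalysisPath("java-spring", false));
console.log(selectAnalysisPath("python-fastapi", false));
console.log(selectAnalysisPath("python-fastapi", true));
```

The stack name "python-fastapi" is made up for the example; the identifiers that `--stack auto` actually reports may differ.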

B→C upgrade: after Path B analysis, run suggest-stack-config.mjs to generate a starter stack-config.json for your language. Edit it, save it, and next time analyze.mjs runs deterministically without LLM file reading.

node suggest-stack-config.mjs --path <dir> --save   # draft to .anyharness/drafts/
node suggest-stack-config.mjs --path <dir> --confirm # activate Path C

AnyHarness ships a complete pipeline (extractor + topology rules + knowledge pack) for 4 technology stacks:

java-spring

Parses .java sources to find controllers, services, repositories, @Transactional methods, Kafka send/listener bindings, external HTTP calls, self-invocations, @Query mutations. Detects:

  • state-mutation-safety: dual-write (DB + Kafka in same @Transactional); this.foo() bypasses Spring proxy; Kafka at-least-once without idempotency
  • missing-modifying: @Query UPDATE/DELETE without @Modifying → compiles cleanly but fails at runtime when the query executes (blocker)
  • resource-lifetime: REQUIRES_NEW under load can exhaust the connection pool
  • external-interaction: HTTP calls without visible timeout / retry / circuit-breaker
  • trust-boundary: HTTP endpoints accepting input without @Valid

rust-tauri

Parses .rs sources to find #[tauri::command] functions, generate_handler![] registrations, unsafe {} blocks, std::fs/std::process calls, tokio::spawn. Detects:

  • trust-boundary (blocker): unsafe block inside a registered Tauri command — renderer JS can trigger arbitrary native memory access
  • trust-boundary: unregistered #[tauri::command] (dead code or plugin route); fs::read_to_string with renderer-supplied path (path traversal)
  • external-interaction: Command::new("sh").arg("-c").arg(&user_input) command injection
  • resource-lifetime: tokio::spawn capturing a raw handle; async command with no cancellation path

csharp-avalonia

Parses .cs sources to find async void, ObservableCollection cross-thread writes, HttpClient creation patterns, Process.Start, [DllImport]/[LibraryImport], unsafe blocks, IDisposable fields. Detects:

  • error-propagation: async void — exceptions propagate to SynchronizationContext and crash the app
  • threading-discipline: ObservableCollection.Add/Clear inside Task.Run — cross-thread mutation crashes UI
  • resource-lifetime: new HttpClient() per call (socket exhaustion); IDisposable field in non-IDisposable class
  • trust-boundary: Process.Start(UseShellExecute=true, fileName=userInput) → OS picks handler; P/Invoke with unchecked marshaling

cpp-sdk

Parses .h/.cpp sources to find public API signatures (raw pointer+length, void* ctx, char* returns), memcpy/sprintf/strcpy, new/delete, std::thread create/detach/join, global mutable state. Detects:

  • trust-boundary (blocker): memcpy(out, data, len) with no len <= out_len check → heap buffer overflow
  • trust-boundary: unbounded sprintf where snprintf should be used; public API accepting raw pointer+size
  • api-stability: char* return with ambiguous ownership (who frees?); void* ctx callback with no lifetime contract
  • resource-lifetime: std::thread::detach() → orphan thread, use-after-free on captured pointer; raw new/delete instead of RAII
  • threading-discipline: data race on shared flag (no std::atomic); global state with undefined lock ordering

Every finding includes a file:line citation, severity (blocker/high/medium/low), and a pre-formatted Learning Candidate ready to feed into the evolve loop.

This is what separates AnyHarness from a generic review checklist: structured source extraction + stack-specific knowledge produces findings about this kind of project's real failure modes, not generic style issues.

Extractors are regex-based (PoC quality) and replaceable with tree-sitter / Roslyn / libclang without changing the downstream contract. Adding a new stack requires one extractor module + one topology module + one knowledge pack. See references/probe-architecture.md for the contract.
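
To make "regex-based extractor" concrete, here is a minimal sketch of how a detector for the missing-modifying rule might look. It is not the shipped extractor: the function name, the two-line search window, and the finding fields (`rule`, `severity`, `location`) are assumptions chosen for illustration. Only the severity values and the file:line citation format come from this README.

```javascript
// Hypothetical PoC detector: flags @Query UPDATE/DELETE statements with no
// @Modifying annotation within two lines on either side.
function findMissingModifying(source, file) {
  const lines = source.split("\n");
  const findings = [];
  lines.forEach((line, i) => {
    const isDmlQuery =
      /@Query\s*\(\s*(?:value\s*=\s*)?"\s*(?:UPDATE|DELETE)\b/i.test(line);
    if (!isDmlQuery) return;
    const nearby = lines.slice(Math.max(0, i - 2), i + 3).join("\n");
    if (/@Modifying\b/.test(nearby)) return;
    findings.push({
      rule: "missing-modifying",
      severity: "blocker",
      location: `${file}:${i + 1}`, // 1-based line number
    });
  });
  return findings;
}

const sample = [
  "public interface OrderRepository {",
  '  @Query("UPDATE Order o SET o.status = :s WHERE o.id = :id")',
  "  int updateStatus(Long id, String s);",
  "",
  "",
  "  @Modifying",
  '  @Query("DELETE FROM Order o WHERE o.id = :id")',
  "  int remove(Long id);",
  "}",
].join("\n");

// Reports one finding, at OrderRepository.java:2.
console.log(findMissingModifying(sample, "OrderRepository.java"));
```

A line-window heuristic like this can miss annotations placed further away or split across lines, which is one reason to swap in a real parser later.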

Any stack (Path B + Path C)

AnyHarness works on any stack — not just the four listed above.

Path B (LLM analysis): analyze.mjs --stack auto detects 15+ stacks. For unsupported ones, it samples the most relevant source files and prints them with guidance from references/llm-extractor.md. You read the files and apply the 7 universal failure modes to produce Risk[] findings.

Path C (deterministic config): drop .anyharness/stack-config.json in your project root — no code required. Define regex patterns for your stack's:

  • trustBoundaryMarkers — route decorators / annotations
  • externalCallPatterns — subprocess, HTTP, and file I/O calls
  • unsafePatterns — dangerous operations
  • asyncPatterns — async function forms
  • errorSwallowPatterns — silent error swallowing

analyze.mjs --stack auto picks up the config automatically. See references/stack-config-schema.md for the full schema and example configs for Python/FastAPI, Go/Gin, and Node/Express.
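
As a rough illustration of how the five pattern groups could be applied, the sketch below matches them line by line against a small Python snippet. The config shape (each key mapping to an array of regex strings), the example patterns, and the `matchPatterns` helper are all assumptions; references/stack-config-schema.md is the authority on the real schema.

```javascript
// Assumed shape: each key maps to an array of regex source strings.
const stackConfig = {
  trustBoundaryMarkers: ["@app\\.(get|post|put|delete)\\("],
  externalCallPatterns: ["subprocess\\.(run|Popen)\\(", "requests\\.(get|post)\\("],
  unsafePatterns: ["\\beval\\(", "shell\\s*=\\s*True"],
  asyncPatterns: ["^\\s*async\\s+def\\s"],
  errorSwallowPatterns: ["except\\s*(Exception)?\\s*:\\s*pass"],
};

// Hypothetical matcher: one hit per (line, pattern group) match.
function matchPatterns(source, file, config) {
  const hits = [];
  source.split("\n").forEach((line, i) => {
    for (const [group, patterns] of Object.entries(config)) {
      if (patterns.some((p) => new RegExp(p).test(line))) {
        hits.push({ group, location: `${file}:${i + 1}` });
      }
    }
  });
  return hits;
}

const py = [
  '@app.post("/run")',
  "async def run(cmd: str):",
  "    subprocess.run(cmd, shell=True)",
].join("\n");

console.log(matchPatterns(py, "main.py", stackConfig));
```

In this example the third line is reported twice, once as an external call and once as an unsafe pattern, which is the kind of overlap topology rules could then combine into a single risk.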

For details, see references/probe-architecture.md, references/universal-failure-modes.md, and references/stacks/<stack>.md.

AnyHarness uses the LLM where it is strongest:

  • reading project context
  • discovering domain signals
  • asking focused questions
  • synthesizing project-specific rules
  • creating expert review roles
  • designing gates and test oracles
  • generating cross-model review packets

Optional skill scripts handle deterministic support tasks:

  • analyze.mjs — unified architecture analysis pipeline (extract + topology + report)
  • suggest-stack-config.mjs — generate a starter stack-config.json for Path C
  • scan-project.mjs — repository file scan
  • collect-diff.mjs — git diff collection
  • extract-architecture.mjs — per-stack source extraction
  • derive-risk-topology.mjs — risk topology from extraction output
  • sample-for-llm.mjs — ranked file sampling for unsupported stacks (Path B)
  • write-native-prompts.mjs — generate CLAUDE.md / AGENTS.md / Cursor rules
  • write-profile.mjs — write or draft the project harness profile
  • validate-profile.mjs — validate profile JSON
  • generate-review-packet.mjs — cross-model review packet
  • propose-evolution.mjs — merge learning candidates into profile
  • install-local-hooks.mjs — optional Git hooks and CI workflow

No global CLI is required for normal usage.

What AnyHarness generates

By default, AnyHarness writes only native AI prompt surfaces after confirmation:

CLAUDE.md      # Claude Code project instructions
AGENTS.md      # Codex and agent instructions
.cursor/rules/anyharness.mdc  # optional Cursor rule

If you enable Project Harness mode, it also writes:

.anyharness/
  profile.json       # machine-readable project harness profile
                     # (carries learningHistory ledger after each evolution)
  profile.md         # human-readable project harness profile
  drafts/            # safe drafts before --confirm writes
  gates/             # gate artifacts
  packets/           # cross-model review packets
  evidence/          # test/review evidence, if generated

If you enable hard enforcement, it can generate repo-local files:

.anyharness/scripts/check.mjs
.githooks/pre-commit
.githooks/commit-msg
.github/workflows/anyharness.yml

These are generated only after explicit confirmation.

Getting started: Claude Code

Prerequisites

  • Claude Code installed (CLI or desktop app)
  • Node.js 18 or later
  • This repository cloned locally:
git clone https://github.com/doublnt/ai-harness-workflow.git ~/anyharness
# or wherever you prefer

Step 1 — Register as a local plugin marketplace

Open (or create) your Claude Code user settings file:

# macOS / Linux
~/.claude/settings.json

# Windows
%APPDATA%\Claude\settings.json

Add the plugins section (merge with existing content if the file already exists):

{
  "plugins": {
    "marketplaces": [
      {
        "url": "file:///Users/yourname/anyharness/.claude-plugin/marketplace.json"
      }
    ]
  }
}

Replace /Users/yourname/anyharness with the actual path where you cloned this repo. You can verify the path with pwd inside the cloned directory.

Step 2 — Install the plugin

In Claude Code, run:

/plugins install anyharness

Or open the plugin marketplace UI, find anyharness, and click Install.

Step 3 — Verify installation

/anyharness:run

You should see AnyHarness respond and ask what you'd like to do.

Step 4 — Use it on your project

Open (or cd into) the project you want to analyze. Then:

Adopt an existing project:

/anyharness:run adopt this repository safely

Initialize a new project:

/anyharness:run initialize this new project

Review staged changes:

/anyharness:run review the current staged diff

Generate a cross-model review packet:

/anyharness:run create a security review packet for the staged diff

AnyHarness will guide you through the rest interactively — scanning, asking questions, and confirming before writing any files.


Getting started: Codex

Prerequisites

  • Codex with plugin support enabled
  • This repository cloned or accessible locally

Step 1 — Register as a local plugin marketplace

In your Codex configuration, add:

{
  "plugins": {
    "marketplaces": [
      {
        "url": "file:///Users/yourname/anyharness/.agents/plugins/marketplace.json"
      }
    ]
  }
}

Step 2 — Install the plugin

Install the anyharness plugin.

Step 3 — Use natural language

Use AnyHarness to adopt this repository safely.

Then continue the conversation:

Use AnyHarness to generate project-specific expert review roles.
Use AnyHarness to review this diff against the project harness.
Use AnyHarness to create a cross-model review packet.

What happens during first run

When you ask AnyHarness to adopt or initialize a project, it follows this sequence without writing anything until you confirm:

  1. Scan — reads your project files, detects stack and AI workflow files
  2. Hypothesize — proposes domain signals with evidence and confidence levels
  3. Ask — poses 5–12 focused questions about your project's specific rules
  4. Propose — shows what files it would create (CLAUDE.md, profile.json, etc.)
  5. Write — only writes after you confirm each step

Example first-run output:

Scan complete. 847 files scanned.

Stack: Node.js, React, PostgreSQL
AI workflow: CLAUDE.md detected

Domain hypotheses:
- ecommerce/payment: medium confidence
  Evidence: src/payment/, src/orders/, docs/checkout.md

Unknowns:
- Whether payment callbacks can repeat
- How inventory reservation works

Questions:
1. Can payment callbacks be delivered more than once?
2. Is order price frozen at checkout or at payment time?
3. Is inventory reserved immediately or only after payment?

(Reply to answer. I won't write anything until you confirm.)

New project workflow

Ask:

Use AnyHarness to initialize this new project.

AnyHarness will:

  1. perform a read-only scan
  2. detect AI workflow files such as CLAUDE.md, AGENTS.md, .cursor/rules
  3. detect stack signals such as Java, C++, Rust, TypeScript, Electron, React, Spring, etc.
  4. detect domain hypotheses from code, docs, routes, schema, tests, and names
  5. ask focused questions
  6. generate native prompt surfaces
  7. generate a project-specific harness profile
  8. offer optional local enforcement

Existing project workflow

Ask:

/anyharness:run onboard this existing repository

This runs onboard.mjs — a single command that combines project scan + architecture analysis, then writes the profile seeded with real risk findings in one confirmation step.

What it does:

  1. Scan + analyze together — reads directory structure, detects stacks and domain signals, then immediately runs deep architecture extraction on the source code.
  2. Combined presentation — shows domain hypotheses and architecture risk findings (with file:line citations) side by side, rather than as separate steps.
  3. Fewer questions — the architecture analysis already answers some domain questions; AnyHarness only asks what it couldn't infer from the code.
  4. Single write — after you confirm, writes CLAUDE.md, AGENTS.md, and .anyharness/profile.json seeded with both domain invariants and risk-derived invariants in one shot.

Safety rules (unchanged):

  • read-only scan and analysis first — nothing written until you confirm
  • existing CLAUDE.md / AGENTS.md are never overwritten; drafts are generated
  • hooks are not installed unless you explicitly request them
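
The "never overwrite, generate a draft" rule can be pictured as a small path decision. This is a sketch under assumptions: `resolveWriteTarget` is a hypothetical helper, and routing drafts to .anyharness/drafts/ is inferred from the directory listing earlier in this README rather than confirmed behavior.

```javascript
// Hypothetical helper: decides where a generated prompt file should go.
// existingPaths is a Set of repo-relative paths that already exist.
function resolveWriteTarget(fileName, existingPaths) {
  if (existingPaths.has(fileName)) {
    // Never overwrite: route the content to the drafts directory instead.
    return { path: `.anyharness/drafts/${fileName}`, draft: true };
  }
  return { path: fileName, draft: false };
}

const existing = new Set(["CLAUDE.md"]);
console.log(resolveWriteTarget("CLAUDE.md", existing)); // draft, under .anyharness/drafts/
console.log(resolveWriteTarget("AGENTS.md", existing)); // written in place
```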

Domain discovery workflow

AnyHarness does not ship authoritative domain packs. Instead, it produces domain hypotheses.

Example output:

Domain hypotheses:
- ecommerce/payment: confidence medium
- inventory consistency: confidence medium

Evidence:
- src/payment/PaymentCallbackController.java
- src/order/OrderService.java
- migrations/create_inventory_reservations.sql
- docs/checkout.md

Unknowns:
- whether payment callbacks can repeat
- whether inventory is reserved or deducted immediately
- where order state transitions are defined

Then it asks focused questions:

1. Can payment callbacks be delivered more than once?
2. Is order final price frozen at order creation?
3. Is inventory reserved at checkout or deducted at payment success?
4. Does fulfillment happen immediately after payment success?

Only after user confirmation does it synthesize project rules.

Expert review roles

AnyHarness creates project-specific roles from the project harness profile.

Examples:

Payment Idempotency Reviewer
Inventory Consistency Reviewer
Electron IPC Boundary Reviewer
Low-Latency C++ Reviewer
Order State Machine Reviewer
Architecture Trade-off Reviewer
Performance and Memory Reviewer
Release Readiness Reviewer

The roles are not just labels. Each one includes:

  • scope
  • required context
  • project-specific invariants
  • blocker criteria
  • required evidence
  • output schema

Review packets

A review packet solves a common problem: a second model is asked to review code without enough context.

Ask:

Use AnyHarness to create a security review packet for the staged diff.

Generated packet:

.anyharness/packets/<id>/
  PROMPT.md
  PROJECT_PROFILE.md
  DIFF.patch
  CHANGED_FILES.txt
  RELEVANT_FILES.md
  GATE_REQUIREMENTS.md
  DOMAIN_INVARIANTS.md
  UNKNOWN.md

You can give that packet to another model and ask it to perform one expert role only.
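
Before handing a packet to another model, it can help to confirm that nothing is missing. The sketch below assumes every file in the listing above is required, which this README does not state, and `missingPacketFiles` is a hypothetical helper rather than part of generate-review-packet.mjs.

```javascript
// File names taken from the packet listing above.
const PACKET_FILES = [
  "PROMPT.md",
  "PROJECT_PROFILE.md",
  "DIFF.patch",
  "CHANGED_FILES.txt",
  "RELEVANT_FILES.md",
  "GATE_REQUIREMENTS.md",
  "DOMAIN_INVARIANTS.md",
  "UNKNOWN.md",
];

// Hypothetical check: which expected files are absent from a directory listing?
function missingPacketFiles(presentFiles) {
  const present = new Set(presentFiles);
  return PACKET_FILES.filter((name) => !present.has(name));
}

console.log(missingPacketFiles(["PROMPT.md", "DIFF.patch", "PROJECT_PROFILE.md"]));
```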

Harness evolution loop

Every review ends with a Learning Candidates section: structured proposals to update the project harness based on what the review found. This is the mechanism that keeps the profile alive.

Verdict: Blocked

Learning Candidates:
- type: new-invariant
  proposed: Webhook handlers under src/webhooks/ must look up the idempotency key
            in payment_events before any side effect.
  evidence: src/webhooks/PaymentWebhook.java:42, src/webhooks/RefundWebhook.java:31
  rationale: Two handlers already exhibit the missing check.

Apply any of these to the profile?

Candidate types:

  • new-invariant — a rule the project should always follow
  • refined-invariant — sharpen the wording or scope of an existing invariant
  • retired-invariant — remove an invariant that no longer applies
  • new-unknown — a question the reviewer couldn't answer without more context
  • new-gate — a check that should run automatically on every change

When you confirm, AnyHarness merges accepted candidates into .anyharness/profile.json and appends a timestamped entry to learningHistory, including the trigger, what was added, refined, retired, or asked. The merge is idempotent — re-running with the same findings is a no-op.
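
The idempotent merge described above might look roughly like the following. This is a simplified sketch, not propose-evolution.mjs: it handles only new-invariant candidates, and every field name except `learningHistory` (for example `invariants`, `text`, `at`, `added`) is an assumption.

```javascript
// Hypothetical merge of accepted candidates into a profile object.
function applyLearningCandidates(profile, candidates, trigger, now = new Date()) {
  const invariants = [...(profile.invariants ?? [])];
  const added = [];
  for (const c of candidates) {
    if (c.type !== "new-invariant") continue; // other types omitted in this sketch
    if (invariants.some((inv) => inv.text === c.proposed)) continue; // already present
    invariants.push({ text: c.proposed, evidence: c.evidence });
    added.push(c.proposed);
  }
  if (added.length === 0) return profile; // no-op: nothing new to record
  const entry = { at: now.toISOString(), trigger, added };
  return {
    ...profile,
    invariants,
    learningHistory: [...(profile.learningHistory ?? []), entry],
  };
}

const candidate = {
  type: "new-invariant",
  proposed: "Webhook handlers must look up the idempotency key before any side effect.",
  evidence: ["src/webhooks/PaymentWebhook.java:42"],
};
const once = applyLearningCandidates({}, [candidate], "review");
const twice = applyLearningCandidates(once, [candidate], "review");
console.log(once.invariants.length, twice === once); // 1 true
```

Returning the same object when nothing changes is one simple way to make a re-run a no-op; a real implementation would also need to handle refined, retired, unknown, and gate candidates.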

The filter for what counts as a learning candidate is strict: a single-file bug fix is not an invariant; a rule that would prevent a class of future bugs is. See references/harness-evolution.md.

Modes

| Mode | What it does | Best for |
| --- | --- | --- |
| Skill-only | LLM interaction, domain discovery, prompt surfaces, review packets | solo developers, exploration |
| Project Harness | Adds .anyharness/profile.json and gates | serious personal projects, small teams |
| Learning Harness | Project Harness + evolution loop (Learning Candidates → profile.json + learningHistory) | teams that want the harness to compound over time |
| Enforcement | Adds local scripts, Git hooks, CI workflow | teams and production repositories |

Safety model

AnyHarness follows ten rules. See plugins/claude/anyharness/skills/anyharness/references/safety.md for the full rationale.

  1. Installation does not modify a repo.
  2. Start with read-only analysis.
  3. Existing prompt files are not overwritten; drafts are generated.
  4. Domain examples are not authoritative rules.
  5. Domain-sensitive conclusions must include evidence and confidence.
  6. Ask focused questions before finalizing invariants.
  7. Keep the first user experience simple.
  8. Do not install hooks without explicit confirmation.
  9. Generated enforcement scripts must be repo-local and reviewable.
  10. Do not read secrets or credentials files.
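
Rule 10 implies some filter between the file listing and anything that reads file contents. The sketch below shows one possible shape for that filter. The deny-list patterns and both helper names are assumptions; this README does not say which paths AnyHarness actually treats as sensitive.

```javascript
// Assumed deny-list, for illustration only.
const SENSITIVE_PATTERNS = [
  /(^|\/)\.env(\.[^/]+)?$/, // .env, .env.local, ...
  /\.(pem|key|p12|pfx)$/i, // key and certificate material
  /(^|\/)id_(rsa|ed25519)$/, // SSH private keys
  /(^|\/)credentials(\.[^/]+)?$/i, // credentials, credentials.json, ...
];

// Hypothetical check on a repo-relative path (Windows separators normalized).
function isSensitivePath(p) {
  const normalized = p.replace(/\\/g, "/");
  return SENSITIVE_PATTERNS.some((re) => re.test(normalized));
}

// Keeps only the paths that are safe to read during a scan.
function filterScannable(paths) {
  return paths.filter((p) => !isSensitivePath(p));
}

console.log(
  filterScannable([
    "src/app.ts",
    ".env",
    "config/.env.local",
    "deploy/server.pem",
    "docs/credentials-guide.md",
  ])
); // [ 'src/app.ts', 'docs/credentials-guide.md' ]
```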

Repository layout

.claude-plugin/marketplace.json       # Anthropic plugin marketplace entry
.agents/plugins/marketplace.json      # Codex plugin marketplace entry
plugins/
  claude/anyharness/
    .claude-plugin/plugin.json        # Anthropic plugin manifest (skills array format)
    skills/anyharness/
      SKILL.md                        # Claude skill (standard version)
      SKILL.codex.md                  # Codex overlay source (lighter, tool-calling focus)
      references/                     # 14 reference files (single source of truth)
      scripts/                        # 13 deterministic helper scripts
  codex/anyharness/
    .codex-plugin/plugin.json         # Codex plugin manifest (includes tools array)
    skills/anyharness/
      SKILL.md                        # ← generated from SKILL.codex.md by sync script
      references/                     # ← synced from claude source
      scripts/                        # ← synced from claude source
standalone/
  skills/anyharness/
    SKILL.md                          # ← synced from claude source
    references/                       # ← synced from claude source
    scripts/                          # ← synced from claude source
scripts/
  validate.mjs                        # structural validation
  sync-distributions.mjs              # single-source sync (with stale file cleanup)
test/
  run.mjs
  fixtures/
    profile.valid.json
    profile.invalid.json

The plugins/claude/anyharness/skills/anyharness/ directory is the single source of truth. All changes must be made there; run node scripts/sync-distributions.mjs to propagate to the other two distributions. The Codex distribution automatically receives SKILL.codex.md as its SKILL.md (if present), giving it a lighter tool-calling–focused skill file.

Development validation

npm run check

This validates: required files, JSON structure, skill frontmatter, Codex tools schema, plugin.json formats, distribution drift, and all behavioral tests. This is not the user installation path.

About

An AI engineering governance layer for Claude/Codex/Spec Kit
