Multi-agent architecture review with hard cost contracts, tool guardrails, and deterministic stop conditions.
Most multi-agent orchestrators treat cost and tool safety as your problem. A critic backed by Claude CLI can silently read your entire repository on every round unless something explicitly stops it — and none of them will tell you the bill before it arrives. archmesh is the trust layer that prevents this.
Two AI architects propose competing solutions. Critics and an economist challenge them. Architects rebut each other. A judge ensemble scores everything on a weighted rubric and picks a winner — or synthesizes a hybrid. Every role runs under a hard USD cap with a tool policy that controls what it can touch.
Harness-agnostic debate engines like argue are elegant in theory: one delegate interface, any backend. In practice they hand off all cost and safety responsibility to you. archmesh is opinionated in the places that matter:
| What | How |
|---|---|
| Hard per-role budget caps | Each role has a ceiling (architects $0.40, critics $0.12, judges $0.15). Overages skip the agent gracefully — they don't abort the run. |
| Tool policy profiles | agentic architects get Read/Glob/Grep. analysis critics get no file tools. strict judges get nothing. Destructive tools (Write, Edit, Bash, WebSearch) are blocked in every profile. |
| Dry-run mode | Compute worst-case spend and exit before any agent is called. |
| Full cost audit | usage.json with per-role token counts and USD after every run. |
| Graceful degradation | One failing or over-budget agent never kills the whole mesh. |
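The graceful-degradation row above can be sketched with `Promise.allSettled`: every role runs under a hard cap, and one over-budget role never kills the round. This is an illustrative sketch, not archmesh's actual implementation — `runRole` stands in for a real agent call, and the cap values mirror the documented defaults:

```typescript
// Sketch: each role runs under a hard USD cap; Promise.allSettled isolates failures.
type RoleResult = { role: string; costUsd: number };

async function runRole(role: string, capUsd: number, estUsd: number): Promise<RoleResult> {
  // Stand-in for the real agent call: refuse to run past the cap.
  if (estUsd > capUsd) throw new Error(`${role} skipped (budget exceeded: $${capUsd})`);
  return { role, costUsd: estUsd };
}

async function runRound(
  roles: Array<[name: string, capUsd: number, estUsd: number]>,
): Promise<{ completed: RoleResult[]; skipped: string[] }> {
  const settled = await Promise.allSettled(
    roles.map(([r, cap, est]) => runRole(r, cap, est)),
  );
  const completed: RoleResult[] = [];
  const skipped: string[] = [];
  for (const s of settled) {
    if (s.status === "fulfilled") completed.push(s.value);
    else skipped.push(String(s.reason));
  }
  return { completed, skipped };
}
```

A critic estimated at $0.20 against its $0.12 cap lands in `skipped`, while the architects' results proceed to the next round untouched.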
| | argue | LangGraph / AutoGen | archmesh |
|---|---|---|---|
| Hard per-role USD caps | ✗ delegate responsibility | Partial (callbacks) | ✓ enforced before each call |
| Tool allowlist / blocklist | ✗ | ✗ | ✓ per-role profiles |
| Dry-run cost estimate | ✗ | ✗ | ✓ static cap sum, exits before any API call |
| Repo-scan prevention | ✗ | ✗ | ✓ non-agentic roles blocked from file tools |
| Graceful degradation | ✗ | Partial | ✓ Promise.allSettled across all roles |
| Debate topology control | ✓ | ✗ | ✓ ring / tournament / full |
| Structured schema validation | ✗ | Partial | ✓ Zod-validated every round |
| Resume interrupted runs | ✗ | ✗ | ✓ state.json checkpoint |
```bash
# Preview costs before spending a single token
archmesh run brief.md --repo ./myproject --dry-run \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514

# Output:
# === Agent Configuration ===
# repo: /Users/you/myproject
# architect_a: claude → claude-sonnet-4-20250514 (cap $0.40)
# architect_b: claude → claude-sonnet-4-20250514 (cap $0.40)
# critic:      claude → claude-sonnet-4-20250514 (cap $0.12)
# economist:   claude → claude-sonnet-4-20250514 (cap $0.12)
# judge_1:     claude → claude-sonnet-4-20250514 (cap $0.15)
# rebuttals (×2): max $0.16
# Total worst-case: $1.35 (capped by --budget 5)
# Effective cap: $1.35
# Dry run — exiting without calling any agents.
```
```bash
# Run for real with a $2 global cap, critics restricted to prompt-only (no repo reads)
archmesh run brief.md --repo ./myproject --budget 2 \
  --policy-critic analysis \
  --policy-economist analysis \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514
```

**Round 1 — Proposals & Critiques (parallel)**
```
architect_a ──► proposal A
architect_b ──► proposal B
critic      ──► critique of both proposals
economist   ──► cost-focused critique of both proposals
```

**Round 2 — Rebuttals (topology-driven)**

- Each architect rebuts the other's proposal
- Critics may also rebut architects, depending on topology

**Round 3 — Judgment (ensemble)**

- N judges independently score all artifacts on a weighted rubric
- Weighted averages produce the final decision
Output is a `decision.md` with the winning architecture, rubric scores, dissent notes, and next actions, plus a `usage.json` with per-role costs.
```
brief.md + target repo
        ↓
role prompt builders    ← skills injected (--skills-dir)
                        ← connectors appended (--connectors)
        ↓
adapter calls           ← tool policy enforced (allowedTools / disallowedTools)
                        ← per-role USD cap checked before each call
        ↓
Zod schema validation   ← malformed output skipped, run continues
        ↓
ensemble scoring        ← weighted rubric applied; judges never see totals
        ↓
persisted artifacts     → decision.md · usage.json · state.json
```
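The "malformed output skipped, run continues" step boils down to validate-or-return-null rather than validate-or-throw. archmesh validates with Zod; the dependency-free sketch below mirrors the same behavior with a hand-rolled type guard (the `Proposal` shape is illustrative, not the real schema):

```typescript
// Illustrative stand-in for one of archmesh's Zod schemas.
type Proposal = { title: string; tradeoffs: string[] };

function isProposal(x: unknown): x is Proposal {
  if (typeof x !== "object" || x === null) return false;
  const p = x as Record<string, unknown>;
  return (
    typeof p.title === "string" &&
    Array.isArray(p.tradeoffs) &&
    p.tradeoffs.every((t) => typeof t === "string")
  );
}

// Malformed agent output yields null (the agent is skipped) instead of throwing,
// so the rest of the mesh keeps running.
function validateOrSkip(raw: string): Proposal | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return isProposal(parsed) ? parsed : null;
  } catch {
    return null; // unparseable JSON → skip
  }
}
```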
archmesh orchestrates locally-installed AI coding tools via their TypeScript SDKs. Install the ones you want to use:
| Backend | Install | Auth |
|---|---|---|
| Claude | `npm i -g @anthropic-ai/claude-code` | Claude Pro/Max subscription |
| GitHub Copilot | Bundled with `@github/copilot-sdk` | GitHub Copilot subscription |
| OpenCode | `npm i -g opencode` | Configure providers in `~/.config/opencode/config.json` |
| Codex | Uses OpenCode with OpenAI models | OpenAI API key via OpenCode config |
You need at least one backend installed. Claude is the default for most roles.
```bash
npm install -g archmesh
```

Or use without installing:

```bash
npx archmesh <command>
```

The fastest way to get started — detects your installed backends and generates the full CLI command:
```bash
archmesh init
```

The wizard will:
- Auto-detect which backends are installed and authenticated
- Let you pick Quick mode (one model for everything) or Advanced mode (per-role)
- Show available models from each backend's SDK
- Walk you through topology, judge count, and budget
- Generate the full `archmesh run` command
- Optionally save a config file and/or run immediately
Create a markdown file describing what you want reviewed. See src/__fixtures__/briefs/ for examples.
A good brief includes:
- Context — what the system does, current stack
- What to review — the specific subsystem or decision
- Constraints — budget, team size, cloud provider, latency targets
- Questions — specific tradeoffs you want evaluated
```bash
archmesh models            # all backends
archmesh models claude     # just Claude
archmesh models opencode   # just OpenCode
```

Point `--repo` at the project you want the agents to analyze. Model flags are required unless you use `--config`.
```bash
# All Claude
archmesh run my-brief.md \
  --repo /path/to/project \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514

# Mix backends for adversarial diversity
archmesh run my-brief.md \
  --repo /path/to/project \
  --architect-a claude --model-architect-a claude-sonnet-4-20250514 \
  --architect-b copilot --model-architect-b gpt-4.1 \
  --critic opencode --model-critic anthropic/claude-sonnet-4-20250514 \
  --economist claude --model-economist claude-sonnet-4-20250514 \
  --judge claude --model-judge claude-sonnet-4-20250514 \
  --topology ring \
  --judges 3 \
  --budget 10

# Use a config file from `archmesh init`
archmesh run my-brief.md \
  --repo /path/to/project \
  --config archmesh.myproject.config.json
```

Results are saved to `runs/<timestamp>/`:
```
runs/2026-04-15T10-30-00-000Z/
├── round1/
│   ├── architect_a.json   # Proposal A
│   ├── architect_b.json   # Proposal B
│   ├── critic.json        # Critic's critique
│   └── economist.json     # Economist's critique
├── round2/
│   └── rebuttals.json     # All rebuttals
├── round3/
│   ├── scores.json        # Raw per-judge scores
│   ├── judgment.json      # Ensemble judgment
│   └── decision.md        # Human-readable final decision
├── usage.json             # Per-role cost breakdown
└── state.json             # Run state (for resume)
```
Interactive setup wizard. Detects installed backends, queries their model lists, and outputs a ready-to-run command or config file.
```bash
archmesh init
```

Modes:
- Quick — one backend and model for all roles
- Advanced — different backend and model per role
Run a full three-round architecture review.
| Flag | Default | Description |
|---|---|---|
| `--repo <path>` | required | Path to the target repository to analyze |
| `--config <path>` | — | Config file from `archmesh init` — provides all params; no other flags needed when used |
| `--architect-a <backend>` | `claude` | Backend for architect A |
| `--architect-b <backend>` | `claude` | Backend for architect B |
| `--critic <backend>` | `opencode` | Backend for the critic |
| `--economist <backend>` | `claude` | Backend for the economist |
| `--judge <backend>` | `claude` | Backend for the judge(s) |
| `--judges <count>` | `3` | Number of judge instances (ensemble) |
| `--model-architect-a <model>` | — | Model for architect A (required if no `--config`) |
| `--model-architect-b <model>` | — | Model for architect B (required if no `--config`) |
| `--model-critic <model>` | — | Model for the critic (required if no `--config`) |
| `--model-economist <model>` | — | Model for the economist (required if no `--config`) |
| `--model-judge <model>` | — | Model for the judge(s) (required if no `--config`) |
| `--topology <type>` | `tournament` | Rebuttal topology: `ring`, `tournament`, or `full` |
| `--budget <usd>` | `3` | Maximum total spend in USD |
| `--interactive` | `false` | Pause after Round 1 for human review before proceeding |
| `--dry-run` | `false` | Show worst-case cost estimate and exit without calling agents |
| `--policy-architect-a <profile>` | `agentic` | Tool policy for architect A: `agentic`, `analysis`, or `strict` |
| `--policy-architect-b <profile>` | `agentic` | Tool policy for architect B |
| `--policy-critic <profile>` | `analysis` | Tool policy for the critic |
| `--policy-economist <profile>` | `analysis` | Tool policy for the economist |
| `--policy-judge <profile>` | `strict` | Tool policy for judges |
| `--skills-dir <path>` | — | Directory of role skill files (see Skills Packs) |
| `--connectors <path>` | — | JSON file defining MCP server connections (see MCP Connectors) |
Resume a previously interrupted run from its saved state. Reads state.json from the run directory and continues from where it left off.
| Flag | Default | Description |
|---|---|---|
| `--repo <path>` | required | Path to the target repository |
| `--budget <usd>` | `10` | Remaining budget cap |
| `--topology <type>` | `tournament` | Rebuttal topology (if resuming before Round 2) |
| `--architect-a <backend>` | `claude` | Backend for architect A |
| `--architect-b <backend>` | `claude` | Backend for architect B |
| `--critic <backend>` | `opencode` | Backend for critic |
| `--economist <backend>` | `claude` | Backend for economist |
| `--judge <backend>` | `claude` | Backend for judge |
| `--judges <count>` | `3` | Number of judges |
| `--model-architect-a <model>` | required | Model for architect A |
| `--model-architect-b <model>` | required | Model for architect B |
| `--model-critic <model>` | required | Model for critic |
| `--model-economist <model>` | required | Model for economist |
| `--model-judge <model>` | required | Model for judge |
Start an A2A-compliant HTTP server that accepts architecture review tasks from other agents or automation pipelines.
```bash
archmesh serve \
  --repo /path/to/project \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514 \
  --port 8787 \
  --auth bearer \
  --token mysecret
```

| Flag | Default | Description |
|---|---|---|
| `--repo <path>` | required | Default repository path for mesh runs |
| `--model-architect-a <model>` | required | Model for architect A |
| `--model-architect-b <model>` | required | Model for architect B |
| `--model-critic <model>` | required | Model for critic |
| `--model-economist <model>` | required | Model for economist |
| `--model-judge <model>` | required | Model for judge |
| `--port <port>` | `8787` | HTTP port to listen on |
| `--base-url <url>` | `http://localhost:8787` | Public base URL (used in the Agent Card) |
| `--auth <scheme>` | `none` | Auth scheme: `none`, `bearer`, or `api_key` |
| `--token <secret>` | — | Bearer token or API key (required when `--auth` != `none`) |
| `--topology <type>` | `tournament` | Rebuttal topology |
| `--budget <usd>` | `3` | Per-task budget cap |
| `--judges <count>` | `3` | Number of judge instances |
| `--config <path>` | — | Config file for role backends/models |
Exposes two endpoints:
- `GET /.well-known/agent.json` — A2A Agent Card for discovery
- `POST /a2a` — JSON-RPC 2.0 dispatcher

Supported JSON-RPC methods: `tasks/send`, `tasks/sendSubscribe` (SSE), `tasks/get`, and `tasks/cancel`.
Generate an A2A Agent Card JSON for this archmesh instance without starting a server.
```bash
archmesh a2a-card --base-url https://archmesh.example.com --auth bearer

# Write to file
archmesh a2a-card --base-url https://archmesh.example.com --out agent.json
```

| Flag | Default | Description |
|---|---|---|
| `--base-url <url>` | `http://localhost:8787` | Public base URL for this instance |
| `--auth <scheme>` | `none` | Auth scheme: `none`, `bearer`, `oauth2`, or `api_key` |
| `--out <path>` | — | Write JSON to file instead of stdout |
List available models for one or all backends. Queries each SDK at runtime.
```bash
archmesh models            # all backends
archmesh models claude
archmesh models opencode
archmesh models copilot
archmesh models codex
```

Show environment configuration and example model names per backend:

```bash
archmesh config
```

Each agent role can use a different backend. Mix them for adversarial diversity — different models with different training data produce more independent proposals.
| Backend | Description |
|---|---|
| `claude` | Anthropic Claude via the Claude Code SDK. Best overall reasoning. Uses your local `claude` CLI auth. |
| `copilot` | GitHub Copilot via `@github/copilot-sdk`. Good for code-aware analysis. Uses your GitHub auth. |
| `opencode` | OpenCode SDK. Supports multiple providers (Anthropic, OpenAI, Google, etc.) via its own config. |
| `codex` | OpenCode SDK configured with OpenAI's Codex models. |
Every --model-* flag is required — there are no defaults. Use archmesh models to discover what's available on your machine.
Controls who rebuts whom in Round 2:
| Topology | Pairs | Description |
|---|---|---|
| `ring` | 2 | Each architect rebuts the other only |
| `tournament` | 6 | Critics also rebut architects (default) |
| `full` | N×(N−1) | All participants rebut all others |
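The pair counts above fall out of a simple generator. This is a sketch under assumed participant lists (two architects, a critic, and an economist), not the actual `topology.ts`:

```typescript
type Pair = [attacker: string, target: string];

// Hypothetical participant lists matching the roles described in this README.
const architects = ["architect_a", "architect_b"];
const critics = ["critic", "economist"];

function rebuttalPairs(topology: "ring" | "tournament" | "full"): Pair[] {
  // Both topologies below start from the architects rebutting each other.
  const pairs: Pair[] = [
    ["architect_a", "architect_b"],
    ["architect_b", "architect_a"],
  ];
  if (topology === "ring") return pairs; // 2 pairs
  if (topology === "tournament") {
    // Critics also rebut each architect: 2 + 2×2 = 6 pairs.
    for (const c of critics) for (const a of architects) pairs.push([c, a]);
    return pairs;
  }
  // full: every participant rebuts every other → N×(N−1) ordered pairs.
  const all = [...architects, ...critics];
  return all.flatMap((x) => all.filter((y) => y !== x).map((y) => [x, y] as Pair));
}
```

With four participants, `full` produces 4×3 = 12 ordered pairs, which is where the N×(N−1) growth in the table comes from.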
Judges score both architect proposals on 8 weighted criteria (0–10 per criterion). The orchestrator computes weighted totals — judges never see or compute totals themselves.
| Criterion | Weight | What it measures |
|---|---|---|
| Constraint fit | 22% | How well the proposal addresses stated constraints |
| Operational simplicity | 18% | Ease of running in production |
| Delivery speed | 14% | Time to working system |
| Scalability | 14% | Ability to handle growth |
| Cost predictability | 12% | Budget certainty and TCO |
| Security & compliance | 10% | Security posture and compliance gaps |
| Vendor lock-in risk | 5% | Dependency on proprietary services |
| Evolvability (24 months) | 5% | Ability to adapt to changing requirements |
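Turning raw 0–10 judge scores into a decision is a plain weighted sum followed by an ensemble average. A sketch using the weights from the table above — the criterion keys and function names are illustrative, not archmesh's actual API:

```typescript
// Rubric weights from the table above; they sum to 1.0.
const WEIGHTS: Record<string, number> = {
  constraintFit: 0.22,
  operationalSimplicity: 0.18,
  deliverySpeed: 0.14,
  scalability: 0.14,
  costPredictability: 0.12,
  securityCompliance: 0.10,
  vendorLockIn: 0.05,
  evolvability: 0.05,
};

// Judges emit raw 0–10 scores per criterion; only the orchestrator applies weights.
function weightedTotal(scores: Record<string, number>): number {
  return Object.entries(WEIGHTS).reduce(
    (sum, [criterion, w]) => sum + w * (scores[criterion] ?? 0),
    0,
  );
}

// Ensemble: average the weighted totals across all judges.
function ensembleScore(judgeScores: Array<Record<string, number>>): number {
  const totals = judgeScores.map(weightedTotal);
  return totals.reduce((a, b) => a + b, 0) / totals.length;
}
```

Because the weights sum to 1.0, a proposal scoring 10 on every criterion gets a weighted total of exactly 10, so totals stay on the same 0–10 scale as the raw scores.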
Each agent role has a default tool policy that controls what it can do during its Claude SDK session. Policies are enforced across all adapters.
| Profile | Default roles | Permitted tools |
|---|---|---|
| `agentic` | `architect_a`, `architect_b` | Read, Glob, Grep (read-only repo exploration) |
| `analysis` | `critic`, `economist`, `rebuttal` | No file tools — works from prompt context only |
| `strict` | `judge` | No tools at all — pure structured completion |
All profiles hard-block destructive tools (Write, Edit, Bash, WebSearch, etc.) regardless of setting.
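The three profiles can be pictured as allow/deny lists with a non-overridable deny set. This is an illustrative shape, not the actual `policies.ts` export:

```typescript
type ToolPolicy = { allowedTools: string[]; disallowedTools: string[] };

// Destructive tools are denied in every profile, regardless of overrides.
const ALWAYS_BLOCKED = ["Write", "Edit", "Bash", "WebSearch"];

const PROFILES: Record<"agentic" | "analysis" | "strict", ToolPolicy> = {
  agentic: {
    allowedTools: ["Read", "Glob", "Grep"], // read-only repo exploration
    disallowedTools: ALWAYS_BLOCKED,
  },
  analysis: {
    allowedTools: [], // prompt context only
    disallowedTools: [...ALWAYS_BLOCKED, "Read", "Glob", "Grep"],
  },
  strict: {
    allowedTools: [], // pure structured completion
    disallowedTools: [...ALWAYS_BLOCKED, "Read", "Glob", "Grep", "Task"],
  },
};

// A tool is usable only if it is explicitly allowed AND not on the deny list,
// so the hard block wins even against a permissive override.
function isAllowed(profile: keyof typeof PROFILES, tool: string): boolean {
  const p = PROFILES[profile];
  return !p.disallowedTools.includes(tool) && p.allowedTools.includes(tool);
}
```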
Override defaults per role with --policy-* flags:
```bash
# Restrict architects to prompt-only and allow judges to read files
archmesh run brief.md --repo /project \
  --policy-architect-a analysis \
  --policy-judge agentic \
  ...
```

Enforcement strength varies by backend. Pair high-risk roles with Claude for the strongest guarantees.
| Adapter | Enforcement mechanism | Strength |
|---|---|---|
| `claude` | `allowedTools` / `disallowedTools` passed to the Claude Agent SDK. Enforced at API level — the model cannot invoke a blocked tool regardless of prompt content. | Hard |
| `copilot` | `allowExploration` flag controls whether tool-calling context is injected into the system prompt. No SDK-level enforcement. | Soft |
| `opencode` | `tools` override map passed to the OpenCode SDK. Enforcement depends on the underlying provider and model. | Best-effort |
| `codex` | Same as OpenCode. | Best-effort |
MCP connectors are only active on the Claude adapter. Other adapters receive the registry but MCP injection is a no-op.
```bash
archmesh run brief.md --repo ./myproject \
  --policy-critic analysis \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 ...

# The critic is backed by Claude with disallowedTools: ["Task","Read","Glob","Grep"].
# Even if the prompt asks to read files, the SDK rejects the tool call.
# If the critic also exceeds its $0.12 cap it is skipped via Promise.allSettled:
#
# ⚠ critic skipped (budget exceeded: $0.12)
# ✓ economist completed $0.09
#
# The run continues to Round 2 and Round 3 with available critiques.
```

The following features layer on top of the core review mesh. All are optional — a basic `archmesh run` needs none of them.
Inject domain-specific expertise into any agent role by creating markdown files in a skills directory. Skills are appended to the role's base prompt under a ROLE SKILLS: section.
```bash
# Create a skills directory
mkdir my-skills

# Write domain expertise for each role
cat > my-skills/architect_a.md << 'EOF'
- Prefer Kubernetes-native solutions (Helm, Kustomize, Argo CD)
- All services must expose /health and /metrics endpoints
- mTLS required between all internal services
EOF

cat > my-skills/economist.md << 'EOF'
- Model costs over 24 months including engineering time at $150k/yr fully loaded
- Flag any solution requiring a dedicated platform engineer
- Prefer committed-use pricing where multi-year commitment risk is acceptable
EOF

# Pass the directory at runtime
archmesh run brief.md --repo /project --skills-dir ./my-skills ...
```

Supported skill files (one per role, all optional):
| File | Role |
|---|---|
| `architect_a.md` | Architect A proposals |
| `architect_b.md` | Architect B proposals |
| `critic.md` | Critic critiques |
| `economist.md` | Economist critiques |
| `rebuttal.md` | Rebuttal round (all agents) |
| `judge.md` | Judge scoring |
Missing files fall back silently to the base prompt. Files exceeding 8,000 characters are truncated with a warning.
Give agents access to external MCP (Model Context Protocol) servers — documentation search, internal APIs, issue trackers, etc. — via a JSON registry file.
```json
[
  {
    "name": "github",
    "transport": "stdio",
    "command": "mcp-github",
    "args": ["--token", "ghp_xxx"],
    "roles": ["architect_a", "architect_b"]
  },
  {
    "name": "docs",
    "transport": "http",
    "url": "http://localhost:3001",
    "headers": { "Authorization": "Bearer token" }
  },
  {
    "name": "events",
    "transport": "sse",
    "url": "http://localhost:4000/sse",
    "roles": ["critic"]
  }
]
```

Transport types:
| Transport | Required fields | Optional fields |
|---|---|---|
| `stdio` | `command` | `args`, `env` |
| `http` | `url` | `headers` |
| `sse` | `url` | `headers` |
`roles` field: when omitted, the connector is available to all roles. When specified, only agents whose role name appears in the list receive the connector.
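The `roles` filter reduces to a one-line predicate over the registry. A sketch with an assumed connector shape (the real registry entries carry transport fields as well):

```typescript
// Minimal connector shape for illustration; real entries also carry transport config.
type Connector = { name: string; roles?: string[] };

// No `roles` field → visible to every role; otherwise exact-match filter.
function connectorsForRole(registry: Connector[], role: string): Connector[] {
  return registry.filter((c) => !c.roles || c.roles.includes(role));
}

// Mirrors the example registry above.
const registry: Connector[] = [
  { name: "github", roles: ["architect_a", "architect_b"] },
  { name: "docs" }, // all roles
  { name: "events", roles: ["critic"] },
];

connectorsForRole(registry, "critic").map((c) => c.name); // ["docs", "events"]
```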
```bash
archmesh run brief.md \
  --repo /project \
  --connectors ./connectors.json \
  ...
```

Connectors are currently wired into the Claude adapter only (the Claude Agent SDK supports `mcpServers` natively). Other adapters receive the full registry, but MCP injection is a no-op — they operate on prompt context only.
archmesh exposes itself as an Agent2Agent (A2A) compatible service, allowing other agents or orchestration systems to invoke architecture reviews programmatically.
```bash
archmesh serve \
  --repo /path/to/project \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514 \
  --port 8787 \
  --base-url https://archmesh.example.com \
  --auth bearer \
  --token $ARCHMESH_TOKEN
```

```bash
# Discover the agent card
curl https://archmesh.example.com/.well-known/agent.json

# Submit a task (fire-and-forget)
curl -X POST https://archmesh.example.com/a2a \
  -H "Authorization: Bearer $ARCHMESH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",
    "params": {
      "id": "task-001",
      "message": {
        "parts": [{ "type": "text", "text": "Design a payment gateway..." }]
      }
    }
  }'

# Subscribe to SSE stream for live progress
curl -N -X POST https://archmesh.example.com/a2a \
  -H "Authorization: Bearer $ARCHMESH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tasks/sendSubscribe",
    "params": {
      "id": "task-002",
      "message": {
        "parts": [{ "type": "text", "text": "Design a payment gateway..." }]
      }
    }
  }'

# Check task status
curl -X POST https://archmesh.example.com/a2a \
  -H "Authorization: Bearer $ARCHMESH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":3,"method":"tasks/get","params":{"id":"task-001"}}'
```

```
submitted → working → completed
                   ↘ failed
                   ↘ canceled
```
| Scheme | Header | Example |
|---|---|---|
| `none` | — | Open endpoint |
| `bearer` | `Authorization: Bearer <token>` | `--auth bearer --token mysecret` |
| `api_key` | `X-API-Key: <key>` | `--auth api_key --token mykey` |
- Per-role budget caps — each role has a hard cap (architects $0.40, critics/economist $0.12, judges $0.15, rebuttals $0.08)
- Global `--budget` cap — hard stop across all rounds
- `--dry-run` — shows worst-case cost estimate before spending anything
- `usage.json` — persisted per-role cost breakdown after every run
- Agents that exceed their cap are skipped gracefully via `Promise.allSettled` — other agents continue
```bash
# Preview costs without calling any agents
archmesh run brief.md --repo ./myproject --dry-run \
  --config archmesh.myproject.config.json
```

At the end of every run, a cost table is printed to the console and `usage.json` is saved:
```
=== Usage Summary ===
round 1 / architect_a             $0.1823   (12450 tokens)
round 1 / architect_b             $0.2104   (14230 tokens)
round 1 / critic                  $0.0612   (8100 tokens)
round 1 / economist               $0.0891   (9800 tokens)
round 2 / rebuttal-architect_a    $0.0421   (5200 tokens)
round 2 / rebuttal-architect_b    $0.0387   (4900 tokens)
round 3 / judge (×3)              $0.1203   (15600 tokens)
total                             $0.7441
```
The `--dry-run` ceiling is the static sum of configured per-role hard caps, clipped by `--budget`:

```
worst-case = architect_a($0.40) + architect_b($0.40)
           + critic($0.12) + economist($0.12)
           + judges(N × $0.15) + rebuttals(2 × $0.08)
effective  = min(worst-case, --budget)
```
This is not a token-based prediction and does not use historical usage envelopes or model-specific pricing. It is a hard ceiling derived purely from role configuration. Actual spend is typically 40–70% of the ceiling depending on repo size and brief complexity. Use usage.json from a prior run on the same project to calibrate expectations.
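The ceiling formula, as code — a minimal sketch with the documented default caps; names and signatures are illustrative:

```typescript
// Per-role hard caps in USD, mirroring the documented defaults.
const CAPS = { architect: 0.40, critic: 0.12, economist: 0.12, judge: 0.15, rebuttal: 0.08 };

// Static cap sum: two architects, one critic, one economist, N judges, M rebuttals.
function worstCase(judges: number, rebuttals: number): number {
  return (
    2 * CAPS.architect +
    CAPS.critic +
    CAPS.economist +
    judges * CAPS.judge +
    rebuttals * CAPS.rebuttal
  );
}

// The global --budget clips the ceiling.
function effectiveCap(judges: number, rebuttals: number, budgetUsd: number): number {
  return Math.min(worstCase(judges, rebuttals), budgetUsd);
}

// 1 judge, 2 rebuttals: 0.80 + 0.24 + 0.15 + 0.16 = $1.35,
// matching the dry-run output shown earlier.
```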
| Guaranteed | Not guaranteed |
|---|---|
| Destructive tools (Write, Edit, Bash, WebSearch) are blocked on the Claude adapter in every profile | Tool restrictions on OpenCode / Copilot / Codex are prompt hints, not SDK-enforced |
| Each role's per-role cap cannot be exceeded on the Claude adapter | A non-Claude adapter that ignores the tool policy can still read arbitrary files |
| A role exceeding its cap is skipped — other roles continue | archmesh cannot prevent a model from reasoning about content received in a prior turn |
| `--dry-run` exits before any API call is made | The dry-run ceiling is a cap sum, not a model-specific token prediction |
| All agent outputs are Zod-validated before scoring | Schema validation rejects malformed structure; it does not detect hallucinated content |
| `state.json` enables resuming after any failure | Resumed runs re-apply original per-role caps; remaining global budget is not recalculated |
| Variable | Default | Description |
|---|---|---|
| `ARCHMESH_BUDGET_LIMIT` | `20` | Default budget cap in USD |
| `ARCHMESH_DEFAULT_TOPOLOGY` | `tournament` | Default rebuttal topology |
Copy `.env.example` to `.env` to configure.
```bash
npm run check   # Type-check without emitting
npm test        # Run all tests (150 tests, 30 suites)
npm run build   # Compile TypeScript → dist/

npm run dev -- run brief.md --repo /path/to/project \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514

npm start -- run --config archmesh.myproject.config.json
```

Run a single test file:

```bash
npx tsx --test src/eval/scoring.eval.test.ts
npx tsx --test src/connectors.test.ts
```

```
src/
├── cli.ts                     # CLI entry point (Commander)
├── index.ts                   # Public API re-exports
├── config.ts                  # Environment config
├── types.ts                   # Shared TypeScript types
├── budget.ts                  # Per-role budget tracking
├── rubric.ts                  # Weighted scoring rubric
├── topology.ts                # Rebuttal pair generation
├── policies.ts                # Tool policy profiles (agentic/analysis/strict)
├── skills.ts                  # Skills pack loader
├── connectors.ts              # MCP connector registry (schema, loader, role filter)
├── adapters/
│   ├── claude.ts              # Claude Code SDK adapter
│   ├── copilot.ts             # GitHub Copilot SDK adapter
│   ├── opencode.ts            # OpenCode SDK adapter
│   └── codex.ts               # Codex (delegates to OpenCode)
├── orchestration/
│   ├── runMesh.ts             # Main 3-round orchestrator
│   ├── persist.ts             # Artifact persistence (rounds, usage, decision)
│   └── resume.ts              # Run resume logic
├── prompts/
│   ├── architect.ts           # Proposal prompt builder
│   ├── critic.ts              # Critique prompt builder
│   ├── rebuttal.ts            # Rebuttal prompt builder
│   └── judge.ts               # Judgment prompt builder
├── schemas/
│   ├── proposal.schema.ts     # Zod schema for proposals
│   ├── critique.schema.ts     # Zod schema for critiques
│   ├── rebuttal.schema.ts     # Zod schema for rebuttals
│   ├── judgment.schema.ts     # Zod schema for judge scores
│   └── examples.ts            # JSON examples for prompt schema blocks
├── a2a/
│   ├── card.ts                # Agent Card builder
│   ├── server.ts              # Hono HTTP + SSE server (A2A JSON-RPC)
│   ├── handler.ts             # TaskStore and submitTask (background mesh run)
│   ├── auth.ts                # Auth middleware (bearer, api_key, none)
│   └── types.ts               # A2A protocol types
├── eval/
│   ├── scoring.eval.test.ts   # Score-band regression tests
│   └── prompt.drift.test.ts   # Prompt structure invariant tests
├── __fixtures__/
│   ├── data.ts                # Fixture proposals, critiques, judge scores
│   └── briefs/
│       ├── payment-gateway.md # Benchmark brief: PCI-DSS payment processing
│       └── data-pipeline.md   # Benchmark brief: real-time analytics pipeline
└── utils/
    ├── logger.ts              # Logging with progress streaming
    ├── scoring.ts             # Ensemble judgment computation
    ├── templates.ts           # Decision markdown renderer
    └── files.ts               # File read/write helpers
```
MIT
