Multi-agent architecture review with hard cost contracts, tool guardrails, and deterministic stop conditions.
Most multi-agent orchestrators treat cost and tool safety as your problem. A critic backed by Claude CLI can silently read your entire repository on every round unless something explicitly stops it — and none of them will tell you the bill before it arrives. archmesh is the trust layer that prevents this.
Two AI architects propose competing solutions. Critics and an economist challenge them. Architects rebut each other. A judge ensemble scores everything on a weighted rubric and picks a winner — or synthesizes a hybrid. Every role runs under a hard USD cap with a tool policy that controls what it can touch.
Harness-agnostic debate engines like argue are elegant in theory: one delegate interface, any backend. In practice they hand off all cost and safety responsibility to you. archmesh is opinionated in the places that matter:
| What | How |
|---|---|
| Hard per-role budget caps | Each role has a ceiling (architects $0.40, critics $0.12, judges $0.15). Overages skip the agent gracefully — they don't abort the run. |
| Tool policy profiles | agentic architects get Read/Glob/Grep. analysis critics get no file tools. strict judges get nothing. Destructive tools (Write, Edit, Bash, WebSearch) are blocked in every profile. |
| Dry-run mode | Compute worst-case spend and exit before any agent is called. |
| Full cost audit | usage.json with per-role token counts and USD after every run. |
| Graceful degradation | One failing or over-budget agent never kills the whole mesh. |
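The graceful-degradation row above can be sketched with `Promise.allSettled`: every role runs under a hard cap, and one over-budget role never kills the round. This is an illustrative sketch, not archmesh's actual implementation — `runRole` stands in for a real agent call, and the cap values mirror the documented defaults:

```typescript
// Sketch: each role runs under a hard USD cap; Promise.allSettled isolates failures.
type RoleResult = { role: string; costUsd: number };

async function runRole(role: string, capUsd: number, estUsd: number): Promise<RoleResult> {
  // Stand-in for the real agent call: refuse to run past the cap.
  if (estUsd > capUsd) throw new Error(`${role} skipped (budget exceeded: $${capUsd})`);
  return { role, costUsd: estUsd };
}

async function runRound(
  roles: Array<[name: string, capUsd: number, estUsd: number]>,
): Promise<{ completed: RoleResult[]; skipped: string[] }> {
  const settled = await Promise.allSettled(
    roles.map(([r, cap, est]) => runRole(r, cap, est)),
  );
  const completed: RoleResult[] = [];
  const skipped: string[] = [];
  for (const s of settled) {
    if (s.status === "fulfilled") completed.push(s.value);
    else skipped.push(String(s.reason));
  }
  return { completed, skipped };
}
```

A critic estimated at $0.20 against its $0.12 cap lands in `skipped`, while the architects' results proceed to the next round untouched.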
| | argue | LangGraph / AutoGen | archmesh |
|---|---|---|---|
| Hard per-role USD caps | ✗ delegate responsibility | Partial (callbacks) | ✓ enforced before each call |
| Tool allowlist / blocklist | ✗ | ✗ | ✓ per-role profiles |
| Dry-run cost estimate | ✗ | ✗ | ✓ static cap sum, exits before any API call |
| Repo-scan prevention | ✗ | ✗ | ✓ non-agentic roles blocked from file tools |
| Graceful degradation | ✗ | Partial | ✓ Promise.allSettled across all roles |
| Debate topology control | ✓ | ✗ | ✓ ring / tournament / full |
| Structured schema validation | ✗ | Partial | ✓ Zod-validated every round |
| Resume interrupted runs | ✗ | ✗ | ✓ state.json checkpoint |
```bash
# Preview costs before spending a single token
archmesh run brief.md --repo ./myproject --dry-run \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514

# Output:
# === Agent Configuration ===
# repo: /Users/you/myproject
# architect_a: claude → claude-sonnet-4-20250514 (cap $0.40)
# architect_b: claude → claude-sonnet-4-20250514 (cap $0.40)
# critic:      claude → claude-sonnet-4-20250514 (cap $0.12)
# economist:   claude → claude-sonnet-4-20250514 (cap $0.12)
# judge_1:     claude → claude-sonnet-4-20250514 (cap $0.15)
# rebuttals (×2): max $0.16
# Total worst-case: $1.35 (capped by --budget 5)
# Effective cap: $1.35
# Dry run — exiting without calling any agents.
```
```bash
# Run for real with a $2 global cap, critics restricted to prompt-only (no repo reads)
archmesh run brief.md --repo ./myproject --budget 2 \
  --policy-critic analysis \
  --policy-economist analysis \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514
```

**Round 1 — Proposals & Critiques (parallel)**
```
architect_a ──► proposal A
architect_b ──► proposal B
critic      ──► critique of both proposals
economist   ──► cost-focused critique of both proposals
```

**Round 2 — Rebuttals (topology-driven)**

- Each architect rebuts the other's proposal
- Critics may also rebut architects, depending on topology

**Round 3 — Judgment (ensemble)**

- N judges independently score all artifacts on a weighted rubric
- Weighted averages produce the final decision
Output is a `decision.md` with the winning architecture, rubric scores, dissent notes, and next actions, plus a `usage.json` with per-role costs.
```
brief.md + target repo
        ↓
role prompt builders    ← skills injected (--skills-dir)
                        ← connectors appended (--connectors)
        ↓
adapter calls           ← tool policy enforced (allowedTools / disallowedTools)
                        ← per-role USD cap checked before each call
        ↓
Zod schema validation   ← malformed output skipped, run continues
        ↓
ensemble scoring        ← weighted rubric applied; judges never see totals
        ↓
persisted artifacts     → decision.md · usage.json · state.json
```
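The "malformed output skipped, run continues" step boils down to validate-or-return-null rather than validate-or-throw. archmesh validates with Zod; the dependency-free sketch below mirrors the same behavior with a hand-rolled type guard (the `Proposal` shape is illustrative, not the real schema):

```typescript
// Illustrative stand-in for one of archmesh's Zod schemas.
type Proposal = { title: string; tradeoffs: string[] };

function isProposal(x: unknown): x is Proposal {
  if (typeof x !== "object" || x === null) return false;
  const p = x as Record<string, unknown>;
  return (
    typeof p.title === "string" &&
    Array.isArray(p.tradeoffs) &&
    p.tradeoffs.every((t) => typeof t === "string")
  );
}

// Malformed agent output yields null (the agent is skipped) instead of throwing,
// so the rest of the mesh keeps running.
function validateOrSkip(raw: string): Proposal | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return isProposal(parsed) ? parsed : null;
  } catch {
    return null; // unparseable JSON → skip
  }
}
```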
archmesh orchestrates locally-installed AI coding tools via their TypeScript SDKs. Install the ones you want to use:
| Backend | Install | Auth |
|---|---|---|
| Claude | `npm i -g @anthropic-ai/claude-code` | Claude Pro/Max subscription |
| GitHub Copilot | Bundled with `@github/copilot-sdk` | GitHub Copilot subscription |
| OpenCode | `npm i -g opencode` | Configure providers in `~/.config/opencode/config.json` |
| Codex | Uses OpenCode with OpenAI models | OpenAI API key via OpenCode config |
You need at least one backend installed. Claude is the default for most roles.
```bash
npm install -g archmesh
```

Or use without installing:

```bash
npx archmesh <command>
```

The fastest way to get started — detects your installed backends and generates the full CLI command:
```bash
archmesh init
```

The wizard will:
- Auto-detect which backends are installed and authenticated
- Let you pick Quick mode (one model for everything) or Advanced mode (per-role)
- Show available models from each backend's SDK
- Walk you through topology, judge count, and budget
- Generate the full `archmesh run` command
- Optionally save a config file and/or run immediately
Create a markdown file describing what you want reviewed. See src/__fixtures__/briefs/ for examples.
A good brief includes:
- Context — what the system does, current stack
- What to review — the specific subsystem or decision
- Constraints — budget, team size, cloud provider, latency targets
- Questions — specific tradeoffs you want evaluated
```bash
archmesh models            # all backends
archmesh models claude     # just Claude
archmesh models opencode   # just OpenCode
```

Point `--repo` at the project you want the agents to analyze. Model flags are required unless you use `--config`.
```bash
# All Claude
archmesh run my-brief.md \
  --repo /path/to/project \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514

# Mix backends for adversarial diversity
archmesh run my-brief.md \
  --repo /path/to/project \
  --architect-a claude --model-architect-a claude-sonnet-4-20250514 \
  --architect-b copilot --model-architect-b gpt-4.1 \
  --critic opencode --model-critic anthropic/claude-sonnet-4-20250514 \
  --economist claude --model-economist claude-sonnet-4-20250514 \
  --judge claude --model-judge claude-sonnet-4-20250514 \
  --topology ring \
  --judges 3 \
  --budget 10

# Use a config file from `archmesh init`
archmesh run my-brief.md \
  --repo /path/to/project \
  --config archmesh.myproject.config.json
```

Results are saved to `runs/<timestamp>/`:
```
runs/2026-04-15T10-30-00-000Z/
├── round1/
│   ├── architect_a.json   # Proposal A
│   ├── architect_b.json   # Proposal B
│   ├── critic.json        # Critic's critique
│   └── economist.json     # Economist's critique
├── round2/
│   └── rebuttals.json     # All rebuttals
├── round3/
│   ├── scores.json        # Raw per-judge scores
│   ├── judgment.json      # Ensemble judgment
│   └── decision.md        # Human-readable final decision
├── usage.json             # Per-role cost breakdown
└── state.json             # Run state (for resume)
```
Interactive setup wizard. Detects installed backends, queries their model lists, and outputs a ready-to-run command or config file.
```bash
archmesh init
```

Modes:
- Quick — one backend and model for all roles
- Advanced — different backend and model per role
Run a full three-round architecture review.
| Flag | Default | Description |
|---|---|---|
| `--repo <path>` | required | Path to the target repository to analyze |
| `--config <path>` | — | Config file from `archmesh init` — provides all params; no other flags needed when used |
| `--architect-a <backend>` | `claude` | Backend for architect A |
| `--architect-b <backend>` | `claude` | Backend for architect B |
| `--critic <backend>` | `opencode` | Backend for the critic |
| `--economist <backend>` | `claude` | Backend for the economist |
| `--judge <backend>` | `claude` | Backend for the judge(s) |
| `--judges <count>` | `3` | Number of judge instances (ensemble) |
| `--model-architect-a <model>` | — | Model for architect A (required if no `--config`) |
| `--model-architect-b <model>` | — | Model for architect B (required if no `--config`) |
| `--model-critic <model>` | — | Model for the critic (required if no `--config`) |
| `--model-economist <model>` | — | Model for the economist (required if no `--config`) |
| `--model-judge <model>` | — | Model for the judge(s) (required if no `--config`) |
| `--topology <type>` | `tournament` | Rebuttal topology: `ring`, `tournament`, or `full` |
| `--budget <usd>` | `3` | Maximum total spend in USD |
| `--interactive` | `false` | Pause after Round 1 for human review before proceeding |
| `--dry-run` | `false` | Show worst-case cost estimate and exit without calling agents |
| `--policy-architect-a <profile>` | `agentic` | Tool policy for architect A: `agentic`, `analysis`, or `strict` |
| `--policy-architect-b <profile>` | `agentic` | Tool policy for architect B |
| `--policy-critic <profile>` | `analysis` | Tool policy for the critic |
| `--policy-economist <profile>` | `analysis` | Tool policy for the economist |
| `--policy-judge <profile>` | `strict` | Tool policy for judges |
| `--skills-dir <path>` | — | Directory of role skill files (see Skills Packs) |
| `--connectors <path>` | — | JSON file defining MCP server connections (see MCP Connectors) |
Resume a previously interrupted run from its saved state. Reads state.json from the run directory and continues from where it left off.
| Flag | Default | Description |
|---|---|---|
| `--repo <path>` | required | Path to the target repository |
| `--budget <usd>` | `10` | Remaining budget cap |
| `--topology <type>` | `tournament` | Rebuttal topology (if resuming before Round 2) |
| `--architect-a <backend>` | `claude` | Backend for architect A |
| `--architect-b <backend>` | `claude` | Backend for architect B |
| `--critic <backend>` | `opencode` | Backend for critic |
| `--economist <backend>` | `claude` | Backend for economist |
| `--judge <backend>` | `claude` | Backend for judge |
| `--judges <count>` | `3` | Number of judges |
| `--model-architect-a <model>` | required | Model for architect A |
| `--model-architect-b <model>` | required | Model for architect B |
| `--model-critic <model>` | required | Model for critic |
| `--model-economist <model>` | required | Model for economist |
| `--model-judge <model>` | required | Model for judge |
Start an A2A-compliant HTTP server that accepts architecture review tasks from other agents or automation pipelines.
```bash
archmesh serve \
  --repo /path/to/project \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514 \
  --port 8787 \
  --auth bearer \
  --token mysecret
```

| Flag | Default | Description |
|---|---|---|
| `--repo <path>` | required | Default repository path for mesh runs |
| `--model-architect-a <model>` | required | Model for architect A |
| `--model-architect-b <model>` | required | Model for architect B |
| `--model-critic <model>` | required | Model for critic |
| `--model-economist <model>` | required | Model for economist |
| `--model-judge <model>` | required | Model for judge |
| `--port <port>` | `8787` | HTTP port to listen on |
| `--base-url <url>` | `http://localhost:8787` | Public base URL (used in the Agent Card) |
| `--auth <scheme>` | `none` | Auth scheme: `none`, `bearer`, or `api_key` |
| `--token <secret>` | — | Bearer token or API key (required when `--auth` != `none`) |
| `--topology <type>` | `tournament` | Rebuttal topology |
| `--budget <usd>` | `3` | Per-task budget cap |
| `--judges <count>` | `3` | Number of judge instances |
| `--config <path>` | — | Config file for role backends/models |
Exposes two endpoints:
- `GET /.well-known/agent.json` — A2A Agent Card for discovery
- `POST /a2a` — JSON-RPC 2.0 dispatcher

Supported JSON-RPC methods: `tasks/send`, `tasks/sendSubscribe` (SSE), `tasks/get`, and `tasks/cancel`.
Generate an A2A Agent Card JSON for this archmesh instance without starting a server.
```bash
archmesh a2a-card --base-url https://archmesh.example.com --auth bearer

# Write to file
archmesh a2a-card --base-url https://archmesh.example.com --out agent.json
```

| Flag | Default | Description |
|---|---|---|
| `--base-url <url>` | `http://localhost:8787` | Public base URL for this instance |
| `--auth <scheme>` | `none` | Auth scheme: `none`, `bearer`, `oauth2`, or `api_key` |
| `--out <path>` | — | Write JSON to file instead of stdout |
List available models for one or all backends. Queries each SDK at runtime.
```bash
archmesh models            # all backends
archmesh models claude
archmesh models opencode
archmesh models copilot
archmesh models codex
```

Show environment configuration and example model names per backend:

```bash
archmesh config
```

Each agent role can use a different backend. Mix them for adversarial diversity — different models with different training data produce more independent proposals.
| Backend | Description |
|---|---|
| `claude` | Anthropic Claude via the Claude Code SDK. Best overall reasoning. Uses your local `claude` CLI auth. |
| `copilot` | GitHub Copilot via `@github/copilot-sdk`. Good for code-aware analysis. Uses your GitHub auth. |
| `opencode` | OpenCode SDK. Supports multiple providers (Anthropic, OpenAI, Google, etc.) via its own config. |
| `codex` | OpenCode SDK configured with OpenAI's Codex models. |
Every --model-* flag is required — there are no defaults. Use archmesh models to discover what's available on your machine.
Controls who rebuts whom in Round 2:
| Topology | Pairs | Description |
|---|---|---|
| `ring` | 2 | Each architect rebuts the other only |
| `tournament` | 6 | Critics also rebut architects (default) |
| `full` | N×(N−1) | All participants rebut all others |
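The pair counts above fall out of a simple generator. This is a sketch under assumed participant lists (two architects, a critic, and an economist), not the actual `topology.ts`:

```typescript
type Pair = [attacker: string, target: string];

// Hypothetical participant lists matching the roles described in this README.
const architects = ["architect_a", "architect_b"];
const critics = ["critic", "economist"];

function rebuttalPairs(topology: "ring" | "tournament" | "full"): Pair[] {
  // Both topologies below start from the architects rebutting each other.
  const pairs: Pair[] = [
    ["architect_a", "architect_b"],
    ["architect_b", "architect_a"],
  ];
  if (topology === "ring") return pairs; // 2 pairs
  if (topology === "tournament") {
    // Critics also rebut each architect: 2 + 2×2 = 6 pairs.
    for (const c of critics) for (const a of architects) pairs.push([c, a]);
    return pairs;
  }
  // full: every participant rebuts every other → N×(N−1) ordered pairs.
  const all = [...architects, ...critics];
  return all.flatMap((x) => all.filter((y) => y !== x).map((y) => [x, y] as Pair));
}
```

With four participants, `full` produces 4×3 = 12 ordered pairs, which is where the N×(N−1) growth in the table comes from.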
Judges score both architect proposals on 8 weighted criteria (0–10 per criterion). The orchestrator computes weighted totals — judges never see or compute totals themselves.
| Criterion | Weight | What it measures |
|---|---|---|
| Constraint fit | 22% | How well the proposal addresses stated constraints |
| Operational simplicity | 18% | Ease of running in production |
| Delivery speed | 14% | Time to working system |
| Scalability | 14% | Ability to handle growth |
| Cost predictability | 12% | Budget certainty and TCO |
| Security & compliance | 10% | Security posture and compliance gaps |
| Vendor lock-in risk | 5% | Dependency on proprietary services |
| Evolvability (24 months) | 5% | Ability to adapt to changing requirements |
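Turning raw 0–10 judge scores into a decision is a plain weighted sum followed by an ensemble average. A sketch using the weights from the table above — the criterion keys and function names are illustrative, not archmesh's actual API:

```typescript
// Rubric weights from the table above; they sum to 1.0.
const WEIGHTS: Record<string, number> = {
  constraintFit: 0.22,
  operationalSimplicity: 0.18,
  deliverySpeed: 0.14,
  scalability: 0.14,
  costPredictability: 0.12,
  securityCompliance: 0.10,
  vendorLockIn: 0.05,
  evolvability: 0.05,
};

// Judges emit raw 0–10 scores per criterion; only the orchestrator applies weights.
function weightedTotal(scores: Record<string, number>): number {
  return Object.entries(WEIGHTS).reduce(
    (sum, [criterion, w]) => sum + w * (scores[criterion] ?? 0),
    0,
  );
}

// Ensemble: average the weighted totals across all judges.
function ensembleScore(judgeScores: Array<Record<string, number>>): number {
  const totals = judgeScores.map(weightedTotal);
  return totals.reduce((a, b) => a + b, 0) / totals.length;
}
```

Because the weights sum to 1.0, a proposal scoring 10 on every criterion gets a weighted total of exactly 10, so totals stay on the same 0–10 scale as the raw scores.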
Each agent role has a default tool policy that controls what it can do during its Claude SDK session. Policies are enforced across all adapters.
| Profile | Default roles | Permitted tools |
|---|---|---|
| `agentic` | `architect_a`, `architect_b` | Read, Glob, Grep (read-only repo exploration) |
| `analysis` | `critic`, `economist`, `rebuttal` | No file tools — works from prompt context only |
| `strict` | `judge` | No tools at all — pure structured completion |
All profiles hard-block destructive tools (Write, Edit, Bash, WebSearch, etc.) regardless of setting.
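The three profiles can be pictured as allow/deny lists with a non-overridable deny set. This is an illustrative shape, not the actual `policies.ts` export:

```typescript
type ToolPolicy = { allowedTools: string[]; disallowedTools: string[] };

// Destructive tools are denied in every profile, regardless of overrides.
const ALWAYS_BLOCKED = ["Write", "Edit", "Bash", "WebSearch"];

const PROFILES: Record<"agentic" | "analysis" | "strict", ToolPolicy> = {
  agentic: {
    allowedTools: ["Read", "Glob", "Grep"], // read-only repo exploration
    disallowedTools: ALWAYS_BLOCKED,
  },
  analysis: {
    allowedTools: [], // prompt context only
    disallowedTools: [...ALWAYS_BLOCKED, "Read", "Glob", "Grep"],
  },
  strict: {
    allowedTools: [], // pure structured completion
    disallowedTools: [...ALWAYS_BLOCKED, "Read", "Glob", "Grep", "Task"],
  },
};

// A tool is usable only if it is explicitly allowed AND not on the deny list,
// so the hard block wins even against a permissive override.
function isAllowed(profile: keyof typeof PROFILES, tool: string): boolean {
  const p = PROFILES[profile];
  return !p.disallowedTools.includes(tool) && p.allowedTools.includes(tool);
}
```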
Override defaults per role with --policy-* flags:
```bash
# Restrict architects to prompt-only and allow judges to read files
archmesh run brief.md --repo /project \
  --policy-architect-a analysis \
  --policy-judge agentic \
  ...
```

Enforcement strength varies by backend. Pair high-risk roles with Claude for the strongest guarantees.
| Adapter | Enforcement mechanism | Strength |
|---|---|---|
| `claude` | `allowedTools` / `disallowedTools` passed to the Claude Agent SDK. Enforced at API level — the model cannot invoke a blocked tool regardless of prompt content. | Hard |
| `copilot` | `allowExploration` flag controls whether tool-calling context is injected into the system prompt. No SDK-level enforcement. | Soft |
| `opencode` | `tools` override map passed to the OpenCode SDK. Enforcement depends on the underlying provider and model. | Best-effort |
| `codex` | Same as OpenCode. | Best-effort |
MCP connectors are only active on the Claude adapter. Other adapters receive the registry but MCP injection is a no-op.
```bash
archmesh run brief.md --repo ./myproject \
  --policy-critic analysis \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 ...

# The critic is backed by Claude with disallowedTools: ["Task","Read","Glob","Grep"].
# Even if the prompt asks to read files, the SDK rejects the tool call.
# If the critic also exceeds its $0.12 cap it is skipped via Promise.allSettled:
#
# ⚠ critic skipped (budget exceeded: $0.12)
# ✓ economist completed $0.09
#
# The run continues to Round 2 and Round 3 with available critiques.
```

The following features layer on top of the core review mesh. All are optional — a basic `archmesh run` needs none of them.
Inject domain-specific expertise into any agent role by creating markdown files in a skills directory. Skills are appended to the role's base prompt under a ROLE SKILLS: section.
```bash
# Create a skills directory
mkdir my-skills

# Write domain expertise for each role
cat > my-skills/architect_a.md << 'EOF'
- Prefer Kubernetes-native solutions (Helm, Kustomize, Argo CD)
- All services must expose /health and /metrics endpoints
- mTLS required between all internal services
EOF

cat > my-skills/economist.md << 'EOF'
- Model costs over 24 months including engineering time at $150k/yr fully loaded
- Flag any solution requiring a dedicated platform engineer
- Prefer committed-use pricing where multi-year commitment risk is acceptable
EOF

# Pass the directory at runtime
archmesh run brief.md --repo /project --skills-dir ./my-skills ...
```

Supported skill files (one per role, all optional):
| File | Role |
|---|---|
| `architect_a.md` | Architect A proposals |
| `architect_b.md` | Architect B proposals |
| `critic.md` | Critic critiques |
| `economist.md` | Economist critiques |
| `rebuttal.md` | Rebuttal round (all agents) |
| `judge.md` | Judge scoring |
Missing files fall back silently to the base prompt. Files exceeding 8,000 characters are truncated with a warning.
Give agents access to external MCP (Model Context Protocol) servers — documentation search, internal APIs, issue trackers, etc. — via a JSON registry file.
```json
[
  {
    "name": "github",
    "transport": "stdio",
    "command": "mcp-github",
    "args": ["--token", "ghp_xxx"],
    "roles": ["architect_a", "architect_b"]
  },
  {
    "name": "docs",
    "transport": "http",
    "url": "http://localhost:3001",
    "headers": { "Authorization": "Bearer token" }
  },
  {
    "name": "events",
    "transport": "sse",
    "url": "http://localhost:4000/sse",
    "roles": ["critic"]
  }
]
```

Transport types:
| Transport | Required fields | Optional fields |
|---|---|---|
| `stdio` | `command` | `args`, `env` |
| `http` | `url` | `headers` |
| `sse` | `url` | `headers` |
`roles` field: when omitted, the connector is available to all roles. When specified, only agents whose role name appears in the list receive the connector.
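The `roles` filter reduces to a one-line predicate over the registry. A sketch with an assumed connector shape (the real registry entries carry transport fields as well):

```typescript
// Minimal connector shape for illustration; real entries also carry transport config.
type Connector = { name: string; roles?: string[] };

// No `roles` field → visible to every role; otherwise exact-match filter.
function connectorsForRole(registry: Connector[], role: string): Connector[] {
  return registry.filter((c) => !c.roles || c.roles.includes(role));
}

// Mirrors the example registry above.
const registry: Connector[] = [
  { name: "github", roles: ["architect_a", "architect_b"] },
  { name: "docs" }, // all roles
  { name: "events", roles: ["critic"] },
];

connectorsForRole(registry, "critic").map((c) => c.name); // ["docs", "events"]
```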
```bash
archmesh run brief.md \
  --repo /project \
  --connectors ./connectors.json \
  ...
```

Connectors are currently wired into the Claude adapter only (the Claude Agent SDK supports `mcpServers` natively). Other adapters receive the full registry, but MCP injection is a no-op — they operate on prompt context only.
archmesh exposes itself as an Agent2Agent (A2A) compatible service, allowing other agents or orchestration systems to invoke architecture reviews programmatically.
```bash
archmesh serve \
  --repo /path/to/project \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514 \
  --port 8787 \
  --base-url https://archmesh.example.com \
  --auth bearer \
  --token $ARCHMESH_TOKEN
```

```bash
# Discover the agent card
curl https://archmesh.example.com/.well-known/agent.json

# Submit a task (fire-and-forget)
curl -X POST https://archmesh.example.com/a2a \
  -H "Authorization: Bearer $ARCHMESH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",
    "params": {
      "id": "task-001",
      "message": {
        "parts": [{ "type": "text", "text": "Design a payment gateway..." }]
      }
    }
  }'

# Subscribe to SSE stream for live progress
curl -N -X POST https://archmesh.example.com/a2a \
  -H "Authorization: Bearer $ARCHMESH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tasks/sendSubscribe",
    "params": {
      "id": "task-002",
      "message": {
        "parts": [{ "type": "text", "text": "Design a payment gateway..." }]
      }
    }
  }'

# Check task status
curl -X POST https://archmesh.example.com/a2a \
  -H "Authorization: Bearer $ARCHMESH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":3,"method":"tasks/get","params":{"id":"task-001"}}'
```

```
submitted → working → completed
                   ↘ failed
                   ↘ canceled
```
| Scheme | Header | Example |
|---|---|---|
| `none` | — | Open endpoint |
| `bearer` | `Authorization: Bearer <token>` | `--auth bearer --token mysecret` |
| `api_key` | `X-API-Key: <key>` | `--auth api_key --token mykey` |
- Per-role budget caps — each role has a hard cap (architects $0.40, critics/economist $0.12, judges $0.15, rebuttals $0.08)
- Global `--budget` cap — hard stop across all rounds
- `--dry-run` — shows worst-case cost estimate before spending anything
- `usage.json` — persisted per-role cost breakdown after every run
- Agents that exceed their cap are skipped gracefully via `Promise.allSettled` — other agents continue
```bash
# Preview costs without calling any agents
archmesh run brief.md --repo ./myproject --dry-run \
  --config archmesh.myproject.config.json
```

At the end of every run, a cost table is printed to the console and `usage.json` is saved:
```
=== Usage Summary ===
round 1 / architect_a             $0.1823   (12450 tokens)
round 1 / architect_b             $0.2104   (14230 tokens)
round 1 / critic                  $0.0612   (8100 tokens)
round 1 / economist               $0.0891   (9800 tokens)
round 2 / rebuttal-architect_a    $0.0421   (5200 tokens)
round 2 / rebuttal-architect_b    $0.0387   (4900 tokens)
round 3 / judge (×3)              $0.1203   (15600 tokens)
total                             $0.7441
```
The `--dry-run` ceiling is the static sum of configured per-role hard caps, clipped by `--budget`:

```
worst-case = architect_a($0.40) + architect_b($0.40)
           + critic($0.12) + economist($0.12)
           + judges(N × $0.15) + rebuttals(2 × $0.08)
effective  = min(worst-case, --budget)
```
This is not a token-based prediction and does not use historical usage envelopes or model-specific pricing. It is a hard ceiling derived purely from role configuration. Actual spend is typically 40–70% of the ceiling depending on repo size and brief complexity. Use usage.json from a prior run on the same project to calibrate expectations.
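The ceiling formula, as code — a minimal sketch with the documented default caps; names and signatures are illustrative:

```typescript
// Per-role hard caps in USD, mirroring the documented defaults.
const CAPS = { architect: 0.40, critic: 0.12, economist: 0.12, judge: 0.15, rebuttal: 0.08 };

// Static cap sum: two architects, one critic, one economist, N judges, M rebuttals.
function worstCase(judges: number, rebuttals: number): number {
  return (
    2 * CAPS.architect +
    CAPS.critic +
    CAPS.economist +
    judges * CAPS.judge +
    rebuttals * CAPS.rebuttal
  );
}

// The global --budget clips the ceiling.
function effectiveCap(judges: number, rebuttals: number, budgetUsd: number): number {
  return Math.min(worstCase(judges, rebuttals), budgetUsd);
}

// 1 judge, 2 rebuttals: 0.80 + 0.24 + 0.15 + 0.16 = $1.35,
// matching the dry-run output shown earlier.
```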
| Guaranteed | Not guaranteed |
|---|---|
| Destructive tools (Write, Edit, Bash, WebSearch) are blocked on the Claude adapter in every profile | Tool restrictions on OpenCode / Copilot / Codex are prompt hints, not SDK-enforced |
| Each role's per-role cap cannot be exceeded on the Claude adapter | A non-Claude adapter that ignores the tool policy can still read arbitrary files |
| A role exceeding its cap is skipped — other roles continue | archmesh cannot prevent a model from reasoning about content received in a prior turn |
| `--dry-run` exits before any API call is made | The dry-run ceiling is a cap sum, not a model-specific token prediction |
| All agent outputs are Zod-validated before scoring | Schema validation rejects malformed structure; it does not detect hallucinated content |
| `state.json` enables resuming after any failure | Resumed runs re-apply original per-role caps; remaining global budget is not recalculated |
| Variable | Default | Description |
|---|---|---|
| `ARCHMESH_BUDGET_LIMIT` | `20` | Default budget cap in USD |
| `ARCHMESH_DEFAULT_TOPOLOGY` | `tournament` | Default rebuttal topology |
Copy `.env.example` to `.env` to configure.
```bash
npm run check   # Type-check without emitting
npm test        # Run all tests (150 tests, 30 suites)
npm run build   # Compile TypeScript → dist/

npm run dev -- run brief.md --repo /path/to/project \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514

npm start -- run --config archmesh.myproject.config.json
```

Run a single test file:

```bash
npx tsx --test src/eval/scoring.eval.test.ts
npx tsx --test src/connectors.test.ts
```

```
src/
├── cli.ts                     # CLI entry point (Commander)
├── index.ts                   # Public API re-exports
├── config.ts                  # Environment config
├── types.ts                   # Shared TypeScript types
├── budget.ts                  # Per-role budget tracking
├── rubric.ts                  # Weighted scoring rubric
├── topology.ts                # Rebuttal pair generation
├── policies.ts                # Tool policy profiles (agentic/analysis/strict)
├── skills.ts                  # Skills pack loader
├── connectors.ts              # MCP connector registry (schema, loader, role filter)
├── adapters/
│   ├── claude.ts              # Claude Code SDK adapter
│   ├── copilot.ts             # GitHub Copilot SDK adapter
│   ├── opencode.ts            # OpenCode SDK adapter
│   └── codex.ts               # Codex (delegates to OpenCode)
├── orchestration/
│   ├── runMesh.ts             # Main 3-round orchestrator
│   ├── persist.ts             # Artifact persistence (rounds, usage, decision)
│   └── resume.ts              # Run resume logic
├── prompts/
│   ├── architect.ts           # Proposal prompt builder
│   ├── critic.ts              # Critique prompt builder
│   ├── rebuttal.ts            # Rebuttal prompt builder
│   └── judge.ts               # Judgment prompt builder
├── schemas/
│   ├── proposal.schema.ts     # Zod schema for proposals
│   ├── critique.schema.ts     # Zod schema for critiques
│   ├── rebuttal.schema.ts     # Zod schema for rebuttals
│   ├── judgment.schema.ts     # Zod schema for judge scores
│   └── examples.ts            # JSON examples for prompt schema blocks
├── a2a/
│   ├── card.ts                # Agent Card builder
│   ├── server.ts              # Hono HTTP + SSE server (A2A JSON-RPC)
│   ├── handler.ts             # TaskStore and submitTask (background mesh run)
│   ├── auth.ts                # Auth middleware (bearer, api_key, none)
│   └── types.ts               # A2A protocol types
├── eval/
│   ├── scoring.eval.test.ts   # Score-band regression tests
│   └── prompt.drift.test.ts   # Prompt structure invariant tests
├── __fixtures__/
│   ├── data.ts                # Fixture proposals, critiques, judge scores
│   └── briefs/
│       ├── payment-gateway.md # Benchmark brief: PCI-DSS payment processing
│       └── data-pipeline.md   # Benchmark brief: real-time analytics pipeline
└── utils/
    ├── logger.ts              # Logging with progress streaming
    ├── scoring.ts             # Ensemble judgment computation
    ├── templates.ts           # Decision markdown renderer
    └── files.ts               # File read/write helpers
```
MIT
