
KBoudich/archimesh


archmesh

Multi-agent architecture review with hard cost contracts, tool guardrails, and deterministic stop conditions.

Most multi-agent orchestrators treat cost and tool safety as your problem. A critic backed by Claude CLI can silently read your entire repository on every round unless something explicitly stops it — and none of them will tell you the bill before it arrives. archmesh is the trust layer that prevents this.

Two AI architects propose competing solutions. Critics and an economist challenge them. Architects rebut each other. A judge ensemble scores everything on a weighted rubric and picks a winner — or synthesizes a hybrid. Every role runs under a hard USD cap with a tool policy that controls what it can touch.

Why archmesh?

Harness-agnostic debate engines like argue are elegant in theory: one delegate interface, any backend. In practice they hand off all cost and safety responsibility to you. archmesh is opinionated in the places that matter:

| What | How |
| --- | --- |
| Hard per-role budget caps | Each role has a ceiling (architects $0.40, critics $0.12, judges $0.15). Overages skip the agent gracefully — they don't abort the run. |
| Tool policy profiles | `agentic` architects get Read/Glob/Grep. `analysis` critics get no file tools. `strict` judges get nothing. Destructive tools (Write, Edit, Bash, WebSearch) are blocked in every profile. |
| Dry-run mode | Compute worst-case spend and exit before any agent is called. |
| Full cost audit | `usage.json` with per-role token counts and USD after every run. |
| Graceful degradation | One failing or over-budget agent never kills the whole mesh. |
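The cap check and graceful skip can be sketched as follows. This is an illustrative reconstruction, not the archmesh source: the names (`runRole`, `runMesh`) and the estimate-vs-cap check are invented, while the cap values and the `Promise.allSettled` pattern come from this README.

```typescript
// Sketch: per-role cap check + Promise.allSettled so one over-budget or
// failing role is skipped instead of aborting the whole mesh.
type RoleResult =
  | { role: string; ok: true; output: string }
  | { role: string; ok: false; reason: string };

// Cap values from the table above.
const CAPS_USD: Record<string, number> = { architect_a: 0.40, critic: 0.12, judge: 0.15 };

async function runRole(role: string, estimatedUsd: number): Promise<RoleResult> {
  const cap = CAPS_USD[role] ?? 0;
  if (estimatedUsd > cap) {
    // Over budget: skip gracefully rather than throwing the run away.
    return { role, ok: false, reason: `budget exceeded: $${cap}` };
  }
  return { role, ok: true, output: `${role} completed` };
}

async function runMesh(): Promise<RoleResult[]> {
  // allSettled also absorbs crashes in any single role.
  const settled = await Promise.allSettled([
    runRole("architect_a", 0.30),
    runRole("critic", 0.20), // exceeds its $0.12 cap → skipped
  ]);
  return settled.map((s) =>
    s.status === "fulfilled" ? s.value : { role: "unknown", ok: false, reason: String(s.reason) }
  );
}
```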

Why not Argue / LangGraph / AutoGen?

| | Argue | LangGraph / AutoGen | archmesh |
| --- | --- | --- | --- |
| Hard per-role USD caps | ✗ delegated to the caller | Partial (callbacks) | ✓ enforced before each call |
| Tool allowlist / blocklist | | | ✓ per-role profiles |
| Dry-run cost estimate | | | ✓ static cap sum, exits before any API call |
| Repo-scan prevention | | | ✓ non-agentic roles blocked from file tools |
| Graceful degradation | | Partial | ✓ Promise.allSettled across all roles |
| Debate topology control | | | ✓ ring / tournament / full |
| Structured schema validation | | Partial | ✓ Zod-validated every round |
| Resume interrupted runs | | | ✓ state.json checkpoint |

Killer demo

```shell
# Preview costs before spending a single token
archmesh run brief.md --repo ./myproject --dry-run \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514

# Output:
# === Agent Configuration ===
#   repo: /Users/you/myproject
#   architect_a:  claude → claude-sonnet-4-20250514 (cap $0.40)
#   architect_b:  claude → claude-sonnet-4-20250514 (cap $0.40)
#   critic:       claude → claude-sonnet-4-20250514 (cap $0.12)
#   economist:    claude → claude-sonnet-4-20250514 (cap $0.12)
#   judge_1:      claude → claude-sonnet-4-20250514 (cap $0.15)
#   rebuttals (×2): max $0.16
#   Total worst-case: $1.35 (capped by --budget 5)
#   Effective cap: $1.35
# Dry run — exiting without calling any agents.

# Run for real with a $2 global cap, critics restricted to prompt-only (no repo reads)
archmesh run brief.md --repo ./myproject --budget 2 \
  --policy-critic analysis \
  --policy-economist analysis \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514
```

How It Works

archmesh trust layer diagram

```
Round 1 — Proposals & Critiques (parallel)
  architect_a ──► proposal A
  architect_b ──► proposal B
  critic      ──► critique of both proposals
  economist   ──► cost-focused critique of both proposals

Round 2 — Rebuttals (topology-driven)
  Each architect rebuts the other's proposal
  Critics may also rebut architects depending on topology

Round 3 — Judgment (ensemble)
  N judges independently score all artifacts on a weighted rubric
  Weighted averages produce the final decision
```

Output is a decision.md with the winning architecture, rubric scores, dissent notes, next actions, and a usage.json with per-role costs.

Control plane

```
brief.md + target repo
  ↓
role prompt builders      ← skills injected (--skills-dir)
                          ← connectors appended (--connectors)
  ↓
adapter calls             ← tool policy enforced (allowedTools / disallowedTools)
                          ← per-role USD cap checked before each call
  ↓
Zod schema validation     ← malformed output skipped, run continues
  ↓
ensemble scoring          ← weighted rubric applied; judges never see totals
  ↓
persisted artifacts       → decision.md · usage.json · state.json
```
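The "malformed output skipped, run continues" step can be sketched with a hand-rolled type guard standing in for the real Zod schemas. The `Proposal` shape here is invented for illustration; only the skip-and-continue behavior comes from the README.

```typescript
// Sketch: validate agent output, drop anything malformed, keep going.
interface Proposal {
  title: string;
  summary: string;
}

function isProposal(x: unknown): x is Proposal {
  return (
    typeof x === "object" && x !== null &&
    typeof (x as Proposal).title === "string" &&
    typeof (x as Proposal).summary === "string"
  );
}

function collectValid(raw: unknown[]): Proposal[] {
  // Invalid entries are silently skipped — the run continues with the rest.
  return raw.filter(isProposal);
}
```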

Prerequisites

archmesh orchestrates locally-installed AI coding tools via their TypeScript SDKs. Install the ones you want to use:

| Backend | Install | Auth |
| --- | --- | --- |
| Claude | `npm i -g @anthropic-ai/claude-code` | Claude Pro/Max subscription |
| GitHub Copilot | Bundled with `@github/copilot-sdk` | GitHub Copilot subscription |
| OpenCode | `npm i -g opencode` | Configure providers in `~/.config/opencode/config.json` |
| Codex | Uses OpenCode with OpenAI models | OpenAI API key via OpenCode config |

You need at least one backend installed. Claude is the default for most roles.

Install

```shell
npm install -g archmesh
```

Or use without installing:

```shell
npx archmesh <command>
```

Quick Start

1. Run the setup wizard

The fastest way to get started — detects your installed backends and generates the full CLI command:

```shell
archmesh init
```

The wizard will:

  • Auto-detect which backends are installed and authenticated
  • Let you pick Quick mode (one model for everything) or Advanced mode (per-role)
  • Show available models from each backend's SDK
  • Walk you through topology, judge count, and budget
  • Generate the full archmesh run command
  • Optionally save a config file and/or run immediately

2. Write an architecture brief

Create a markdown file describing what you want reviewed. See src/__fixtures__/briefs/ for examples.

A good brief includes:

  • Context — what the system does, current stack
  • What to review — the specific subsystem or decision
  • Constraints — budget, team size, cloud provider, latency targets
  • Questions — specific tradeoffs you want evaluated

3. List available models

```shell
archmesh models          # all backends
archmesh models claude   # just Claude
archmesh models opencode # just OpenCode
```

4. Run a review

Point --repo at the project you want the agents to analyze. Model flags are required unless you use --config.

```shell
# All Claude
archmesh run my-brief.md \
  --repo /path/to/project \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514

# Mix backends for adversarial diversity
archmesh run my-brief.md \
  --repo /path/to/project \
  --architect-a claude   --model-architect-a claude-sonnet-4-20250514 \
  --architect-b copilot  --model-architect-b gpt-4.1 \
  --critic opencode      --model-critic anthropic/claude-sonnet-4-20250514 \
  --economist claude     --model-economist claude-sonnet-4-20250514 \
  --judge claude         --model-judge claude-sonnet-4-20250514 \
  --topology ring \
  --judges 3 \
  --budget 10

# Use a config file from `archmesh init`
archmesh run my-brief.md \
  --repo /path/to/project \
  --config archmesh.myproject.config.json
```

5. Review the output

Results are saved to runs/<timestamp>/:

```
runs/2026-04-15T10-30-00-000Z/
├── round1/
│   ├── architect_a.json    # Proposal A
│   ├── architect_b.json    # Proposal B
│   ├── critic.json         # Critic's critique
│   └── economist.json      # Economist's critique
├── round2/
│   └── rebuttals.json      # All rebuttals
├── round3/
│   ├── scores.json         # Raw per-judge scores
│   ├── judgment.json       # Ensemble judgment
│   └── decision.md         # Human-readable final decision
├── usage.json              # Per-role cost breakdown
└── state.json              # Run state (for resume)
```

CLI Reference

archmesh init

Interactive setup wizard. Detects installed backends, queries their model lists, and outputs a ready-to-run command or config file.

```shell
archmesh init
```

Modes:

  • Quick — one backend and model for all roles
  • Advanced — different backend and model per role

archmesh run <briefPath>

Run a full three-round architecture review.

| Flag | Default | Description |
| --- | --- | --- |
| `--repo <path>` | required | Path to the target repository to analyze |
| `--config <path>` | | Config file from `archmesh init` — provides all params; no other flags needed when used |
| `--architect-a <backend>` | `claude` | Backend for architect A |
| `--architect-b <backend>` | `claude` | Backend for architect B |
| `--critic <backend>` | `opencode` | Backend for the critic |
| `--economist <backend>` | `claude` | Backend for the economist |
| `--judge <backend>` | `claude` | Backend for the judge(s) |
| `--judges <count>` | 3 | Number of judge instances (ensemble) |
| `--model-architect-a <model>` | | Model for architect A (required if no `--config`) |
| `--model-architect-b <model>` | | Model for architect B (required if no `--config`) |
| `--model-critic <model>` | | Model for the critic (required if no `--config`) |
| `--model-economist <model>` | | Model for the economist (required if no `--config`) |
| `--model-judge <model>` | | Model for the judge(s) (required if no `--config`) |
| `--topology <type>` | `tournament` | Rebuttal topology: `ring`, `tournament`, or `full` |
| `--budget <usd>` | 3 | Maximum total spend in USD |
| `--interactive` | false | Pause after Round 1 for human review before proceeding |
| `--dry-run` | false | Show worst-case cost estimate and exit without calling agents |
| `--policy-architect-a <profile>` | `agentic` | Tool policy for architect A: `agentic`, `analysis`, or `strict` |
| `--policy-architect-b <profile>` | `agentic` | Tool policy for architect B |
| `--policy-critic <profile>` | `analysis` | Tool policy for the critic |
| `--policy-economist <profile>` | `analysis` | Tool policy for the economist |
| `--policy-judge <profile>` | `strict` | Tool policy for judges |
| `--skills-dir <path>` | | Directory of role skill files (see Skills Packs) |
| `--connectors <path>` | | JSON file defining MCP server connections (see MCP Connectors) |

archmesh resume <runDir>

Resume a previously interrupted run from its saved state. Reads state.json from the run directory and continues from where it left off.

| Flag | Default | Description |
| --- | --- | --- |
| `--repo <path>` | required | Path to the target repository |
| `--budget <usd>` | 10 | Remaining budget cap |
| `--topology <type>` | `tournament` | Rebuttal topology (if resuming before Round 2) |
| `--architect-a <backend>` | `claude` | Backend for architect A |
| `--architect-b <backend>` | `claude` | Backend for architect B |
| `--critic <backend>` | `opencode` | Backend for critic |
| `--economist <backend>` | `claude` | Backend for economist |
| `--judge <backend>` | `claude` | Backend for judge |
| `--judges <count>` | 3 | Number of judges |
| `--model-architect-a <model>` | required | Model for architect A |
| `--model-architect-b <model>` | required | Model for architect B |
| `--model-critic <model>` | required | Model for critic |
| `--model-economist <model>` | required | Model for economist |
| `--model-judge <model>` | required | Model for judge |

archmesh serve

Start an A2A-compliant HTTP server that accepts architecture review tasks from other agents or automation pipelines.

```shell
archmesh serve \
  --repo /path/to/project \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514 \
  --port 8787 \
  --auth bearer \
  --token mysecret
```
| Flag | Default | Description |
| --- | --- | --- |
| `--repo <path>` | required | Default repository path for mesh runs |
| `--model-architect-a <model>` | required | Model for architect A |
| `--model-architect-b <model>` | required | Model for architect B |
| `--model-critic <model>` | required | Model for critic |
| `--model-economist <model>` | required | Model for economist |
| `--model-judge <model>` | required | Model for judge |
| `--port <port>` | 8787 | HTTP port to listen on |
| `--base-url <url>` | `http://localhost:8787` | Public base URL (used in Agent Card) |
| `--auth <scheme>` | `none` | Auth scheme: `none`, `bearer`, `api_key` |
| `--token <secret>` | | Bearer token or API key (required when `--auth` != `none`) |
| `--topology <type>` | `tournament` | Rebuttal topology |
| `--budget <usd>` | 3 | Per-task budget cap |
| `--judges <count>` | 3 | Number of judge instances |
| `--config <path>` | | Config file for role backends/models |

Exposes two endpoints:

  • GET /.well-known/agent.json — A2A Agent Card for discovery
  • POST /a2a — JSON-RPC 2.0 dispatcher

Supported JSON-RPC methods: tasks/send, tasks/sendSubscribe (SSE), tasks/get, tasks/cancel.
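A client-side request for `tasks/send` can be built like this. The helper name `buildSendTask` is invented; the JSON-RPC envelope and `params` shape mirror the curl examples elsewhere in this README, and no server is contacted here.

```typescript
// Sketch: build the JSON-RPC 2.0 body for a tasks/send call to POST /a2a.
interface A2ARequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: { id: string; message: { parts: { type: "text"; text: string }[] } };
}

function buildSendTask(id: number, taskId: string, text: string): A2ARequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tasks/send",
    params: { id: taskId, message: { parts: [{ type: "text", text }] } },
  };
}
```

The resulting object can be passed as the JSON body of a `POST /a2a` request with the appropriate auth header.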


archmesh a2a-card

Generate an A2A Agent Card JSON for this archmesh instance without starting a server.

```shell
archmesh a2a-card --base-url https://archmesh.example.com --auth bearer

# Write to file
archmesh a2a-card --base-url https://archmesh.example.com --out agent.json
```
| Flag | Default | Description |
| --- | --- | --- |
| `--base-url <url>` | `http://localhost:8787` | Public base URL for this instance |
| `--auth <scheme>` | `none` | Auth scheme: `none`, `bearer`, `oauth2`, `api_key` |
| `--out <path>` | | Write JSON to file instead of stdout |

archmesh models [backend]

List available models for one or all backends. Queries each SDK at runtime.

```shell
archmesh models          # all backends
archmesh models claude
archmesh models opencode
archmesh models copilot
archmesh models codex
```

archmesh config

Show environment configuration and example model names per backend.

```shell
archmesh config
```

Backends

Each agent role can use a different backend. Mix them for adversarial diversity — different models with different training data produce more independent proposals.

| Backend | Description |
| --- | --- |
| `claude` | Anthropic Claude via the Claude Code SDK. Best overall reasoning. Uses your local `claude` CLI auth. |
| `copilot` | GitHub Copilot via `@github/copilot-sdk`. Good for code-aware analysis. Uses your GitHub auth. |
| `opencode` | OpenCode SDK. Supports multiple providers (Anthropic, OpenAI, Google, etc.) via its own config. |
| `codex` | OpenCode SDK configured with OpenAI's Codex models. |

Every --model-* flag is required — there are no defaults. Use archmesh models to discover what's available on your machine.


Topologies

Controls who rebuts whom in Round 2:

| Topology | Pairs | Description |
| --- | --- | --- |
| `ring` | 2 | Each architect rebuts the other only |
| `tournament` | 6 | Critics also rebut architects (default) |
| `full` | N×(N−1) | All participants rebut all others |
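Pair generation for the three topologies can be sketched as below. This is an illustrative reconstruction for the default cast (two architects, critic, economist) — the pair counts match the table, but the real generator may differ in ordering or participants.

```typescript
// Sketch: who rebuts whom in Round 2, per topology.
type Pair = [critic: string, target: string];

function pairs(topology: "ring" | "tournament" | "full"): Pair[] {
  const architects = ["architect_a", "architect_b"];
  const critics = ["critic", "economist"];
  // ring: each architect rebuts the other → 2 pairs.
  const ring: Pair[] = [["architect_a", "architect_b"], ["architect_b", "architect_a"]];
  if (topology === "ring") return ring;
  if (topology === "tournament") {
    // tournament: critics also rebut each architect → 2 + 2×2 = 6 pairs.
    return [...ring, ...critics.flatMap((c) => architects.map((a): Pair => [c, a]))];
  }
  // full: every participant rebuts every other → N×(N−1) pairs.
  const all = [...architects, ...critics];
  return all.flatMap((x) => all.filter((y) => y !== x).map((y): Pair => [x, y]));
}
```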

Rubric

Judges score both architect proposals on 8 weighted criteria (0–10 per criterion). The orchestrator computes weighted totals — judges never see or compute totals themselves.

| Criterion | Weight | What it measures |
| --- | --- | --- |
| Constraint fit | 22% | How well the proposal addresses stated constraints |
| Operational simplicity | 18% | Ease of running in production |
| Delivery speed | 14% | Time to working system |
| Scalability | 14% | Ability to handle growth |
| Cost predictability | 12% | Budget certainty and TCO |
| Security & compliance | 10% | Security posture and compliance gaps |
| Vendor lock-in risk | 5% | Dependency on proprietary services |
| Evolvability (24 months) | 5% | Ability to adapt to changing requirements |
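The orchestrator-side computation can be sketched as a plain weighted sum — judges emit raw 0–10 scores per criterion and only the orchestrator applies the weights. The criterion key names below are invented; the weights are from the table.

```typescript
// Sketch: weighted rubric total computed by the orchestrator, never by judges.
const WEIGHTS: Record<string, number> = {
  constraintFit: 0.22, operationalSimplicity: 0.18, deliverySpeed: 0.14,
  scalability: 0.14, costPredictability: 0.12, security: 0.10,
  lockInRisk: 0.05, evolvability: 0.05,
};

function weightedTotal(scores: Record<string, number>): number {
  // Missing criteria score 0; weights sum to 1.0, so the total stays on a 0–10 scale.
  return Object.entries(WEIGHTS).reduce((sum, [k, w]) => sum + w * (scores[k] ?? 0), 0);
}
```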

Tool Policy Profiles

Each agent role has a default tool policy that controls what it can do during its Claude SDK session. Policies are enforced across all adapters.

| Profile | Default roles | Permitted tools |
| --- | --- | --- |
| `agentic` | architect_a, architect_b | Read, Glob, Grep (read-only repo exploration) |
| `analysis` | critic, economist, rebuttal | No file tools — works from prompt context only |
| `strict` | judge | No tools at all — pure structured completion |

All profiles hard-block destructive tools (Write, Edit, Bash, WebSearch, etc.) regardless of setting.
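The three profiles can be expressed as data. The exact lists below are an assumption assembled from this README (the always-blocked set, the `agentic` allowlist, and the `Task` entry seen in the policy example), not a copy of `policies.ts`.

```typescript
// Sketch: tool policy profiles as allow/deny lists; destructive tools are
// blocked in every profile.
interface ToolPolicy { allowedTools: string[]; disallowedTools: string[] }

const ALWAYS_BLOCKED = ["Write", "Edit", "Bash", "WebSearch"];

const PROFILES: Record<"agentic" | "analysis" | "strict", ToolPolicy> = {
  agentic:  { allowedTools: ["Read", "Glob", "Grep"], disallowedTools: ALWAYS_BLOCKED },
  analysis: { allowedTools: [], disallowedTools: [...ALWAYS_BLOCKED, "Task", "Read", "Glob", "Grep"] },
  strict:   { allowedTools: [], disallowedTools: [...ALWAYS_BLOCKED, "Task", "Read", "Glob", "Grep"] },
};

function isAllowed(profile: keyof typeof PROFILES, tool: string): boolean {
  const p = PROFILES[profile];
  return p.allowedTools.includes(tool) && !p.disallowedTools.includes(tool);
}
```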

Override defaults per role with --policy-* flags:

```shell
# Restrict architects to prompt-only and allow judges to read files
archmesh run brief.md --repo /project \
  --policy-architect-a analysis \
  --policy-judge agentic \
  ...
```

Adapter safety model

Enforcement strength varies by backend. Pair high-risk roles with Claude for the strongest guarantees.

| Adapter | Enforcement mechanism | Strength |
| --- | --- | --- |
| `claude` | `allowedTools` / `disallowedTools` passed to the Claude Agent SDK. Enforced at API level — the model cannot invoke a blocked tool regardless of prompt content. | Hard |
| `copilot` | `allowExploration` flag controls whether tool-calling context is injected into the system prompt. No SDK-level enforcement. | Soft |
| `opencode` | `tools` override map passed to the OpenCode SDK. Enforcement depends on the underlying provider and model. | Best-effort |
| `codex` | Same as OpenCode. | Best-effort |

MCP connectors are only active on the Claude adapter. Other adapters receive the registry but MCP injection is a no-op.

Policy in action — blocked repo scan

```shell
archmesh run brief.md --repo ./myproject \
  --policy-critic analysis \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 ...

# The critic is backed by Claude with disallowedTools: ["Task","Read","Glob","Grep"].
# Even if the prompt asks to read files, the SDK rejects the tool call.
# If the critic also exceeds its $0.12 cap it is skipped via Promise.allSettled:
#
#   ⚠  critic skipped (budget exceeded: $0.12)
#   ✓  economist completed  $0.09
#
# The run continues to Round 2 and Round 3 with available critiques.
```

Extensions

The following features layer on top of the core review mesh. All are optional — a basic archmesh run needs none of them.


Skills Packs

Inject domain-specific expertise into any agent role by creating markdown files in a skills directory. Skills are appended to the role's base prompt under a ROLE SKILLS: section.

```shell
# Create a skills directory
mkdir my-skills

# Write domain expertise for each role
cat > my-skills/architect_a.md << 'EOF'
- Prefer Kubernetes-native solutions (Helm, Kustomize, Argo CD)
- All services must expose /health and /metrics endpoints
- mTLS required between all internal services
EOF

cat > my-skills/economist.md << 'EOF'
- Model costs over 24 months including engineering time at $150k/yr fully loaded
- Flag any solution requiring a dedicated platform engineer
- Prefer committed-use pricing where multi-year commitment risk is acceptable
EOF

# Pass the directory at runtime
archmesh run brief.md --repo /project --skills-dir ./my-skills ...
```

Supported skill files (one per role, all optional):

| File | Role |
| --- | --- |
| `architect_a.md` | Architect A proposals |
| `architect_b.md` | Architect B proposals |
| `critic.md` | Critic critiques |
| `economist.md` | Economist critiques |
| `rebuttal.md` | Rebuttal round (all agents) |
| `judge.md` | Judge scoring |

Missing files fall back silently to the base prompt. Files exceeding 8,000 characters are truncated with a warning.
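The fallback and truncation rules can be sketched as follows. The function name and exact prompt layout are assumptions; the silent fallback, the `ROLE SKILLS:` section header, and the 8,000-character limit come from this README.

```typescript
// Sketch: append a skills file to the base prompt, truncating past 8,000 chars.
const SKILL_CHAR_LIMIT = 8000;

function applySkills(
  basePrompt: string,
  skillText: string | undefined
): { prompt: string; truncated: boolean } {
  // Missing skill file: fall back silently to the base prompt.
  if (skillText === undefined) return { prompt: basePrompt, truncated: false };
  const truncated = skillText.length > SKILL_CHAR_LIMIT;
  const body = truncated ? skillText.slice(0, SKILL_CHAR_LIMIT) : skillText;
  return { prompt: `${basePrompt}\n\nROLE SKILLS:\n${body}`, truncated };
}
```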


MCP Connectors

Give agents access to external MCP (Model Context Protocol) servers — documentation search, internal APIs, issue trackers, etc. — via a JSON registry file.

Registry format

```json
[
  {
    "name": "github",
    "transport": "stdio",
    "command": "mcp-github",
    "args": ["--token", "ghp_xxx"],
    "roles": ["architect_a", "architect_b"]
  },
  {
    "name": "docs",
    "transport": "http",
    "url": "http://localhost:3001",
    "headers": { "Authorization": "Bearer token" }
  },
  {
    "name": "events",
    "transport": "sse",
    "url": "http://localhost:4000/sse",
    "roles": ["critic"]
  }
]
```

Transport types:

| Transport | Required fields | Optional fields |
| --- | --- | --- |
| `stdio` | `command` | `args`, `env` |
| `http` | `url` | `headers` |
| `sse` | `url` | `headers` |

roles field: When omitted, the connector is available to all roles. When specified, only agents whose role name appears in the list receive the connector.
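That filtering rule can be sketched in a few lines. The `Connector` shape is reduced to the fields that matter here; the behavior (no `roles` means all roles, otherwise only the listed ones) is from this README.

```typescript
// Sketch: pick the connectors a given role should receive.
interface Connector {
  name: string;
  roles?: string[]; // omitted → available to every role
}

function connectorsFor(role: string, registry: Connector[]): Connector[] {
  return registry.filter((c) => c.roles === undefined || c.roles.includes(role));
}
```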

Using connectors

```shell
archmesh run brief.md \
  --repo /project \
  --connectors ./connectors.json \
  ...
```

Connectors are currently wired into the Claude adapter only (the Claude Agent SDK supports mcpServers natively). Other adapters receive the full registry but MCP injection is a no-op — they operate on prompt context only.


A2A Server

archmesh exposes itself as an Agent2Agent (A2A) compatible service, allowing other agents or orchestration systems to invoke architecture reviews programmatically.

Starting the server

```shell
archmesh serve \
  --repo /path/to/project \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514 \
  --port 8787 \
  --base-url https://archmesh.example.com \
  --auth bearer \
  --token $ARCHMESH_TOKEN
```

Submitting a task

```shell
# Discover the agent card
curl https://archmesh.example.com/.well-known/agent.json

# Submit a task (fire-and-forget)
curl -X POST https://archmesh.example.com/a2a \
  -H "Authorization: Bearer $ARCHMESH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",
    "params": {
      "id": "task-001",
      "message": {
        "parts": [{ "type": "text", "text": "Design a payment gateway..." }]
      }
    }
  }'

# Subscribe to SSE stream for live progress
curl -N -X POST https://archmesh.example.com/a2a \
  -H "Authorization: Bearer $ARCHMESH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tasks/sendSubscribe",
    "params": {
      "id": "task-002",
      "message": {
        "parts": [{ "type": "text", "text": "Design a payment gateway..." }]
      }
    }
  }'

# Check task status
curl -X POST https://archmesh.example.com/a2a \
  -H "Authorization: Bearer $ARCHMESH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":3,"method":"tasks/get","params":{"id":"task-001"}}'
```

Task lifecycle

```
submitted → working → completed
                    ↘ failed
                    ↘ canceled
```
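The lifecycle above can be written as a transition table. The diagram only shows transitions out of `working`; treating `submitted → canceled` as legal and all end states as terminal is an assumption.

```typescript
// Sketch: task lifecycle as an explicit transition table.
type TaskState = "submitted" | "working" | "completed" | "failed" | "canceled";

const TRANSITIONS: Record<TaskState, TaskState[]> = {
  submitted: ["working", "canceled"], // canceled-before-start is assumed
  working: ["completed", "failed", "canceled"],
  completed: [], // terminal
  failed: [],    // terminal
  canceled: [],  // terminal
};

function canTransition(from: TaskState, to: TaskState): boolean {
  return TRANSITIONS[from].includes(to);
}
```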

Auth schemes

| Scheme | Header | Example |
| --- | --- | --- |
| `none` | (none) | Open endpoint |
| `bearer` | `Authorization: Bearer <token>` | `--auth bearer --token mysecret` |
| `api_key` | `X-API-Key: <key>` | `--auth api_key --token mykey` |

Cost Controls

  • Per-role budget caps — each role has a hard cap (architects $0.40, critics/economist $0.12, judges $0.15, rebuttals $0.08)
  • Global --budget cap — hard stop across all rounds
  • --dry-run — shows worst-case cost estimate before spending anything
  • usage.json — persisted per-role cost breakdown after every run
  • Agents that exceed their cap are skipped gracefully via Promise.allSettled — other agents continue
```shell
# Preview costs without calling any agents
archmesh run brief.md --repo ./myproject --dry-run \
  --config archmesh.myproject.config.json
```

At the end of every run, a cost table is printed to the console and usage.json is saved:

```
=== Usage Summary ===
  round 1 / architect_a          $0.1823 (12450 tokens)
  round 1 / architect_b          $0.2104 (14230 tokens)
  round 1 / critic               $0.0612 (8100 tokens)
  round 1 / economist            $0.0891 (9800 tokens)
  round 2 / rebuttal-architect_a $0.0421 (5200 tokens)
  round 2 / rebuttal-architect_b $0.0387 (4900 tokens)
  round 3 / judge (×3)           $0.1203 (15600 tokens)
  total                          $0.7441
```

How dry-run computes the estimate

The --dry-run ceiling is the static sum of configured per-role hard caps, clipped by --budget:

```
worst-case = architect_a($0.40) + architect_b($0.40)
           + critic($0.12)      + economist($0.12)
           + judges(N × $0.15)  + rebuttals(2 × $0.08)
effective  = min(worst-case, --budget)
```

This is not a token-based prediction and does not use historical usage envelopes or model-specific pricing. It is a hard ceiling derived purely from role configuration. Actual spend is typically 40–70% of the ceiling depending on repo size and brief complexity. Use usage.json from a prior run on the same project to calibrate expectations.
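The formula above, as code. The function is an illustrative sketch (not the archmesh estimator); the per-role cap values are the documented defaults, and with one judge and `--budget 5` it reproduces the $1.35 ceiling from the demo output.

```typescript
// Sketch: dry-run ceiling = static sum of per-role caps, clipped by --budget.
function worstCase(judges: number, budgetUsd: number): { worstCase: number; effective: number } {
  const wc =
    0.40 + 0.40 +        // architect_a + architect_b
    0.12 + 0.12 +        // critic + economist
    judges * 0.15 +      // judge ensemble
    2 * 0.08;            // rebuttals
  return { worstCase: wc, effective: Math.min(wc, budgetUsd) };
}
```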


Safety Guarantees and Non-Guarantees

| Guaranteed | Not guaranteed |
| --- | --- |
| Destructive tools (Write, Edit, Bash, WebSearch) are blocked on the Claude adapter in every profile | Tool restrictions on OpenCode / Copilot / Codex are prompt hints, not SDK-enforced |
| Each role's per-role cap cannot be exceeded on the Claude adapter | A non-Claude adapter that ignores the tool policy can still read arbitrary files |
| A role exceeding its cap is skipped — other roles continue | archmesh cannot prevent a model from reasoning about content received in a prior turn |
| `--dry-run` exits before any API call is made | The dry-run ceiling is a cap sum, not a model-specific token prediction |
| All agent outputs are Zod-validated before scoring | Schema validation rejects malformed structure; it does not detect hallucinated content |
| `state.json` enables resuming after any failure | Resumed runs re-apply original per-role caps; remaining global budget is not recalculated |

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `ARCHMESH_BUDGET_LIMIT` | 20 | Default budget cap in USD |
| `ARCHMESH_DEFAULT_TOPOLOGY` | `tournament` | Default rebuttal topology |

Copy .env.example to .env to configure.


Development

```shell
npm run check        # Type-check without emitting
npm test             # Run all tests (150 tests, 30 suites)
npm run build        # Compile TypeScript → dist/
npm run dev -- run brief.md --repo /path/to/project \
  --model-architect-a claude-sonnet-4-20250514 \
  --model-architect-b claude-sonnet-4-20250514 \
  --model-critic claude-sonnet-4-20250514 \
  --model-economist claude-sonnet-4-20250514 \
  --model-judge claude-sonnet-4-20250514
npm start -- run --config archmesh.myproject.config.json
```

Run a single test file:

```shell
npx tsx --test src/eval/scoring.eval.test.ts
npx tsx --test src/connectors.test.ts
```

Project Structure

```
src/
├── cli.ts                        # CLI entry point (Commander)
├── index.ts                      # Public API re-exports
├── config.ts                     # Environment config
├── types.ts                      # Shared TypeScript types
├── budget.ts                     # Per-role budget tracking
├── rubric.ts                     # Weighted scoring rubric
├── topology.ts                   # Rebuttal pair generation
├── policies.ts                   # Tool policy profiles (agentic/analysis/strict)
├── skills.ts                     # Skills pack loader
├── connectors.ts                 # MCP connector registry (schema, loader, role filter)
├── adapters/
│   ├── claude.ts                 # Claude Code SDK adapter
│   ├── copilot.ts                # GitHub Copilot SDK adapter
│   ├── opencode.ts               # OpenCode SDK adapter
│   └── codex.ts                  # Codex (delegates to OpenCode)
├── orchestration/
│   ├── runMesh.ts                # Main 3-round orchestrator
│   ├── persist.ts                # Artifact persistence (rounds, usage, decision)
│   └── resume.ts                 # Run resume logic
├── prompts/
│   ├── architect.ts              # Proposal prompt builder
│   ├── critic.ts                 # Critique prompt builder
│   ├── rebuttal.ts               # Rebuttal prompt builder
│   └── judge.ts                  # Judgment prompt builder
├── schemas/
│   ├── proposal.schema.ts        # Zod schema for proposals
│   ├── critique.schema.ts        # Zod schema for critiques
│   ├── rebuttal.schema.ts        # Zod schema for rebuttals
│   ├── judgment.schema.ts        # Zod schema for judge scores
│   └── examples.ts               # JSON examples for prompt schema blocks
├── a2a/
│   ├── card.ts                   # Agent Card builder
│   ├── server.ts                 # Hono HTTP + SSE server (A2A JSON-RPC)
│   ├── handler.ts                # TaskStore and submitTask (background mesh run)
│   ├── auth.ts                   # Auth middleware (bearer, api_key, none)
│   └── types.ts                  # A2A protocol types
├── eval/
│   ├── scoring.eval.test.ts      # Score-band regression tests
│   └── prompt.drift.test.ts      # Prompt structure invariant tests
├── __fixtures__/
│   ├── data.ts                   # Fixture proposals, critiques, judge scores
│   └── briefs/
│       ├── payment-gateway.md    # Benchmark brief: PCI-DSS payment processing
│       └── data-pipeline.md      # Benchmark brief: real-time analytics pipeline
└── utils/
    ├── logger.ts                 # Logging with progress streaming
    ├── scoring.ts                # Ensemble judgment computation
    ├── templates.ts              # Decision markdown renderer
    └── files.ts                  # File read/write helpers
```

License

MIT
