
🐝 Swarm Command

Multi-model consensus swarm orchestration for the Copilot CLI. Launch 50–250+ AI agents across 15 models with Shadow Score Spec L2 validation — from one command.


Learn more and see the website here: dubsopenhub.github.io/swarm-command

⚡ One Command. That's It.

Never used the CLI before? No problem.

  1. Open your terminal
  2. Paste this:
    curl -fsSL https://raw.githubusercontent.com/DUBSOpenHub/swarm-command/main/quickstart.sh | bash
  3. When Copilot opens, type: swarm command

Requires an active Copilot subscription.


🚀 30-Second Overview

Swarm Command is for tasks that are too big, risky, or cross-cutting for one model:

  • Need one answer from many perspectives? It fans your task out across a layered swarm.
  • Need confidence, not vibes? It uses cross-review + consensus scoring.
  • Need hidden quality checks? It validates bundles with sealed acceptance criteria.
  • Need speed at scale? Designed for parallel execution — agents work simultaneously, not sequentially.
  • Need zero setup? No servers, no API keys, no build step.

If your task spans architecture + implementation + testing + docs + integration, this is exactly what Swarm Command is built for.


🤔 What Is This?

Swarm Command is a multi-model swarm orchestration skill for the Copilot CLI that launches 50 to 250+ AI agents across 15 different models to solve complex tasks through hierarchical fan-out, cross-family review, and consensus-gated synthesis.

Give it a task — architecture, refactoring, testing, docs, or integration — and it decomposes the mission into domains, dispatches Commanders, Squad Leads, and Workers, validates outputs against sealed acceptance criteria, and synthesizes a final answer from collective intelligence instead of single-model intuition.

💬 The Problem

One model gives you one perspective.

For small tasks, that's perfect. For high-stakes tasks, it's fragile:

  • the model may miss cross-cutting risks,
  • the task may exceed one context window,
  • the output may sound confident without being complete,
  • and you have no independent check that the answer actually satisfies the mission.

Swarm Command solves that by turning one request into a structured swarm process: split, parallelize, review, validate, converge.


🥊 Swarm Command vs. Stampede vs. Havoc

These systems are complementary — not competitors.

| If you need to... | Use | Why |
|---|---|---|
| Solve one complex task with layered consensus inside your current Copilot CLI session | Swarm Command | Best when you want decomposition, cross-model review, shadow validation, and one synthesized answer |
| Run parallel coding workstreams across terminals or branches | Stampede | Best when the goal is execution throughput across independent task lanes |
| Run a many-model tournament to pressure-test ideas and rank options | Havoc Hackathon | Best when you want competitive ideation, elimination rounds, and judged synthesis |

Rule of thumb: one mission that needs one vetted answer → Swarm Command; many independent workstreams → Stampede; many competing options to rank → Havoc Hackathon.


⚡ What Makes It Different

  • 🐝 True swarm — 50 to 250+ agents, not 3–5
  • 🏗️ 5-layer hierarchy — Nexus → Commander → Squad Lead → Worker → Reviewer
  • 🔀 Cross-model diversity — Claude + GPT families mixed within every pod
  • 🗳️ Consensus scoring — 4-stage gate-then-rank with CONSENSUS / MAJORITY / CONFLICT tiers
  • 👻 Shadow Score — Shadow Score Spec L2 conformance. Sealed acceptance criteria generated before commanders execute, validated after, hardened on failure.
  • 🛡️ Depth Guard — 5 laws + 3-layer enforcement prevent runaway agent spawning
  • ⚡ Circuit breaker — 3-state FSM with 5-level recovery escalation
  • 📉 Parallel by design — agents execute concurrently with hierarchical fan-out and pipeline overlap
  • 💰 Cost-controlled — 1024:1 token compression, wave deployment, hard cost ceilings, and cheap workers
  • 📦 Zero infrastructure — no servers, no API keys, no build step

💰 Built for Cost Control

Running 250+ agents sounds expensive. It isn't — because every layer is engineered to minimize spend.

Token Compression (1024:1)

Context shrinks at every layer. The Nexus holds 128K tokens; by the time instructions reach a worker, they're 128 tokens. Parents strip rationale, narrow file scope, and tighten constraints so children only receive the bytes they need.

Nexus       128K tokens ──► 4K task brief
Commander    64K tokens ──► 2K context capsule
Squad Lead   32K tokens ──► 512 shard
Worker        8K tokens ──► 128 micro-brief
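
As a quick sanity check of the 1024:1 figure above (a sketch that reads "128K" as 128 × 1024 tokens; the variable names are illustrative):

```python
# Context budget at the top of the hierarchy vs. the micro-brief a worker sees.
NEXUS_CONTEXT = 128 * 1024   # "128K" tokens at the Nexus (L0)
WORKER_BRIEF = 128           # tokens in a worker micro-brief (L3)

ratio = NEXUS_CONTEXT // WORKER_BRIEF
print(f"{ratio}:1")          # -> 1024:1
```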

Circuit Breakers

A three-state FSM (CLOSED → OPEN → HALF-OPEN) monitors every layer. If too many agents fail (50–60% threshold), the breaker trips — no new agents spawn, costs stop climbing, and a recovery probe must succeed before the swarm resumes.

5-level recovery escalation: Retry → Simplify → Model Swap → Scope Reduce → Graceful Degrade.
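
A minimal sketch of that breaker, assuming a 50% failure threshold and a single recovery probe; the class and method names are illustrative, not the repo's actual interfaces:

```python
class CircuitBreaker:
    """Three-state breaker: CLOSED (normal) -> OPEN (tripped) -> HALF-OPEN (probing)."""

    def __init__(self, failure_threshold: float = 0.5, min_attempts: int = 5):
        self.state = "CLOSED"
        self.failure_threshold = failure_threshold
        self.min_attempts = min_attempts
        self.failures = 0
        self.attempts = 0

    def record(self, success: bool) -> None:
        """Track one agent result; trip the breaker if the failure rate is too high."""
        self.attempts += 1
        self.failures += 0 if success else 1
        if (self.state == "CLOSED"
                and self.attempts >= self.min_attempts
                and self.failures / self.attempts >= self.failure_threshold):
            self.state = "OPEN"   # tripped: no new agents spawn while OPEN

    def probe(self, probe_succeeded: bool) -> None:
        """Recovery probe: OPEN -> HALF-OPEN, then CLOSED on success or back to OPEN."""
        if self.state == "OPEN":
            self.state = "HALF-OPEN"
        if self.state == "HALF-OPEN":
            self.state = "CLOSED" if probe_succeeded else "OPEN"

    def allow_spawn(self) -> bool:
        return self.state != "OPEN"
```

The five-level escalation ladder (Retry → Simplify → Model Swap → Scope Reduce → Graceful Degrade) would sit on top of this, choosing what to attempt each time a probe fails.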

Wave Deployment (Canary → Probe → Remainder)

Agents don't all launch at once. Each pod deploys in three waves with health gates between them:

  1. Wave 1 (Canary) — 1 agent verifies the task is feasible
  2. Wave 2 (Probe) — 3 agents test for rate limits and bulk viability
  3. Wave 3 (Remainder) — full pod only if gates pass

If the canary fails, the full pod never deploys. One cheap test prevents many expensive failures.
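
The gating above can be sketched as follows (the `run_agents` callable, the pod-size arithmetic, and the 2-of-3 probe gate are assumptions for illustration):

```python
def deploy_pod(run_agents, pod_size: int) -> int:
    """Deploy one pod in three health-gated waves; returns agents successfully run."""
    if run_agents(1) < 1:            # Wave 1 (Canary): one agent must succeed
        return 0                     # canary failed: the full pod never deploys
    probe_ok = run_agents(3)         # Wave 2 (Probe): rate limits, bulk viability
    if probe_ok < 2:                 # health gate (illustrative: 2 of 3 must pass)
        return 1 + probe_ok
    return 1 + probe_ok + run_agents(pod_size - 4)   # Wave 3 (Remainder)
```

A healthy 10-agent pod runs 1 + 3 + 6; a failed canary costs exactly one cheap agent.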

Six Resource Guards

| Guard | What it does |
|---|---|
| Timeout cascade | 90s → 60s → 40s → 30s per layer — children always finish before parents |
| Token ceiling | 128K / 64K / 32K / 8K per layer |
| Output size cap | 4K / 1K / 512 / 256 tokens per layer |
| Retry budget | Workers: 0 retries. Squad Leads: 1 retry. |
| Concurrent agent cap | Max 50 agents launching simultaneously |
| Cost ceiling | $5 / $10 / $20 hard cap — kills all agents if breached |
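
The guard limits can also be read as per-layer data (layer order Nexus → Commander → Squad Lead → Worker; the invariant check is an illustrative sketch, not the repo's code):

```python
# Per-layer limits from the guard table above.
TIMEOUT_S  = [90, 60, 40, 30]                  # timeout cascade, seconds
TOKEN_CAP  = [128_000, 64_000, 32_000, 8_000]  # context ceiling, tokens
OUTPUT_CAP = [4_000, 1_000, 512, 256]          # output size cap, tokens

def cascade_is_valid(timeouts) -> bool:
    """Children always finish before parents: deadlines strictly shrink per layer."""
    return all(child < parent for parent, child in zip(timeouts, timeouts[1:]))

assert cascade_is_valid(TIMEOUT_S)
```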

Cost by Scale

| Scale | Agents | Typical Cost | Hard Cap | Wall-Clock |
|---|---|---|---|---|
| SS-50 | ~36–52 | $2.50 | $5 | ~30s |
| SS-100 | ~89 | $5.50 | $10 | ~45s |
| SS-250 | ~316 | $10 | $20 | ~65–90s |

Why It's Cheap

  • Workers are the cheapest models — Haiku and GPT-Mini at L3, 10× cheaper than Opus
  • Expensive reasoning stays at the top — Opus and Sonnet only at Commander/Nexus level
  • Context compresses monotonically — each layer receives a fraction of its parent's tokens
  • Failed work stops early — circuit breakers and canary gates prevent runaway spend

🧠 30-Second Architecture

Before the full diagrams, here's the mental model:

  1. Nexus reads the mission and splits it into domains.
  2. Commanders own each domain and dispatch sub-work.
  3. Workers do tiny atomic tasks in parallel.
  4. Reviewers + Shadow Score decide what survives into the final answer.
You ask one question
        ↓
Nexus decomposes the mission
        ↓
Commanders split by domain
        ↓
Workers execute atomic tasks in parallel
        ↓
Reviewers score + Shadow Score validates
        ↓
Nexus emits one final bundle

If you want the visual deep dive, jump to docs/architecture.md or docs/architecture-diagrams.md.


πŸ—οΈ How It Works

                          ┌─────────────────┐
                    L0    │    NEXUS (1)    │  claude-opus-4.6
                          │ 128K ctx budget │  Task decomposition + final synthesis
                          └────────┬────────┘
                                   │
               ┌───────────────────┼───────────────────┐
               │                   │                   │
         ┌─────┴─────┐       ┌─────┴─────┐       ┌─────┴─────┐
   L1    │ CMD-ARCH  │       │ CMD-IMPL  │  ...  │ CMD-INTG  │  × 5 Commanders
         │ 64K ctx   │       │ 64K ctx   │       │ 64K ctx   │  Domain specialists
         └─────┬─────┘       └─────┬─────┘       └─────┬─────┘
               │                   │                   │
        ┌──────┼──────┐            │                   │
        │      │      │            │                   │
     ┌──┴──┐┌──┴──┐┌──┴──┐
  L2 │SQ-1 ││SQ-2 ││SQ-10│ ...    × 10 per Commander = 50 Squad Leads
     │ 32K ││ 32K ││ 32K │        Micro-task decomposition + canary deploy
     └──┬──┘└──┬──┘└──┬──┘
        │      │      │
     ┌──┴──┐┌──┴──┐┌──┴──┐
  L3 │ W×5 ││ W×5 ││ W×5 │        × 5 per Squad Lead = 250 Workers
     │ 8K  ││ 8K  ││ 8K  │        Atomic execution (LEAF — no spawning)
     └─────┘└─────┘└─────┘

                    ┌──────────────┐
              L4    │ REVIEWERS×10 │  Cross-review mesh (pipeline overlap)
                    │    16K ctx   │  4-axis sealed scoring + consensus tiers
                    └──────────────┘

              + SHADOW SCORING (sealed acceptance criteria, Shadow Score Spec L2)

Time-Flow Architecture

T+0s    T+2s       T+5s          T+12s        T+45s      T+65s    T+80s  T+90s
   │       │          │           │           │          │       │      │
   ▼       ▼          ▼           ▼           ▼          ▼       ▼      ▼
┌─────┐ ┌──────┐ ┌─────────┐ ┌──────────┐ ┌────────┐ ┌───────┐ ┌────┐ ┌────┐
│NEXUS│→│CMDs  │→│SQUAD    │→│WORKERS   │ │REVIEW  │ │MERGE  │ │VOTE│ │EMIT│
│BOOT │ │SPAWN │ │LEADS    │ │EXECUTE   │ │MESH    │ │RESULTS│ │    │ │    │
│     │ │      │ │+ CANARY │ │(parallel)│ │(overlap│ │       │ │    │ │    │
│     │ │      │ │VERIFY   │ │          │ │ start) │ │       │ │    │ │    │
└─────┘ └──────┘ └─────────┘ └──────────┘ └────────┘ └───────┘ └────┘ └────┘
   2s      3s        7s          33s         20s        15s      10s    5s

Signal Flow — Token Compression

           CONTEXT DOWN (shrinking)              RESULTS UP (compressing)
           ========================              ========================

  L0  Full Task Brief    ─── 4K tokens ───►  Final Report     ◄── 4K tokens
                 │                                    ▲
  L1  Context Capsule    ─── 2K tokens ───►  Bundle           ◄── 1K tokens
                 │                                    ▲
  L2  Shard              ─── 512 tokens ──►  Atom Set         ◄── 512 tokens
                 │                                    ▲
  L3  Micro-Brief        ─── 128 tokens ──►  Atom             ◄── 256 tokens
                 │                                    ▲
  L4  Review Capsule     ─── 1K tokens ───►  Score Card       ◄── 512 tokens

📊 Scaling Variants

| Scale | Agents | Commanders | Workers | Reviewers | Best For | Wall-Clock |
|---|---|---|---|---|---|---|
| SS-50 | ~36–52 | 2–3 | 30–45 | 3 | Fast bounded tasks | ~30s |
| SS-100 | ~89 | 5 | 75 | 8 | Multi-file features and reviews | ~45s |
| SS-250 | ~316 | 5 | 250 | 10 | Repo-wide or high-stakes work | ~65–90s |

10-Second Decision Tree

Do you need a fast second opinion on 1–2 files?
→ SS-50

Do you need a serious answer for a multi-file feature or subsystem?
→ SS-100

Do you need repo-wide coverage, compliance-grade review, or maximum consensus?
→ SS-250

Default is SS-100. Say swarm command ss-250 for full deployment or swarm command ss-50 for quick tasks.
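
That tree could be encoded as a tiny chooser; the function and its thresholds are hypothetical, not part of the skill:

```python
def choose_scale(files_touched: int, high_stakes: bool = False) -> str:
    """Map the 10-second decision tree onto a scale flag (illustrative thresholds)."""
    if high_stakes:              # repo-wide, compliance-grade, maximum consensus
        return "ss-250"
    if files_touched <= 2:       # fast second opinion on 1-2 files
        return "ss-50"
    return "ss-100"              # the default: multi-file features and subsystems
```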

See docs/scaling.md for cost breakdowns, chooser guidance, and a deeper decision matrix.


🎯 Use Cases

Curated highlights — see docs/use-cases.md for the full gallery.

SS-50 — Fast Expert Panels (~30s)

🔥 Stack Trace Whisperer

swarm command ss-50 "Diagnose this error — 3 most likely root causes with fixes: [paste error]"

Three fast expert panels race on runtime, dependency, and logic hypotheses. You get ranked diagnoses, not a single guess.

🔍 Explain Like I Own It

swarm command ss-50 "I just inherited this codebase. Explain src/core/ — what does each piece do, where are the landmines?"

Great for onboarding: architecture map, event flow, and hidden footguns in one brief.

⚡ Performance Profiler's Shortcut

swarm command ss-50 "Find the performance bottlenecks in this file with optimized versions: [paste hot-path file]"

Ideal when you need a prioritized hit list before opening a profiler.

SS-100 — Full Swarm, Default Scale (~45s)

🔐 Zero-Downtime Auth Rewrite

swarm command "Migrate our session auth to JWT + refresh tokens across API, web app, DB, and tests"

Architecture, implementation, testing, docs, and rollout risk all get separate ownership before synthesis.

πŸ—οΈ Legacy Service Extraction

swarm command "Extract the billing module from our monolith into a service with minimal downtime"

Produces migration phases, interface boundaries, contract tests, and rollback paths.

📱 Offline Sync Feature

swarm command "Design offline-first sync for our field app: local cache, conflict resolution, API changes, UX, and tests"

Covers data model, UX states, conflict semantics, and integration testing in parallel.

SS-250 — Maximum Intelligence (~65–90s)

🛡️ Zero-Day Security Sweep

swarm command ss-250 "Full security audit: every file, every dependency, every injection surface — CVSS-scored vulnerability report"

Best for broad-surface analysis where missing even one category matters.

⚖️ Compliance Fortress

swarm command ss-250 "Audit for GDPR, HIPAA, SOC2, PCI-DSS compliance — every gap, every control, remediation tickets"

Turns a giant policy problem into parallel control checks with one synthesized risk summary.

🗺️ Living Runbook Generator

swarm command ss-250 "Read every service, every pipeline, every config — generate the complete operations manual"

Excellent when tribal knowledge has to become documentation fast.

❌ When NOT to Swarm

  • "What's the CLI flag for X?" → Ask a single agent
  • Rename one variable → Manual edit or single agent
  • Prod is down and seconds matter → Follow the human runbook first
  • Writing a single-voice email → One persona is better than a committee
  • Step-through debugging → Sequential work beats consensus here

📦 Install

Instant Install (no clone needed) ⚡

mkdir -p ~/.copilot/skills/swarm-command ~/.copilot/agents && \
  curl -sL https://raw.githubusercontent.com/DUBSOpenHub/swarm-command/main/skills/swarm-command/SKILL.md \
    -o ~/.copilot/skills/swarm-command/SKILL.md && \
  curl -sL https://raw.githubusercontent.com/DUBSOpenHub/swarm-command/main/agents/swarm-command.agent.md \
    -o ~/.copilot/agents/swarm-command.agent.md && \
  echo "✅ Swarm Command installed — open Copilot CLI and type: swarm command"

Verify integrity (optional):

shasum -a 256 ~/.copilot/skills/swarm-command/SKILL.md
shasum -a 256 ~/.copilot/agents/swarm-command.agent.md

💡 Security note: We recommend inspecting quickstart.sh before piping to bash. You can also use the manual install above instead.

Clone & Explore

git clone https://github.com/DUBSOpenHub/swarm-command.git
cd swarm-command
chmod +x quickstart.sh && ./quickstart.sh

🧭 Reading Order / Learning Path

If you're new, read in this order:

  1. This README — what it is, when to use it, and how to run it
  2. docs/learning-path.md — beginner, operator, and architect reading tracks
  3. docs/architecture.md — the conceptual system model
  4. docs/scaling.md — which scale to choose and what it costs
  5. docs/use-cases.md — vivid prompts and expected outcomes
  6. docs/consensus.md + docs/shadow-scoring.md — the deep mechanics

Fast paths

  • I just want to try it: README → install → run swarm command
  • I want to operate it well: README → learning path → scaling → use cases
  • I want to understand the design: README → architecture → consensus → shadow scoring

❓ FAQ

Do I need API keys or infrastructure?

No. Swarm Command runs through your active Copilot subscription. No separate servers, queues, or key management required.

When should I use SS-50, SS-100, or SS-250?

Use SS-50 for bounded, fast tasks. Use SS-100 for most real software work. Use SS-250 when the task is repo-wide, high-stakes, or needs maximum coverage and consensus.

Personality Modes

Append a personality mode after the scale to adjust how the swarm operates:

swarm command ss-100 thorough "audit auth module"
swarm command ss-250 fast "quick scan of README"
| Mode | Workers | Timeout | Models | Retry | Best For |
|---|---|---|---|---|---|
| balanced (default) | 5 per squad | 1.0× | mixed | 1 | Most tasks |
| thorough | 5 per squad | 1.5× | opus/sonnet | 2 | High-stakes, complex analysis |
| fast | 3 per squad | 0.6× | haiku only | 0 | Quick iteration, cost-sensitive |
| creative | 4 per squad | 1.0× | max diversity | 1 | Brainstorming, novel problems |
| cautious | 5 per squad | 1.2× | sonnet | 2 | Ambiguous tasks, high conflict risk |
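
The mode table above as data, with a sketch of how a dispatcher might apply it (the dict mirrors the table; the 30s worker base deadline comes from the guard table earlier; the helper itself is illustrative):

```python
MODES = {
    "balanced": {"workers": 5, "timeout_x": 1.0, "retries": 1},
    "thorough": {"workers": 5, "timeout_x": 1.5, "retries": 2},
    "fast":     {"workers": 3, "timeout_x": 0.6, "retries": 0},
    "creative": {"workers": 4, "timeout_x": 1.0, "retries": 1},
    "cautious": {"workers": 5, "timeout_x": 1.2, "retries": 2},
}

def worker_timeout(mode: str, base_s: float = 30.0) -> float:
    """Effective worker deadline = base deadline * the mode's timeout multiplier."""
    return base_s * MODES[mode]["timeout_x"]
```

So `swarm command ss-250 fast "..."` would run 3-worker squads with no retries on the cheapest models.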

Why mix Claude and GPT models?

Because diversity helps. Different model families catch different failure modes. Swarm Command intentionally mixes them so agreement means more than self-consistency.

What happens when agents disagree?

Disagreement is preserved, scored, and escalated. Squad Leads and Commanders mark results as CONSENSUS, MAJORITY, CONFLICT, or UNIQUE, then Nexus arbitrates the unresolved pieces.

What is Shadow Score in one sentence?

It is a hidden acceptance test: criteria are generated before execution, kept sealed from the swarm, then used to validate outputs afterward.

Will this write code automatically?

It can produce plans, analyses, patches, documentation, tests, and rollout guidance depending on how you invoke it — but the point is not blind automation. The point is reviewable, consensus-backed output.

When should I avoid using it?

Avoid it for tiny edits, urgent incident response where every second matters, or tasks that need one strong voice rather than many perspectives.


🛠️ How It Was Built

Swarm Command came out of a simple question: what if one Copilot CLI session could behave less like one assistant and more like a disciplined organization?

The design evolved from SwarmSpeed 250 experiments into a layered system with:

  • a single Nexus orchestrator,
  • domain-owning Commanders,
  • decomposing Squad Leads,
  • leaf-node Workers,
  • and independent Reviewers.

The turning point was a self-analysis run later documented in docs/shadow-scoring.md: sealed judges rated a design highly even though it contained critical arithmetic errors. That exposed a core truth of multi-agent systems: review alone is not validation.

That failure drove the big ideas that now define this repo:

  • Shadow scoring so hidden criteria can catch what the swarm forgot to optimize for
  • Depth Guard so recursion never turns into agent explosion
  • Token compression so higher-level intent survives while lower layers stay cheap
  • Cross-family review so agreement means more than "the same model said it twice"

In other words: Swarm Command is not just a big swarm. It is a swarm that learned from its own failure modes.


📋 Example Output

See what a completed swarm run looks like → Example Output

🐝 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   S W A R M   C O M P L E T E
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

## Results Summary
- Domains completed: 5/5
- Consensus tier: CONSENSUS (4) · MAJORITY (1)
- Overall confidence: 0.77
- Agents deployed: 89
- Wall-clock time: 72s
- Shadow Score: 20.0% 🟡 Moderate (8 pass · 2 fail)

👻 Shadow Scoring

Swarm Command implements Shadow Score Spec L2 conformance — sealed acceptance criteria generated before commanders execute, validated after, hardened on failure.

Formula: Shadow Score = (sealed_failures / sealed_total) × 100

| Shadow Score | Level | Action |
|---|---|---|
| 0% | ✅ Perfect | All sealed criteria passed |
| 1–15% | 🟢 Minor | Proceed normally |
| 16–30% | 🟡 Moderate | Attach Gap Report, warn |
| 31–50% | 🟠 Significant | Quarantine bundle, hardening cycle |
| > 50% | 🔴 Critical | Reject bundle from synthesis |
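
The formula and bands above, transcribed directly (the function wrappers are illustrative):

```python
def shadow_score(sealed_failures: int, sealed_total: int) -> float:
    """Shadow Score = (sealed_failures / sealed_total) * 100."""
    return sealed_failures / sealed_total * 100

def shadow_level(score: float) -> str:
    if score == 0:
        return "Perfect"       # all sealed criteria passed
    if score <= 15:
        return "Minor"         # proceed normally
    if score <= 30:
        return "Moderate"      # attach Gap Report, warn
    if score <= 50:
        return "Significant"   # quarantine bundle, hardening cycle
    return "Critical"          # reject bundle from synthesis

# The example run shown above: 2 of 10 sealed criteria failed.
print(shadow_score(2, 10))     # -> 20.0, i.e. the Moderate band
```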

Sealed-envelope protocol:

  1. Phase 1.5 — Nexus generates sealed acceptance criteria from the task
  2. Phases 2–5 — Commanders execute without seeing those criteria
  3. Phase 6 — Validate outputs, compute Shadow Score, produce Gap Report
  4. Hardening — If score > 15%, share failure messages only for one fix cycle

See docs/shadow-scoring.md for the full protocol.


🗳️ Consensus Algorithm

A 4-stage consensus pipeline merges the best work from hundreds of agents:

  1. Worker Self-Score — Each worker emits confidence + self-score with its atom
  2. Squad Lead Local Merge — Groups atoms by sub-task, classifies as CONSENSUS / MAJORITY / CONFLICT
  3. Commander Domain Merge — Trimmed mean across squads, applies the consensus formula
  4. Nexus Cross-Domain Synthesis — Median-of-3 judging and final arbitration

Consensus formula:

score = 0.40 × confidence + 0.30 × evidence + 0.15 × scope + 0.15 × coverage − min(0.30, conflict_rate × 0.30)
| Tier | Condition | Action |
|---|---|---|
| CONSENSUS | ≥ 70% agreement | Auto-accept |
| MAJORITY | ≥ 50% agreement | Accept with dissent note |
| CONFLICT | < 50% agreement | Nexus arbitration |
| UNIQUE | No overlap | Keep if evidence ≥ 7/10 |
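
The scoring formula and tier gates as straight Python (variable names mirror the formula; the function wrappers are illustrative):

```python
def consensus_score(confidence, evidence, scope, coverage, conflict_rate):
    """Weighted sum of the four axes, minus a conflict penalty capped at 0.30."""
    return (0.40 * confidence + 0.30 * evidence + 0.15 * scope
            + 0.15 * coverage - min(0.30, conflict_rate * 0.30))

def tier(agreement: float) -> str:
    if agreement >= 0.70:
        return "CONSENSUS"   # auto-accept
    if agreement >= 0.50:
        return "MAJORITY"    # accept with dissent note
    return "CONFLICT"        # escalate to Nexus arbitration
```

UNIQUE results (no overlap with any other atom) bypass these agreement gates and are kept only when their evidence rating is at least 7/10.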

See docs/consensus.md for the full mechanics.


⚙️ Configuration

All tunables live in config.yml. Key settings:

consensus:
  threshold_consensus: 0.70
  threshold_majority: 0.50

depth_guard:
  max_spawn_depth: 3
  max_workers_per_squad_lead: 5

circuit_breaker:
  timeout_cascade: [90, 60, 40, 30]

shadow_scoring:
  enabled: true
  spec_version: "1.0.0"
  conformance_level: "L2"
  sealed_criteria_count: 10  # max; per-scale: SS-50=6, SS-100=8, SS-250=10
  hardening:
    enabled: true  # SS-50 overrides to disabled
    threshold: 15

See docs/scaling.md for full scaling configuration and cost estimates.


🤖 Models Used

| Role | Models |
|---|---|
| Nexus | claude-opus-4.6 |
| Commanders (pool: 9) | claude-opus-4.6, claude-opus-4.5, claude-opus-4.6-1m, claude-sonnet-4.6, claude-sonnet-4.5, claude-sonnet-4, gpt-5.4, gpt-5.2, gpt-5.1 |
| Squad Leads (SS-250 only) | claude-haiku-4.5, gpt-5.4-mini |
| Workers (pool: 6) | claude-haiku-4.5, gpt-5.4-mini, gpt-5-mini, gpt-4.1, gpt-5.3-codex, gpt-5.2-codex |
| Reviewers (7 pairs) | claude-opus-4.6↔gpt-5.4, claude-opus-4.5↔gpt-5.2, claude-opus-4.6-1m↔gpt-5.1, claude-sonnet-4.6↔gpt-5.3-codex, claude-sonnet-4.5↔gpt-5.2-codex, claude-sonnet-4↔gpt-5.4-mini, claude-haiku-4.5↔gpt-5-mini |

📁 Repository Structure

swarm-command/
├── README.md                           # Overview, install, comparison, FAQ
├── AGENTS.md                           # Agent/skill descriptions
├── CONTRIBUTING.md                     # Contribution guidelines
├── catalog.yml                         # Skill metadata
├── config.yml                          # All tunables
├── LICENSE                             # MIT
├── SECURITY.md                         # Security policy
├── quickstart.sh                       # One-line installer
├── .github/
│   ├── copilot-instructions.md         # AI agent instructions for this repo
│   ├── workflows/ci.yml                # CI: YAML lint + SKILL.md sync check
│   └── skills/swarm-command/SKILL.md   # Skill discovery path
├── agents/
│   └── swarm-command.agent.md          # Standalone agent version
├── skills/swarm-command/
│   └── SKILL.md                        # Core skill
├── templates/
│   ├── commander.md                    # Commander prompt template
│   ├── worker.md                       # Worker prompt template
│   ├── reviewer.md                     # Cross-reviewer prompt template
│   └── squad-lead.md                   # Squad Lead prompt template
├── protocols/
│   ├── depth-guard.md                  # 5 Laws + 3-layer enforcement
│   ├── circuit-breaker.md              # 3-state FSM + 5-level recovery
│   ├── context-capsule.md              # JSON schemas for data structures
│   └── meta-reviewer.md                # Reviewer quality gate protocol
└── docs/
    ├── architecture.md                 # Architecture overview
    ├── architecture-diagrams.md        # Mermaid diagrams
    ├── consensus.md                    # Consensus algorithm deep dive
    ├── example-output.md               # Sample completed swarm run output
    ├── learning-path.md                # Recommended reading order
    ├── scaling.md                      # Scale chooser + cost estimates
    ├── shadow-scoring.md               # Shadow scoring protocol
    └── use-cases.md                    # Expanded prompt gallery

📄 License

MIT — use it, fork it, build on it.


🛡️ Spec Conformance

This project implements Shadow Score Spec L2 — sealed acceptance criteria generated before execution, validated after, hardened on failure.


πŸ™ Created with πŸ’œ by @DUBSOpenHub with the GitHub Copilot CLI.

