Your software house ships PRs while you sleep.
You write the PRD. Pantheon ships the PR.
```bash
git clone https://github.com/5percentdrops/pantheon.git && cd pantheon && bash scripts/one_click_install.sh -y --setup-keys
```

That's it. 32 AI engineers wake up. Each has a name, a personality, a memory, and a job.
Problem: You're a solo dev with the ambition of a 30-person studio. You can't hire. You can't clone yourself. You can prompt an LLM: once, for one task, with no memory.
Pantheon: A persistent 33-agent software studio that runs locally, remembers everything, reviews its own code, and merges its own PRs. Each agent is a distinct identity with its own home, soul, memory, and skill library. They report up an org chart. They escalate when stuck. They learn from their mistakes.
It's not a wrapper. It's a company.
| | Before Pantheon | With Pantheon |
|---|---|---|
| Writing a feature | You + ChatGPT, one tab, no memory | PRD → Arthur routes → Marcus designs → Jack codes → Clara reviews → Cody re-reviews → merge |
| Code review | "Looks fine to me" | Dual-pass Claude Opus 4.7 + GPT-5.5 Codex, with conditional escalation to specialists |
| Memory | Vanishes when the tab closes | Per-agent SOUL.md + MEMORY.md + skill library, persistent forever |
| Learning | You re-prompt the same fix daily | Cody writes a skill the first time and applies it the next 1000 times |
| Scaling | You hire engineers | You add API keys |
33 named agents, each with a job, a model, and a will.
```
                 YOU (Board / Final Approval)
                              │
                  Arthur (Project Manager)
                              │
              ┌───────────────┼───────────────┐
              │               │               │
        Architecture        Build          Quality
        Marcus, Priya    Jack, Ben...    Nadia, Stone
                              │
                          PR opened
                              │
                       Clara (Opus 4.7)
                              │
                     Cody (GPT-5.5 Codex)
                              │
                    Safiya · Priya · Nadia
                              │
              Maxwell (Opus Max) ← if Cody fails ×2
                              │
                              ▼
                        Arthur merges
                              │
                       Winston archives
```
| Role | Agents |
|---|---|
| Head | Arthur (PM, GPT-5 mini) |
| Architecture | Marcus, Priya (Opus 4.7) |
| Senior build | Marcus · Magnus (Gemini 3.1 Pro) · Maxwell (Opus 4.7 Max) |
| Engineers | Jack, Ben, Ivan, Theo, Leo, Ellie, Grant (DeepSeek V4 Pro) |
| Specialists | Felix (Pine Script), Henrik, Oscar, Mira, Sonia, Viktor, Dominic, Nathan, Vera, Graham |
| PR review (dual) | Clara (Claude Opus 4.7) → Cody (GPT-5.5 Codex) |
| Quality & security | Nadia (QA), Safiya (Security), Stone (perf), Adrian (release) |
| Knowledge | Winston (Claude 3.5 Haiku, wiki archive) |
| Domain | Chloe, Dante (Kimi K2), Elena (Sonnet 4.6) |
| Dormant | Owen (waiting on NotebookLM API) |
Model spread: Anthropic 15 · DeepSeek 7 · OpenAI 5 · Google 3 · Moonshot 2. Every agent runs on Hermes. One harness, 33 identities, total isolation.
```
~/.hermes-marcus/
├── SOUL.md      ← who they are
├── MEMORY.md    ← what they've learned (grows forever)
├── USER.md      ← who they report to (you)
├── skills/      ← skills they wrote themselves
└── sessions/    ← FTS5-searchable session history
```
Fire up the same machine next year. Marcus remembers the architecture decision he pushed back on in March. Clara still knows the bug pattern she flagged in your auth flow. Cody still has the skill he wrote the first time he saw a race condition.
Agents write skills to themselves after solving a problem. The next agent to hit the same pattern reads it and skips the discovery phase. Your studio gets smarter every week, not slower.
At 03:00 UTC every agent runs a Dreaming pass in its own home: 7 days of sessions reviewed, duplicate skills sha256-dedup'd, MEMORY.md consolidated. SOUL.md / USER.md / config.yaml are immutable. No cross-agent reads. Compounding quality gains while you sleep.
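The sha256 dedup step of a Dreaming pass can be pictured with a short Python sketch. It assumes skills are Markdown files under `skills/` in a single agent home and that "duplicate" means identical content. `dedup_skills` and `load_skills` are hypothetical names, not functions shipped in the repo.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def dedup_skills(skills: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Keep one skill per distinct sha256 of its content.

    `skills` maps a skill file name to its text. Returns the kept skills and
    the names dropped as identical copies of an earlier (sorted) name.
    """
    first_by_digest: Dict[str, str] = {}
    kept: Dict[str, str] = {}
    dropped: List[str] = []
    for name in sorted(skills):  # sorted, so the surviving copy is deterministic
        digest = hashlib.sha256(skills[name].encode("utf-8")).hexdigest()
        if digest in first_by_digest:
            dropped.append(name)
        else:
            first_by_digest[digest] = name
            kept[name] = skills[name]
    return kept, dropped


def load_skills(home: Path) -> Dict[str, str]:
    """Read every Markdown skill in ONE agent home (assumed <home>/skills/*.md layout)."""
    return {p.name: p.read_text(encoding="utf-8") for p in sorted((home / "skills").glob("*.md"))}
```

Calling `dedup_skills(load_skills(Path.home() / ".hermes-marcus"))` touches only Marcus's own home. This matches the "no cross-agent reads" rule.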
Marcus's SDD doesn't reach Jack's TDD block until Nadia (Senior QA, Opus 4.7 XHigh) signs it off against the PRD. Max 2 return cycles before Arthur escalates. Catches PRD-to-SDD drift at design time, not at PR review.
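The QA gate is a bounded review loop. Below is a minimal sketch, assuming plain callables stand in for Nadia's review and Marcus's revision. `qa_gate`, `review` and `revise` are hypothetical names. Two return cycles means at most three reviews before escalation.

```python
from typing import Callable, Tuple

MAX_RETURN_CYCLES = 2  # "Max 2 return cycles before Arthur escalates"


def qa_gate(
    sdd: str,
    review: Callable[[str], bool],  # stands in for Nadia: True = signed off against the PRD
    revise: Callable[[str], str],   # stands in for Marcus reworking the SDD
) -> Tuple[str, str]:
    """Return ("approved", sdd) once review passes, or ("escalate", sdd) after the last return."""
    for cycle in range(MAX_RETURN_CYCLES + 1):
        if review(sdd):
            return "approved", sdd
        if cycle == MAX_RETURN_CYCLES:
            break                   # out of return cycles: hand the SDD to Arthur
        sdd = revise(sdd)           # one return cycle back to Marcus
    return "escalate", sdd
```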
Clara grades implementations against a per-ticket outcome.schema.json rubric (must_have criteria, must_not anti-criteria, score floor 0.85), with cited file:line evidence per criterion. Jack auto-iterates against the rubric before Cody ever sees the PR. End of "looks fine to me."
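Here is a rough sketch of how such a rubric could be scored. The scoring rule is an assumption, not the contents of outcome.schema.json: a must_have counts only if it passed and cites evidence shaped like `path.ext:line`, and any must_not hit zeroes the score. `Result`, `grade` and the evidence regex are hypothetical. The feedback list is what Jack would iterate against.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

SCORE_FLOOR = 0.85                                # from the text
EVIDENCE_RE = re.compile(r"^[\w./-]+\.\w+:\d+$")  # assumed form, e.g. "src/wc.py:42"


@dataclass
class Result:
    criterion: str
    passed: bool  # for must_not: True means the anti-criterion was NOT hit
    evidence: List[str] = field(default_factory=list)


def cited(r: Result) -> bool:
    return any(EVIDENCE_RE.match(e) for e in r.evidence)


def grade(must_have: List[Result], must_not: List[Result]) -> Tuple[bool, float, List[str]]:
    """Return (passes, score, feedback) for one implementation."""
    feedback: List[str] = []
    met = 0
    for r in must_have:
        if not r.passed:
            feedback.append(f"unmet: {r.criterion}")
        elif not cited(r):
            feedback.append(f"uncited: {r.criterion}")  # claims without file:line don't count
        else:
            met += 1
    violated = [r.criterion for r in must_not if not r.passed]
    feedback += [f"violated: {c}" for c in violated]
    # Any anti-criterion hit zeroes the score; otherwise score = share of cited must_haves.
    score = 0.0 if violated or not must_have else met / len(must_have)
    return score >= SCORE_FLOOR, score, feedback
```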
When Marcus emits ≥2 tickets with no inter-dependency, Arthur fans them out across 4 parallel DeepSeek engineers (drawn from Jack/Ben/Ivan/Theo/Leo/Ellie/Grant) instead of serialising them. Serial fallback on git-index contention. Opt-in per project after the smoke ramp.
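One way to picture the fan-out is to group tickets into dependency "waves" and spread each wave round-robin over at most four engineers. This is a sketch under those assumptions. Round-robin assignment and taking the first four names are illustrative choices, `plan_waves` and `assign` are hypothetical, and the git-index contention fallback is not modelled.

```python
from itertools import cycle
from typing import Dict, List, Set

ENGINEERS = ["Jack", "Ben", "Ivan", "Theo", "Leo", "Ellie", "Grant"]
MAX_PARALLEL = 4


def plan_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tickets into waves; each ticket depends only on tickets in earlier waves."""
    remaining = {t: set(d) for t, d in deps.items()}
    done: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("unsatisfiable dependency (cycle or unknown ticket)")
        waves.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return waves


def assign(wave: List[str]) -> Dict[str, List[str]]:
    """Fan out a wave of >= 2 independent tickets; a single ticket stays serial."""
    if len(wave) < 2:
        return {ENGINEERS[0]: list(wave)}
    lanes = ENGINEERS[:MAX_PARALLEL]
    plan: Dict[str, List[str]] = {e: [] for e in lanes[: len(wave)]}
    for ticket, engineer in zip(wave, cycle(list(plan))):
        plan[engineer].append(ticket)
    return plan
```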
Arthur-owned cron at */30 * * * * sums per-agent token-proxy bytes, emits WARN at 80% / CRIT at 95% of daily cap. CRIT lands in Arthur's MEMORY.md so he reads it on the next dispatch. Zero LLM calls, pure file IO.
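The watcher is plain arithmetic over file sizes. Below is a minimal sketch of that logic, assuming each agent's token-proxy traffic lands in `*.log` files and the caps are given in bytes. The layout and the helper names (`alert_level`, `proxy_bytes`, `budget_alerts`, `record_crit`) are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Optional

WARN_AT, CRIT_AT = 0.80, 0.95  # 80% / 95% of the daily cap


def alert_level(used: int, cap: int) -> Optional[str]:
    """Classify one agent's usage against its daily cap (no LLM calls involved)."""
    ratio = used / cap
    if ratio >= CRIT_AT:
        return "CRIT"
    if ratio >= WARN_AT:
        return "WARN"
    return None


def proxy_bytes(log_dir: Path) -> int:
    """Sum the sizes of one agent's token-proxy logs (hypothetical *.log layout)."""
    return sum(p.stat().st_size for p in log_dir.glob("*.log") if p.is_file())


def budget_alerts(usage: Dict[str, int], caps: Dict[str, int]) -> List[str]:
    alerts = []
    for agent in sorted(usage):
        level = alert_level(usage[agent], caps[agent])
        if level:
            alerts.append(f"{level} {agent}: {usage[agent]}/{caps[agent]} bytes")
    return alerts


def record_crit(alerts: List[str], memory_md: Path) -> None:
    """Append CRIT lines to Arthur's MEMORY.md so he sees them on the next dispatch."""
    crit = [a for a in alerts if a.startswith("CRIT")]
    if crit:
        with memory_md.open("a", encoding="utf-8") as f:
            f.write("\n## Budget alerts\n" + "".join(f"- {a}\n" for a in crit))
```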
When Jack burns 21 attempts and hands off to Marcus, the handoff is a strict engineer_escalation_packet.v1 JSON: RTK-compressed trace, red test IDs, blocked-on enum, git SHA reference. Raw conversational text is rejected. Opus 4.7 stops paying input-token tax on transcript parsing.
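The rejection rule can be sketched with the standard library alone. This is not the repo's schema: the field names (`trace`, `red_tests`, `blocked_on`, `git_sha`) and the `blocked_on` values below are hypothetical placeholders for whatever engineer_escalation_packet.v1 actually defines.

```python
import json
import re
from typing import Any, Dict

# Hypothetical field names and enum values; the real contract lives in the repo's schema.
REQUIRED = {"schema", "trace", "red_tests", "blocked_on", "git_sha"}
BLOCKED_ON = {"design", "dependency", "environment", "spec_ambiguity"}
SHA_RE = re.compile(r"^[0-9a-f]{7,40}$")


def validate_packet(raw: str) -> Dict[str, Any]:
    """Parse and check an escalation packet; raise ValueError on anything else."""
    try:
        packet = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not JSON, rejected: {exc.msg}") from None  # e.g. chatty prose
    if not isinstance(packet, dict):
        raise ValueError("packet must be a JSON object")
    missing = REQUIRED - packet.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if packet["schema"] != "engineer_escalation_packet.v1":
        raise ValueError("wrong schema id")
    if not isinstance(packet["blocked_on"], str) or packet["blocked_on"] not in BLOCKED_ON:
        raise ValueError(f"blocked_on must be one of {sorted(BLOCKED_ON)}")
    if not isinstance(packet["git_sha"], str) or not SHA_RE.match(packet["git_sha"]):
        raise ValueError("git_sha must be a hex commit id")
    tests = packet["red_tests"]
    if not isinstance(tests, list) or not all(isinstance(t, str) for t in tests):
        raise ValueError("red_tests must be a list of test IDs")
    return packet
```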
V8.6 dreaming kept lessons per-agent. V8.8 wires Winston to scrape every home at 04:00 UTC, dedup by sha256, and write a single workspace/wiki/lessons_learned.md that engineers pre-read before TDD. Jack stops relearning what Marcus already knew. Live: 42 lessons extracted on first run.
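Below is a sketch of the cross-agent merge. It assumes each agent's lessons have already been read out of its home as a list of strings. Whitespace is normalised before hashing, which is an assumption, since the text only says "dedup by sha256". `merge_lessons` and its Markdown output format are hypothetical. Listing which agents hit each lesson also gives a natural "reinforced by 2+ agents" signal.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set


def merge_lessons(per_agent: Dict[str, List[str]]) -> str:
    """Build a single lessons_learned.md body from every agent's lessons."""
    text_by_digest: Dict[str, str] = {}
    agents_by_digest: Dict[str, Set[str]] = defaultdict(set)
    for agent in sorted(per_agent):
        for lesson in per_agent[agent]:
            text = " ".join(lesson.split())  # normalise whitespace before hashing
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            text_by_digest.setdefault(digest, text)
            agents_by_digest[digest].add(agent)
    lines = ["# Lessons learned", ""]
    # Lessons reported by more agents first, then alphabetically for stable output.
    for digest in sorted(text_by_digest, key=lambda d: (-len(agents_by_digest[d]), text_by_digest[d])):
        who = ", ".join(sorted(agents_by_digest[digest]))
        lines.append(f"- {text_by_digest[digest]} (agents: {who})")
    return "\n".join(lines) + "\n"
```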
Maxwell (Opus 4.7 Max) is the most expensive agent. His overrides used to auto-merge. Now Cody (GPT-5.5) re-grades them against the same outcome rubric Clara used on Jack. Max 2 iterations, then Magnus (architecture review) if the override still fails. No more silent acceptance of confident-but-wrong senior fixes.
V8.7 wired token-bloat alerts. V8.8 wired handoff-schema rejections. V8.9 closes the two remaining article failure modes (quality degradation and redundant work) and rolls all four into workspace/07_Finalization/metrics_dashboard.md. An Arthur cron at */15 reads every alert sink from V8.6 through V8.8 and emits a watch list with explicit triggers.
Per-ticket outcomes (V8.7) grade Jack's PRs. System outcomes (V8.9) grade the company. The weekly scorecard checks pipeline completion ≥ 90%, avg iterations ≤ 2, escalation rate ≤ 15%, zero CRIT budget alerts, and ≥ 20% multi-agent lesson reinforcement. An escalate_to_board verdict lands in Arthur's MEMORY.md so the human board sees it on the next dispatch.
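The five thresholds translate directly into checks. In this sketch the stats record, its field names, the `"ok"` verdict and the rule "any failed check escalates" are assumptions. Only the thresholds and `escalate_to_board` come from the text. Integer cross-multiplication keeps each comparison exact and avoids dividing by zero on an empty week.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WeekStats:  # hypothetical shape for one week of system metrics
    pipelines_started: int
    pipelines_completed: int
    tickets: int
    total_iterations: int    # review iterations summed over all tickets
    escalations: int
    crit_alerts: int
    lessons: int
    reinforced_lessons: int  # lessons reported by two or more agents


def scorecard(s: WeekStats) -> Tuple[str, List[str]]:
    checks = {
        "pipeline completion >= 90%": s.pipelines_completed * 100 >= 90 * s.pipelines_started,
        "avg iterations <= 2": s.total_iterations <= 2 * s.tickets,
        "escalation rate <= 15%": s.escalations * 100 <= 15 * s.tickets,
        "zero CRIT budget alerts": s.crit_alerts == 0,
        "lesson reinforcement >= 20%": s.reinforced_lessons * 100 >= 20 * s.lessons,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return ("escalate_to_board" if failed else "ok"), failed
```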
Winston's Sunday-night scan flags pairs of agents doing the same job: by role Jaccard similarity, by handoff overlap, by shared seed skill, or by shared model+harness. Advisory only; it never blocks the pipeline. The first scan of the 33-agent roster surfaced 125 findings, a concrete triage signal for tightening role scopes.
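Here is a sketch of the role-Jaccard signal alone. It uses naive word tokenisation of role descriptions and a hypothetical 0.5 threshold. The other three signals (handoff overlap, seed skills, model+harness) are not modelled. Like the real scan, it only reports findings and never raises.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def overlapping_roles(roles: Dict[str, str], threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Advisory findings: agent pairs whose role descriptions overlap heavily."""
    findings = []
    for x, y in combinations(sorted(roles), 2):
        score = jaccard(tokens(roles[x]), tokens(roles[y]))
        if score >= threshold:
            findings.append((x, y, round(score, 2)))
    return findings
```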
Before the Phase 1 triad, run Arthur + Marcus on a trivial add(a, b) PRD. This verifies the handoff contract surface (prd → paperclip_issue → prd_to_sdd_pipeline) cleanly before any engineer risk. It must pass twice, with different toy PRDs, before Jack ever wakes up.
Every pipeline now declares output_budget, and every stage declares max_output_tokens (or _bytes for emitters). Marcus's SDD is capped at 12,000 tokens. Jack's implementation is capped at 16,000 (the largest legitimate stage). Maxwell's override is also capped at 16,000, so the most expensive lane gets an explicit cap. Closes article Mistake 5 (token bloat).
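A static check for the cap invariant might look like this. It assumes pipelines are loaded as plain dicts with a `stages` list. The emitter key is written as `max_output_bytes`, which is an assumed spelling of the "_bytes" variant. The `output_budget` value in any real pipeline is yours to set. `check_caps` is a hypothetical helper.

```python
from typing import Any, Dict, List

CAP_KEYS = ("max_output_tokens", "max_output_bytes")  # bytes key name is an assumption


def check_caps(pipeline: Dict[str, Any]) -> List[str]:
    """List every stage (or the pipeline itself) missing a positive output cap."""
    errors = []
    budget = pipeline.get("output_budget")
    if not isinstance(budget, int) or budget <= 0:
        errors.append("pipeline: output_budget missing or not a positive integer")
    for stage in pipeline.get("stages", []):
        name = stage.get("name", "<unnamed>")
        present = [k for k in CAP_KEYS if k in stage]
        if not present:
            errors.append(f"{name}: declares neither max_output_tokens nor max_output_bytes")
        for key in present:
            if not isinstance(stage[key], int) or stage[key] <= 0:
                errors.append(f"{name}: {key} must be a positive integer")
    return errors
```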
V8.6 made the SDD→TDD handoff bypass-proof. V8.10 extends the invariant across all 9 pipelines: every non-first stage declares input_contract or input_event. Cody cannot accept an SDD where it expects an implementation report. Wrong-handoff routing fails at schema validation, not after wasting Opus tokens.
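The invariant can be enforced twice: statically over the stage list, and at runtime when an artifact arrives. This is a sketch under stated assumptions. The `output_contract` field on stages, the `contract` tag on artifacts, and the contract names `sdd.v1` / `impl_report.v1` are hypothetical.

```python
from typing import Any, Dict, List


def check_handoffs(stages: List[Dict[str, Any]]) -> List[str]:
    """Static check: every non-first stage says what it accepts, and it matches upstream."""
    errors = []
    for prev, stage in zip(stages, stages[1:]):
        name = stage["name"]
        if "input_contract" in stage:
            if stage["input_contract"] != prev.get("output_contract"):
                errors.append(
                    f"{name}: expects {stage['input_contract']!r}, "
                    f"but {prev['name']} emits {prev.get('output_contract')!r}"
                )
        elif "input_event" not in stage:
            errors.append(f"{name}: declares neither input_contract nor input_event")
    return errors


def accept(stage: Dict[str, Any], artifact: Dict[str, Any]) -> Dict[str, Any]:
    """Runtime check: reject a wrongly routed artifact before any model is called."""
    expected = stage.get("input_contract")
    if expected is not None and artifact.get("contract") != expected:
        raise ValueError(f"{stage['name']} rejects {artifact.get('contract')!r}; expected {expected!r}")
    return artifact
```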
sdd.schema.json and test_plan.schema.json now ship as $ref aliases (in both schemas/ and contracts/) to the canonical prd_to_sdd_pipeline.schema.json and task_tdd_block.schema.json. New users grepping the article-derived names get a hit. One canonical shape, two findable paths.
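An alias schema can be nothing more than a `$ref` to the canonical file. The sketch below assumes JSON Schema draft 2020-12 and that each canonical schema sits in the same folder as its alias. A relative `$ref` resolves against the alias file's own location. `alias_schema` and `write_aliases` are hypothetical helpers, not the repo's generator.

```python
import json
from pathlib import Path
from typing import Dict, List

# Article-derived name -> canonical schema it points at (from the text above).
ALIASES = {
    "sdd.schema.json": "prd_to_sdd_pipeline.schema.json",
    "test_plan.schema.json": "task_tdd_block.schema.json",
}


def alias_schema(canonical: str) -> Dict[str, str]:
    # Draft 2020-12 is an assumption; the relative $ref is resolved against
    # the alias file's location, so the canonical file must live beside it.
    return {"$schema": "https://json-schema.org/draft/2020-12/schema", "$ref": canonical}


def write_aliases(root: Path) -> List[Path]:
    """Write both aliases into schemas/ and contracts/ under `root`."""
    written = []
    for folder in ("schemas", "contracts"):
        for alias, canonical in ALIASES.items():
            path = root / folder / alias
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(json.dumps(alias_schema(canonical), indent=2) + "\n", encoding="utf-8")
            written.append(path)
    return written
```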
examples/weekly_market_intelligence.md β a 9-stage trace of a real PRD ("weekly competitor intel report") with per-stage projected token + wall-time, sequential vs fan-out variants, sample dashboard view. Numbers are projections from V8.10 caps; replace with empirical data after your first run.
- Paperclip: company / control plane (1 instance)
- hermes_local: external Paperclip adapter (npm: hermes-paperclip-adapter)
- HERMES_HOME: per-agent identity root
- hermes: runtime invoked per task
- LLMs: whatever you put your money on
No vendor lock. No hidden state. No SaaS. Your machine, your models, your keys, your agents.
```bash
node --version               # ≥ 20
python3 --version            # ≥ 3.11
npm install -g paperclipai   # ≥ 2026.513.0
# Install hermes per https://github.com/NousResearch/hermes-agent
```

```bash
git clone https://github.com/5percentdrops/pantheon.git
cd pantheon
bash scripts/one_click_install.sh -y --setup-keys
```

The installer will:
- Validate (15+ checks)
- Convert to `agentcompanies/v1` package
- Bootstrap 32 per-agent Hermes homes
- Register `hermes_local` adapter
- Securely prompt for API keys (hidden input, `chmod 600`, zero network)
- Import the Pantheon into Paperclip
Open Paperclip → Pantheon → Arthur
Send: "Build a CLI tool that counts unique words in a file."
Watch 6 agents collaborate. Get a merged PR.
- `setup_api_keys.sh`: `umask 077` · `chmod 600` · `read -s` (no echo) · atomic writes · zero network
- No keys, `.env`, or PEM files in repo (`.gitignore` enforces)
- Production trading keys forbidden in general agents (policy-enforced)
- Per-agent isolation: `--per-agent` mode for env separation
- Dual PR review + conditional security escalation to Safiya
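`setup_api_keys.sh` is a shell script. The same write pattern (`umask 077`, an owner-only file, an atomic replace) looks like this in Python. This is an illustrative sketch, and `write_secret_atomic` is a hypothetical helper, not part of the repo. Writing to a temp file in the target directory and then calling `os.replace` means a reader sees either the old file or the complete new one, never a half-written key.

```python
import os
import tempfile
from pathlib import Path


def write_secret_atomic(path: Path, content: str) -> None:
    """Write `content` to `path` as an owner-only (0600) file, never half-written."""
    old_umask = os.umask(0o077)  # like `umask 077`: anything created here is owner-only
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        # mkstemp creates a 0600 temp file in the target directory,
        # so the final rename stays on one filesystem.
        fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".tmp-")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                f.write(content)
                f.flush()
                os.fsync(f.fileno())  # bytes reach disk before the rename
            os.chmod(tmp, 0o600)      # explicit, mirroring `chmod 600`
            os.replace(tmp, path)     # atomic on POSIX: old file or new file, nothing between
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise
    finally:
        os.umask(old_umask)
```

You can feed it from `getpass.getpass()`, which is the Python counterpart of `read -s` and keeps the key off the terminal.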
```bash
paperclipai adapters list | grep hermes_local      # ✅ adapter registered
paperclipai company list                           # ✅ Pantheon present
ls -d ~/.hermes-* | wc -l                          # ✅ 32 (Owen skipped)
python3 scripts/validate_hermes_local_package.py   # ✅ 32 routed
```

| OS | Status |
|---|---|
| Linux | ✅ |
| macOS | ✅ |
| Windows (WSL2) | ✅ |
| Windows native | ❌ Use WSL (full reason in README_INSTALL.md) |
```bash
bash scripts/one_click_install.sh -y --validate-only          # validators only
bash scripts/one_click_install.sh -y --convert-only           # generate package
bash scripts/one_click_install.sh -y --no-bootstrap           # skip 32-home step
bash scripts/one_click_install.sh -y --skip-adapter-install   # skip adapter register
bash scripts/one_click_install.sh -y --no-paperclip           # skip company import
bash scripts/one_click_install.sh -y --setup-keys             # add secure key prompt
bash scripts/one_click_install.sh -y --no-dreaming            # V8.6: skip nightly Dreaming cron
```

Q: Is this AGI? No. It's a coordination layer over the LLMs you already pay for, with persistent identity per agent.
Q: How much will it cost to run? Token usage scales with the work, not the agent count. An idle agent burns nothing. A heavy day costs a few dollars of Anthropic + OpenAI traffic.
Q: Can I add my own agent?
Yes. Drop a .md file in the company tree, give them a model, run the converter. They wake up next install.
Q: Why "Pantheon"? 33 named entities with distinct powers, organised under one head, reporting to a higher authority. Sound familiar?
Q: Can the agents see my code?
Only the code you give them. Each agent runs in its own ~/.hermes-<slug>/ sandbox.
Q: What if an agent goes off the rails? Three layers stop it: per-agent budget caps, dual PR review (Clara + Cody), and human approval gates on merge / deploy / production trading rules.
- `README_INSTALL.md`: full install guide, OS matrix, key setup
- `SMOKE_SCALE.md`: phased 3→33 agent ramp (don't fire all 33 on day one)
- `examples/weekly_market_intelligence.md`: concrete 9-stage walkthrough with token/time projections
- `PATCH_NOTES_V8_10.md`: Phase 0 + per-stage caps + bypass-proof contracts + schema aliases + example doc
- `PATCH_NOTES_V8_9.md`: central observability (dashboard + system outcomes + redundant work)
- `PATCH_NOTES_V8_8.md`: escalation packet schema + cross-agent learning + Maxwell override grading
- `PATCH_NOTES_V8_7.md`: outcome rubric + fan-out + budget watcher + CMA burst
- `PATCH_NOTES_V8_6.md`: mid-pipeline QA + per-agent Dreaming
- `PATCH_NOTES_V8_5.md`: Hermes-as-harness rollout
- `ROLLBACK_TO_V8_4.md`: one-command revert
- `docs/PAPERCLIP_HERMES_CONTROL_PLANE_V8.md`
- `docs/FULL_PANTHEON_ARTHUR_HEAD.md`
- `docs/UNIVERSAL_ORGANISATION_ESCALATION_PATTERN.md`
- `docs/FINAL_PANTHEON_MODEL_MAP.md`
Pantheon does not install Paperclip, Hermes, OpenClaw, provider API keys, or production trading keys. It stages a company/org package. Bring your own runtime, your own keys, your own ambition.
MIT. See LICENSE.
If Pantheon ships its first PR for you, drop a star.
Make AI work for you, not the other way around.