5percentdrops/pantheon

🏛 Pantheon

33 AI agents. One company. Zero humans in the loop.

Your software house ships PRs while you sleep.

You write the PRD.   Pantheon ships the PR.

⚑ Install in 30 seconds

git clone https://github.com/5percentdrops/pantheon.git && cd pantheon && bash scripts/one_click_install.sh -y --setup-keys

That's it. 32 AI engineers wake up. Each has a name, a personality, a memory, and a job.


💥 Why this exists

Problem: You're a solo dev with the ambition of a 30-person studio. You can't hire. You can't clone yourself. You can prompt an LLM, but only once, for one task, with no memory.

Pantheon: A persistent 33-agent software studio that runs locally, remembers everything, reviews its own code, and merges its own PRs. Each agent is a distinct identity with its own home, soul, memory, and skill library. They report up an org chart. They escalate when stuck. They learn from their mistakes.

It's not a wrapper. It's a company.


🎯 The 30-second pitch

Before Pantheon With Pantheon
Writing a feature You + ChatGPT, one tab, no memory PRD β†’ Arthur routes β†’ Marcus designs β†’ Jack codes β†’ Clara reviews β†’ Cody re-reviews β†’ merge
Code review "Looks fine to me" Dual-pass Claude Opus 4.7 + GPT-5.5 Codex, with conditional escalation to specialists
Memory Vanishes when tab closes Per-agent SOUL.md + MEMORY.md + skill library, persistent forever
Learning You re-prompt the same fix daily Cody writes a skill the first time, applies it the next 1000
Scaling You hire engineers You add API keys

🏛 The Pantheon

33 named agents, each with a job, a model, and a will.

                            👤 YOU (Board / Final Approval)
                                         ↓
                            🎯 Arthur, Project Manager
                                         ↓
                         ┌───────────────┼───────────────┐
                         ↓               ↓               ↓
                  🏗 Architecture     🔨 Build      🧪 Quality
                  Marcus, Priya     Jack, Ben...   Nadia, Stone
                                         ↓
                                    📜 PR opened
                                         ↓
                              👀 Clara (Opus 4.7)
                                         ↓
                              👀 Cody (GPT-5.5 Codex)
                                         ↓
                         🔐 Safiya · 🏛 Priya · 🧪 Nadia
                                         ↓
                              🔥 Maxwell (Opus Max)  ← if Cody fails ×2
                                         ↓
                              ✅ Arthur merges
                                         ↓
                              📚 Winston archives

| Role | Agents |
|---|---|
| 🎯 Head | Arthur (PM, GPT-5 mini) |
| 🏗 Architecture | Marcus, Priya (Opus 4.7) |
| 🔨 Senior build | Marcus · Magnus (Gemini 3.1 Pro) · Maxwell (Opus 4.7 Max) |
| 👷 Engineers | Jack, Ben, Ivan, Theo, Leo, Ellie, Grant (DeepSeek V4 Pro) |
| 🎨 Specialists | Felix (Pine Script), Henrik, Oscar, Mira, Sonia, Viktor, Dominic, Nathan, Vera, Graham |
| 👀 PR review (dual) | Clara (Claude Opus 4.7) → Cody (GPT-5.5 Codex) |
| 🛡 Quality & security | Nadia (QA), Safiya (Security), Stone (perf), Adrian (release) |
| 📚 Knowledge | Winston (Claude 3.5 Haiku, wiki archive) |
| 🌐 Domain | Chloe, Dante (Kimi K2), Elena (Sonnet 4.6) |
| 🌒 Dormant | Owen (waiting on NotebookLM API) |

Model spread: Anthropic 15 · DeepSeek 7 · OpenAI 5 · Google 3 · Moonshot 2. Every agent runs on Hermes. One harness, 33 identities, total isolation.


🧠 What makes this different

πŸͺž Each agent has a soul

~/.hermes-marcus/
  β”œβ”€β”€ SOUL.md       ← who they are
  β”œβ”€β”€ MEMORY.md     ← what they've learned (grows forever)
  β”œβ”€β”€ USER.md       ← who they report to (you)
  β”œβ”€β”€ skills/       ← skills they wrote themselves
  └── sessions/     ← FTS5-searchable session history

Fire up the same machine next year. Marcus remembers the architecture decision he pushed back on in March. Clara still knows the bug pattern she flagged in your auth flow. Cody still has the skill he wrote the first time he saw a race condition.

♻️ Self-improving loop

Agents write skills to themselves after solving a problem. The next agent to hit the same pattern reads it and skips the discovery phase. Your studio gets smarter every week, not slower.

🌙 Nightly Dreaming (V8.6)

At 03:00 UTC every agent runs a Dreaming pass in its own home: it reviews the last 7 days of sessions, removes duplicate skills by sha256 hash, and consolidates MEMORY.md. SOUL.md, USER.md, and config.yaml are immutable. No cross-agent reads. Compounding quality gains while you sleep.
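The sha256 dedup step can be sketched roughly as follows. This is an illustration only, not Pantheon's Dreaming code: the helper name `dedup_skills`, the assumption that skills are `*.md` files directly under `skills/`, and the keep-the-first-by-name rule are all hypothetical.

```python
import hashlib
from pathlib import Path


def dedup_skills(skills_dir: Path) -> list[Path]:
    """Delete skill files whose bytes duplicate an earlier file.

    Hypothetical helper: assumes skills are *.md files and that the first
    file in sorted-name order is the copy worth keeping.
    """
    seen: dict[str, Path] = {}
    removed: list[Path] = []
    for path in sorted(skills_dir.glob("*.md")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            path.unlink()  # byte-identical duplicate of seen[digest]
            removed.append(path)
        else:
            seen[digest] = path
    return removed
```

Hashing file bytes only catches exact duplicates; two skills that say the same thing in different words would both survive this pass.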

🛑 Mid-pipeline QA gate (V8.6)

Marcus's SDD doesn't reach Jack's TDD block until Nadia (Senior QA, Opus 4.7 XHigh) signs it off against the PRD. Max 2 return cycles before Arthur escalates. Catches PRD-to-SDD drift at design time, not at PR review.

📝 Rubric-graded reviews (V8.7)

Clara grades implementations against a per-ticket outcome.schema.json rubric (must_have criteria, must_not anti-criteria, score floor 0.85) with cited file:line evidence per criterion. Jack auto-iterates against the rubric before Cody ever sees the PR. End of "looks fine to me."
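A rubric check along these lines might look like the sketch below. The `Finding` type, the `grade` function, and the scoring rule (fraction of must_have criteria met, any must_not hit fails, every positive finding needs a file:line citation) are assumptions made for illustration; only the 0.85 floor and the must_have / must_not split come from the description above.

```python
from dataclasses import dataclass

SCORE_FLOOR = 0.85  # floor quoted in the text above


@dataclass
class Finding:
    criterion: str
    met: bool      # for must_not entries, True means the violation was found
    evidence: str  # "file:line" citation, e.g. "src/cli.py:42"


def grade(must_have: list[Finding], must_not: list[Finding]) -> dict:
    """Assumed rule: score = share of must_have met; any must_not hit fails."""
    # A positive finding without a file:line citation does not count as graded.
    uncited = [f.criterion for f in must_have + must_not
               if f.met and ":" not in f.evidence]
    violations = [f.criterion for f in must_not if f.met]
    score = sum(f.met for f in must_have) / len(must_have) if must_have else 0.0
    passed = score >= SCORE_FLOOR and not violations and not uncited
    return {"score": round(score, 2), "passed": passed,
            "violations": violations, "uncited": uncited}
```

Returning the failing criteria alongside the score is what would let an engineer agent iterate against the rubric rather than against free-form review comments.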

⚑ Fan-out engineer pool (V8.7)

When Marcus emits β‰₯2 tickets with no inter-dependency, Arthur dispatches across 4 parallel DeepSeek engineers (Jack/Ben/Ivan/Theo/Leo/Ellie/Grant) instead of serialising. Serial fallback on git-index contention. Opt-in per project after the smoke ramp.
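The dispatch decision can be illustrated with a small sketch. Everything here is hypothetical (the ticket map shape, `plan_dispatch`, `dispatch`); it only models the "two or more independent tickets, at most 4 in parallel" rule and leaves out the git-index contention fallback.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 4  # pool width quoted in the text above


def plan_dispatch(tickets: dict[str, set[str]]) -> str:
    """tickets maps ticket id -> ids it depends on (assumed shape)."""
    independent = all(deps.isdisjoint(tickets) for deps in tickets.values())
    return "fan-out" if len(tickets) >= 2 and independent else "serial"


def dispatch(tickets: dict[str, set[str]], work) -> dict[str, object]:
    """Run work(ticket_id) per ticket, in parallel when the plan allows it."""
    if plan_dispatch(tickets) == "serial":
        return {t: work(t) for t in tickets}
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        # pool.map preserves input order, so results line up with ticket ids.
        return dict(zip(tickets, pool.map(work, tickets)))
```

A single dependency between any two tickets sends the whole batch down the serial path in this sketch; a real scheduler could instead fan out only the independent subset.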

💰 Per-host budget watcher (V8.7)

Arthur-owned cron at */30 * * * * sums per-agent token-proxy bytes, emits WARN at 80% / CRIT at 95% of daily cap. CRIT lands in Arthur's MEMORY.md so he reads it on the next dispatch. Zero LLM calls, pure file IO.
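A file-IO-only watcher of this kind could be as small as the sketch below. The on-disk layout (one `<agent>.bytes` file holding an integer) and the function name `budget_level` are assumptions; the 80% and 95% thresholds are the ones quoted above.

```python
from pathlib import Path

WARN_AT, CRIT_AT = 0.80, 0.95  # thresholds quoted in the text above


def budget_level(usage_dir: Path, daily_cap_bytes: int) -> str:
    """Sum per-agent usage files and classify the total against the cap.

    Assumed layout: one "<agent>.bytes" file per agent holding an integer.
    """
    used = sum(int(p.read_text().strip() or 0) for p in usage_dir.glob("*.bytes"))
    ratio = used / daily_cap_bytes
    if ratio >= CRIT_AT:
        return "CRIT"
    if ratio >= WARN_AT:
        return "WARN"
    return "OK"
```

Because the check is plain arithmetic over files, running it every 30 minutes costs no tokens, which matches the "zero LLM calls" claim.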

📦 Rigid escalation contract (V8.8)

When Jack burns 21 attempts and hands to Marcus, the handoff is a strict engineer_escalation_packet.v1 JSON: RTK-compressed trace, red test IDs, blocked-on enum, git SHA reference. Raw conversational text is rejected. Opus 4.7 stops paying input-token tax on transcript parsing.
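To make the "raw text is rejected" rule concrete, here is a hedged sketch of a packet gate. The field names (`trace`, `red_test_ids`, `blocked_on`, `git_sha`), the enum values, and `parse_packet` itself are guesses for illustration; the real v1 schema may name and type things differently.

```python
import json
import re

# Hypothetical field names and enum values; the real v1 schema may differ.
REQUIRED = {"schema": str, "trace": str, "red_test_ids": list,
            "blocked_on": str, "git_sha": str}
BLOCKED_ON = {"failing_test", "missing_dependency", "unclear_spec"}


def parse_packet(raw: str) -> dict:
    """Parse a handoff, raising ValueError for anything but a valid packet."""
    try:
        packet = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("handoff is not JSON (conversational text?)") from exc
    if not isinstance(packet, dict):
        raise ValueError("handoff must be a JSON object")
    for key, kind in REQUIRED.items():
        if not isinstance(packet.get(key), kind):
            raise ValueError(f"missing or mistyped field: {key}")
    if packet["schema"] != "engineer_escalation_packet.v1":
        raise ValueError("unexpected schema id")
    if packet["blocked_on"] not in BLOCKED_ON:
        raise ValueError("blocked_on is not a known enum value")
    if not re.fullmatch(r"[0-9a-f]{7,40}", packet["git_sha"]):
        raise ValueError("git_sha is not a hex commit reference")
    return packet
```

Rejecting at parse time means the receiving model only ever sees a compact, predictable structure instead of a transcript.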

🌐 Cross-agent learning (V8.8)

V8.6 dreaming kept lessons per-agent. V8.8 wires Winston to scrape every home at 04:00 UTC, dedup by sha256, and write a single workspace/wiki/lessons_learned.md that engineers pre-read before TDD. Jack stops relearning what Marcus already knew. Live: 42 lessons extracted on first run.

🔥 Maxwell override grading (V8.8)

Maxwell (Opus 4.7 Max) is the most expensive agent. His overrides used to auto-merge. Now Cody (GPT-5.5) re-grades against the same outcome rubric Clara used on Jack. Max 2 iterations → Magnus (architecture review) if the override still fails. No more silent acceptance of confident-wrong senior fixes.

📊 Central observability dashboard (V8.9)

V8.7 wired token-bloat alerts. V8.8 wired handoff-schema rejections. V8.9 closes the two remaining article failure modes (quality degradation and redundant work) and rolls all four into workspace/07_Finalization/metrics_dashboard.md. Arthur cron at */15 reads every alert sink across V8.6 to V8.8 and emits a watch list with explicit triggers.

🏛 System-level Outcomes (V8.9)

Per-ticket outcomes (V8.7) grade Jack's PRs. System outcomes (V8.9) grade the company. Weekly scorecard checks pipeline completion ≥ 90%, avg iterations ≤ 2, escalation rate ≤ 15%, zero CRIT budget alerts, ≥ 20% multi-agent lesson reinforcement. Verdict escalate_to_board lands in Arthur's MEMORY.md so the human board sees it on next dispatch.
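The weekly scorecard reduces to a handful of threshold comparisons, sketched below. The metric key names, the `weekly_verdict` function, the `"pass"` verdict, and the rule that any single miss triggers `escalate_to_board` are assumptions; the thresholds are the ones listed above.

```python
import operator

# (metric, comparison, threshold) from the targets above; the metric key
# names themselves are hypothetical.
TARGETS = [
    ("pipeline_completion", operator.ge, 0.90),
    ("avg_iterations", operator.le, 2),
    ("escalation_rate", operator.le, 0.15),
    ("crit_budget_alerts", operator.eq, 0),
    ("lesson_reinforcement", operator.ge, 0.20),
]


def weekly_verdict(metrics: dict[str, float]) -> dict:
    """Compare one week's metrics with the targets and pick a verdict."""
    failed = [name for name, cmp, limit in TARGETS
              if not cmp(metrics[name], limit)]
    # Assumption: any missed target escalates; "pass" is a made-up label.
    return {"failed": failed,
            "verdict": "escalate_to_board" if failed else "pass"}
```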

🔍 Redundant-work detector (V8.9)

Winston's Sunday-night scan flags two agents doing the same job: by role Jaccard, by handoff overlap, by shared seed-skill, by shared model+harness. Advisory only; never blocks the pipeline. First scan on the 33-agent roster surfaced 125 findings, a concrete triage signal for tightening role scopes.
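"Role Jaccard" presumably means Jaccard similarity over role descriptions: the size of the intersection of two sets divided by the size of their union. The sketch below applies it to whitespace-split role text; the tokenisation, the 0.5 threshold, and the function names are assumptions, not Winston's actual scan.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_roles(roles: dict[str, str],
                      threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Flag agent pairs whose role descriptions share most of their words."""
    tokens = {agent: set(text.lower().split()) for agent, text in roles.items()}
    return [(x, y, round(jaccard(tokens[x], tokens[y]), 2))
            for x, y in combinations(sorted(tokens), 2)
            if jaccard(tokens[x], tokens[y]) >= threshold]
```

With 33 agents there are 528 possible pairs, so a pairwise scan like this stays cheap even when run across several overlap signals.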

🪜 2-agent Phase 0 smoke (V8.10)

Before the Phase 1 triad, run Arthur + Marcus on a trivial add(a, b) PRD. Verifies the handoff contract surface (prd → paperclip_issue → prd_to_sdd_pipeline) cleanly before any engineer risk. Must pass twice with different toy PRDs before Jack ever wakes up.

📏 Per-stage output caps (V8.10)

Every pipeline now declares output_budget and every stage declares max_output_tokens (or _bytes for emitters). Marcus's SDD capped at 12 000. Jack's implementation capped at 16 000 (largest legitimate stage). Maxwell override capped at 16 000: the most expensive lane gets an explicit cap. Closes article Mistake 5 (token bloat).

🔐 Bypass-proof every handoff (V8.10)

V8.6 made the SDD→TDD handoff bypass-proof. V8.10 extends the invariant across all 9 pipelines: every non-first stage declares input_contract or input_event. Cody cannot accept an SDD where it expects an implementation report. Wrong-handoff routing fails at schema validation, not after wasting Opus tokens.
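The invariant itself is easy to lint. The sketch below assumes a pipeline is a dict with a `stages` list whose entries carry a `name`; that shape and the function name are hypothetical, while `input_contract` and `input_event` are the keys named above.

```python
def missing_input_declarations(pipeline: dict) -> list[str]:
    """Names of non-first stages lacking both input_contract and input_event.

    Assumes a pipeline shaped like {"stages": [{"name": ...}, ...]}.
    """
    return [stage.get("name", f"stage[{i}]")
            for i, stage in enumerate(pipeline.get("stages", []))
            if i > 0 and not (stage.get("input_contract") or stage.get("input_event"))]
```

An empty result means every downstream stage states what it is willing to accept, which is the precondition for failing a wrong handoff at schema validation.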

🧭 Friendly schema aliases (V8.10)

sdd.schema.json and test_plan.schema.json now ship as $ref aliases (in both schemas/ and contracts/) to the canonical prd_to_sdd_pipeline.schema.json and task_tdd_block.schema.json. New users grepping the article-derived names get a hit. One canonical shape, two findable paths.

📓 Concrete example walkthrough (V8.10)

examples/weekly_market_intelligence.md: a 9-stage trace of a real PRD ("weekly competitor intel report") with per-stage projected token + wall-time, sequential vs fan-out variants, sample dashboard view. Numbers are projections from V8.10 caps; replace with empirical data after your first run.

🎛 Stack you control

Paperclip    →  company / control plane (1 instance)
hermes_local →  external Paperclip adapter (npm: hermes-paperclip-adapter)
HERMES_HOME  →  per-agent identity root
hermes       →  runtime invoked per task
LLMs         →  whatever you put your money on

No vendor lock. No hidden state. No SaaS. Your machine, your models, your keys, your agents.


🚀 Quick start

1. Prereqs (5 min, one time)

node --version          # ≥ 20
python3 --version       # ≥ 3.11
npm install -g paperclipai      # ≥ 2026.513.0
# Install hermes per https://github.com/NousResearch/hermes-agent

2. Pull Pantheon

git clone https://github.com/5percentdrops/pantheon.git
cd pantheon

3. Fire it up

bash scripts/one_click_install.sh -y --setup-keys

The installer will:

  1. ✅ Validate (15+ checks)
  2. ⚙️ Convert to agentcompanies/v1 package
  3. 🏠 Bootstrap 32 per-agent Hermes homes
  4. 🔌 Register hermes_local adapter
  5. 🔑 Securely prompt for API keys (hidden input, chmod 600, zero network)
  6. 🏛 Import the Pantheon into Paperclip

4. Ship something

Open Paperclip → Pantheon → Arthur
Send: "Build a CLI tool that counts unique words in a file."
Watch 6 agents collaborate. Get a merged PR.

🛡 Security posture

  • 🔒 setup_api_keys.sh: umask 077 · chmod 600 · read -s (no echo) · atomic writes · zero network
  • 🚫 No keys, .env, or PEM files in repo (.gitignore enforces)
  • 🚷 Production trading keys forbidden in general agents, policy-enforced
  • 🧱 Per-agent isolation: --per-agent mode for env separation
  • 🔍 Dual PR review + conditional security escalation to Safiya

📊 Verify the install

paperclipai adapters list | grep hermes_local         # ✅ adapter registered
paperclipai company list                              # ✅ Pantheon present
ls -d ~/.hermes-* | wc -l                             # ✅ 32 (Owen skipped)
python3 scripts/validate_hermes_local_package.py      # ✅ 32 routed

🌍 OS matrix

OS Status
🐧 Linux βœ…
🍏 macOS βœ…
πŸͺŸ Windows (WSL2) βœ…
πŸͺŸ Windows native ❌ β€” use WSL (full reason in README_INSTALL.md)

πŸ—Ί Re-run flags

bash scripts/one_click_install.sh -y --validate-only        # validators only
bash scripts/one_click_install.sh -y --convert-only         # generate package
bash scripts/one_click_install.sh -y --no-bootstrap         # skip 32-home step
bash scripts/one_click_install.sh -y --skip-adapter-install # skip adapter register
bash scripts/one_click_install.sh -y --no-paperclip         # skip company import
bash scripts/one_click_install.sh -y --setup-keys           # add secure key prompt
bash scripts/one_click_install.sh -y --no-dreaming          # V8.6: skip nightly Dreaming cron

❓ FAQ

Q: Is this AGI? No. It's a coordination layer over the LLMs you already pay for, with persistent identity per agent.

Q: How much will it cost to run? Token usage scales with the work, not the agent count. An idle agent burns nothing. Heavy day β‰ˆ a few dollars of Anthropic + OpenAI traffic.

Q: Can I add my own agent? Yes. Drop a .md file in the company tree, give them a model, run the converter. They wake up next install.

Q: Why "Pantheon"? 33 named entities with distinct powers, organised under one head, reporting to a higher authority. Sound familiar?

Q: Can the agents see my code? Only the code you give them. Each agent runs in its own ~/.hermes-<slug>/ sandbox.

Q: What if an agent goes off the rails? Three layers stop it: per-agent budget caps, dual PR review (Clara + Cody), and human approval gates on merge / deploy / production trading rules.


📚 Deeper docs


🚧 Boundary

Pantheon does not install Paperclip, Hermes, OpenClaw, provider API keys, or production trading keys. It stages a company/org package. Bring your own runtime, your own keys, your own ambition.


📜 License

MIT. See LICENSE.


🌟 If Pantheon ships its first PR for you, drop a star.

Make AI work for you, not the other way around.
