The fleet doesn't need to know who answered. It needs to know the answer was good enough.
AgentMind is the model routing layer for AiCIV civilizations. It sits between your agent fleet and multiple inference backends, routing each request to the cheapest model that can handle it.
60 agents all hitting Opus = ~$8,100/month. 60 agents through AgentMind (70% bulk / 20% standard / 10% frontier) = ~$340/month.
That's a ~24x cost reduction with zero quality degradation on the tasks that matter.
But cost savings are only the start. AgentMind is where the Hyperagent self-improvement loop meets the protocol layer — every inference call produces an Envelope, feeds the skill auditor, and makes the entire system smarter over time.
| Tier | Models | Cost | Use Cases |
|---|---|---|---|
| T1 — Bulk | Llama 3.3 70B (Groq), Mixtral (Together), Qwen 3 (Fireworks) | $0.05–0.30/Mtok | Health checks, message routing, file triage, slot extraction, memory search, heartbeats |
| T2 — Standard | Claude Haiku 4.5, Claude Sonnet 4.6 | $0.80–3.00/Mtok | Code generation, document analysis, agent dialogue, skill execution, research synthesis |
| T3 — Frontier | Claude Opus 4.6 | $15/Mtok | Architecture decisions, legal analysis, constitutional amendments, deep research, novel problem solving |
The insight: 70% of agent calls are simple classification, routing, and triage. These don't need $15/Mtok frontier reasoning. They need good enough at cheap enough.
This repo includes 5 self-improvement skills inspired by Meta's Hyperagents paper (arXiv 2603.19461). Together with AgentMind, they form a complete self-improving intelligence system:
┌─────────────────────┐
│ AgentMind │
│ (routes requests │
│ to cheapest model) │
└──────────┬──────────┘
│
Envelope on every call
(tier, cost, latency, agent, skill)
│
┌────────────────┼────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────────┐ ┌─────────────────┐ ┌──────────────────┐
│ skill-effectiveness│ │ self-improving │ │ cross-domain │
│ -auditor │ │ -delegation │ │ -transfer │
│ │ │ │ │ │
│ "Which skills are │ │ "Are we routing │ │ "Did a routing │
│ burning T3 calls │ │ to the right │ │ improvement in │
│ when T1 would │ │ team lead?" │ │ fleet transfer │
│ suffice?" │ │ │ │ to research?" │
└────────┬─────────┘ └────────┬────────┘ └────────┬─────────┘
│ │ │
▼ ▼ ▼
┌──────────────────┐ ┌─────────────────┐
│ hyperagent │ │ meta-curriculum │
│ -archive │ │ -evolution │
│ │ │ │
│ "Keep variants │ │ "Is our nightly │
│ of what worked │ │ training │
│ AND what didn't"│ │ teaching the │
│ │ │ right things?" │
└──────────────────┘ └─────────────────┘
The loop: AgentMind routes calls → produces Envelopes → Envelopes feed the auditor → auditor identifies misrouted skills → delegation improves → curriculum evolves → cross-domain transfers propagate improvements → archive keeps all variants → AgentMind config updates → better routing → repeat.
The meta-layer: Each skill can improve itself. The curriculum evolution skill checks whether its own adjustments are improving brief quality. The delegation skill checks whether its pattern extraction is actually reducing misroutes. This is the Hyperagents paper's core insight implemented at civilization scale.
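The auditor's question above ("which skills are burning T3 calls when T1 would suffice?") can be answered straight from Envelope data. A minimal sketch, assuming each Envelope is a dict with `skill`, `tier`, and `cost_usd` fields (field names are assumptions here, not the SPEC.md schema), flagging skills whose spend is dominated by T3:

```python
from collections import defaultdict

def find_misroutes(envelopes, t3_share_threshold=0.5):
    """Flag skills where most spend goes to T3 (candidates for demotion)."""
    spend = defaultdict(lambda: defaultdict(float))  # skill -> tier -> cost
    for env in envelopes:
        spend[env["skill"]][env["tier"]] += env["cost_usd"]
    flagged = []
    for skill, by_tier in spend.items():
        total = sum(by_tier.values())
        if total and by_tier.get("T3", 0.0) / total >= t3_share_threshold:
            flagged.append(skill)
    return flagged

envelopes = [
    {"skill": "file-triage", "tier": "T3", "cost_usd": 0.09},
    {"skill": "file-triage", "tier": "T1", "cost_usd": 0.001},
    {"skill": "deep-research", "tier": "T3", "cost_usd": 0.12},
    {"skill": "heartbeat", "tier": "T1", "cost_usd": 0.0005},
]
print(find_misroutes(envelopes))  # ['file-triage', 'deep-research']
```

A real auditor would also cross-check outcome quality before demoting anything: deep-research is legitimately T3 and would be whitelisted, while file-triage is the genuine misroute.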
agentmind/
├── README.md # This file
├── SPEC.md # Full AgentMind specification (v0.1.0-draft)
│ # Architecture, API, tiers, NATS, budgets, Envelopes
│
├── skills/ # Hyperagent self-improvement skills
│ ├── meta-curriculum-evolution.md # Training rewrites its own curriculum
│ ├── self-improving-delegation.md # CEO Rule routing learns from mistakes
│ ├── skill-effectiveness-auditor.md # Fitness scoring for all skills (A-F tiers)
│ ├── hyperagent-archive.md # Evolutionary DAG of skill variants
│ └── cross-domain-transfer.md # Meta-improvements propagate across verticals
│
├── server.py # [TODO] FastAPI service
├── classifier.py # [TODO] Tier classifier
├── providers/ # [TODO] Backend adapters
│ ├── groq.py
│ ├── anthropic.py
│ ├── together.py
│ ├── fireworks.py
│ └── local.py # Ollama adapter
├── envelope.py # [TODO] APS Envelope production
├── budget.py # [TODO] Cost tracking + throttling
├── config.yaml # [TODO] Provider registry + skill-tier mapping
├── Dockerfile # [TODO]
├── docker-compose.yml # [TODO]
└── tests/ # [TODO]
```bash
cat SPEC.md
```

The full architecture: tiers, routing, auth, NATS, budgets, Envelopes, API endpoints. This is the blueprint.
```bash
for f in skills/*.md; do echo "=== $f ==="; head -30 "$f"; echo; done
```

These define the self-improvement loop that sits on top of AgentMind.
```bash
# Sign up (all have free tiers):
# - Groq: console.groq.com (free tier: 30 req/min)
# - Together: api.together.xyz (free $25 credit)
# - Fireworks: fireworks.ai (free tier available)

# Add to .env:
echo "GROQ_API_KEY=gsk_..." >> .env
echo "TOGETHER_API_KEY=..." >> .env
echo "FIREWORKS_API_KEY=..." >> .env
```

The classifier is just a function: `(messages, metadata) → tier`. Start with rule-based (skill mapping from SPEC.md Section 5.2). No ML needed for v1.
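The classifier below consults a skill→tier table, role floors, and a tier-ordering helper. A minimal sketch of those supporting pieces (the names `SKILL_TIERS`, `ROLE_MINIMUMS`, and `max_tier` match the classifier; the entries are illustrative, since the real mapping is meant to live in config.yaml):

```python
# Illustrative lookup tables; real values belong in config.yaml
# (skill-tier mapping per SPEC.md Section 5.2).
SKILL_TIERS = {
    "health-check": "T1",
    "message-routing": "T1",
    "code-generation": "T2",
    "legal-analysis": "T3",
}

# Hypothetical role floors: some agents never drop below a tier
ROLE_MINIMUMS = {
    "ceo": "T2",
}

_TIER_ORDER = {"T1": 1, "T2": 2, "T3": 3}

def max_tier(a: str, b: str) -> str:
    """Return the more capable (more expensive) of two tiers."""
    return a if _TIER_ORDER[a] >= _TIER_ORDER[b] else b

print(max_tier("T2", "T1"))  # T2
```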
```python
# classifier.py — the core routing decision
def classify(messages: list, metadata: dict, tools: list) -> str:
    """Returns 'T1', 'T2', or 'T3'."""
    # Override takes priority
    if metadata.get("tier_override"):
        return metadata["tier_override"]

    # Tools present → T2 minimum
    if tools:
        return max_tier("T2", metadata.get("tier_hint", "T2"))

    # Skill mapping
    skill = metadata.get("skill", "")
    tier = SKILL_TIERS.get(skill)
    if tier:
        return tier

    # Role minimum
    role = metadata.get("agent", "")
    role_min = ROLE_MINIMUMS.get(role)
    if role_min:
        return role_min

    # Default to T1 (cheapest)
    return metadata.get("tier_hint", "T1")
```

Start with Groq (fastest, free tier, OpenAI-compatible):
```python
# providers/groq.py
import os

import httpx

GROQ_API_KEY = os.environ["GROQ_API_KEY"]  # loaded from .env

async def complete(messages, max_tokens=2048, temperature=0.7):
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://api.groq.com/openai/v1/chat/completions",
            headers={"Authorization": f"Bearer {GROQ_API_KEY}"},
            json={
                "model": "llama-3.3-70b-versatile",
                "messages": messages,
                "max_tokens": max_tokens,
                "temperature": temperature,
            },
            timeout=30,
        )
        return resp.json()
```

```python
# server.py — minimal viable AgentMind
@app.post("/api/v1/completions")
async def completions(request: CompletionRequest, actor=Depends(get_current_actor)):
    tier = classify(request.messages, request.metadata, request.tools)
    backend = select_backend(tier)
    response = await backend.complete(request.messages, request.max_tokens)
    envelope = produce_envelope(tier, backend, response, actor)
    return {"content": response["choices"][0]["message"]["content"], "tier": tier, ...}
```

The Anthropic API format differs from OpenAI's. The translation is straightforward — see SPEC.md Section 6.2.
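One concrete difference: OpenAI-style message lists carry the system prompt as a `{"role": "system"}` message, while Anthropic's Messages API takes it as a top-level `system` parameter with only user/assistant turns in `messages`. A minimal sketch of that part of the translation (the helper name is illustrative, not from SPEC.md):

```python
def to_anthropic(openai_messages):
    """Split an OpenAI-style message list into (system, messages) for Anthropic."""
    system_parts = [m["content"] for m in openai_messages if m["role"] == "system"]
    messages = [m for m in openai_messages if m["role"] != "system"]
    return "\n".join(system_parts), messages

system, msgs = to_anthropic([
    {"role": "system", "content": "You are a router."},
    {"role": "user", "content": "Classify this request."},
])
print(system)  # You are a router.
print(msgs)    # [{'role': 'user', 'content': 'Classify this request.'}]
```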
Every call → Envelope → audit trail. This is the first APS service to actually implement Envelopes (the spec has required them since v0.1, but no service had implemented them until now).
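What a produced Envelope might carry, sketched with the fields named in the routing diagram (tier, cost, latency, agent, skill). The call signature and the extra bookkeeping fields here are assumptions; the authoritative schema is in SPEC.md:

```python
import time
import uuid

def produce_envelope(tier, backend, agent, skill, cost_usd, latency_ms):
    """Sketch only: field names follow the routing diagram, not the SPEC.md schema."""
    return {
        "envelope_id": str(uuid.uuid4()),
        "timestamp": time.time(),  # epoch seconds
        "tier": tier,              # which tier actually served the call
        "backend": backend,        # e.g. "groq", "anthropic"
        "agent": agent,            # caller identity (via AgentAUTH)
        "skill": skill,            # skill that triggered the call
        "cost_usd": cost_usd,
        "latency_ms": latency_ms,
    }

env = produce_envelope("T1", "groq", "fleet-ops", "file-triage", 0.0007, 210)
```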
SQLite counter per (civ_id, date, tier). Throttle T3→T2 at 95% daily budget. Hard stop at 100%.
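A sketch of that counter. The table is keyed on (civ_id, day, tier) and the thresholds match the text (throttle T3→T2 at 95%, hard stop at 100%); the budget amounts and function names are illustrative:

```python
import sqlite3
from datetime import date

DAILY_BUDGET_USD = {"T1": 5.0, "T2": 10.0, "T3": 20.0}  # illustrative budgets

def record_spend(db, civ_id, tier, cost_usd):
    # Upsert the per-(civ, day, tier) counter
    db.execute(
        """INSERT INTO spend (civ_id, day, tier, cost_usd)
           VALUES (?, ?, ?, ?)
           ON CONFLICT(civ_id, day, tier)
           DO UPDATE SET cost_usd = cost_usd + excluded.cost_usd""",
        (civ_id, date.today().isoformat(), tier, cost_usd),
    )

def effective_tier(db, civ_id, requested):
    """Downgrade T3→T2 at 95% of the daily budget; refuse at 100%."""
    row = db.execute(
        "SELECT cost_usd FROM spend WHERE civ_id=? AND day=? AND tier=?",
        (civ_id, date.today().isoformat(), requested),
    ).fetchone()
    spent = row[0] if row else 0.0
    budget = DAILY_BUDGET_USD[requested]
    if spent >= budget:
        raise RuntimeError(f"daily {requested} budget exhausted")
    if requested == "T3" and spent >= 0.95 * budget:
        return "T2"
    return requested

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE spend (
    civ_id TEXT, day TEXT, tier TEXT, cost_usd REAL,
    PRIMARY KEY (civ_id, day, tier))""")
record_spend(db, "civ-1", "T3", 19.2)     # 96% of the $20 T3 budget
print(effective_tier(db, "civ-1", "T3"))  # T2
```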
Docker container, JWT auth via AgentAUTH, NATS subscriber for fleet-wide routing.
| Scenario | Monthly Cost | Notes |
|---|---|---|
| All Opus (current) | ~$8,100 | 60 agents × 100 calls/day × Opus pricing |
| All Sonnet | ~$1,620 | Better but still expensive |
| AgentMind Tiered | ~$340 | 70% T1 / 20% T2 / 10% T3 |
| AgentMind + Local T0 | ~$200 | Add Ollama for truly free bulk inference |
For client civilizations: Each civ configures its own provider keys and budgets. AgentMind is self-hostable (`docker-compose up`). Protocol, not platform.
For our own inference stack: Phase 4 adds Ollama as a T0 backend. When we run our own GPUs, the T1 tier becomes free. The 70% of calls that are bulk routing/triage cost literally nothing.
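The T0 adapter can mirror the Groq one. Ollama's local REST API exposes an `/api/chat` endpoint that accepts the same OpenAI-style `messages` shape; the model name and default URL below are assumptions about the local setup:

```python
# providers/local.py — T0 Ollama adapter (sketch)
OLLAMA_URL = "http://localhost:11434"  # Ollama's default port

def build_payload(messages, model="llama3.2"):
    # Ollama's /api/chat accepts OpenAI-style message lists directly
    return {"model": model, "messages": messages, "stream": False}

async def complete(messages, model="llama3.2"):
    import httpx  # same client the Groq adapter uses

    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{OLLAMA_URL}/api/chat",
            json=build_payload(messages, model),
            timeout=60,  # local models can be slow on first load
        )
        return resp.json()
```

Keeping `build_payload` separate makes the request shape testable without a running Ollama instance.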
Meta's Hyperagents (arXiv 2603.19461) showed that self-referential agents — where the improvement mechanism can improve itself — develop persistent memory, performance tracking, and cross-domain transfer autonomously.
We took this further:
- Persistent memory → we already have this (memory system, scratchpads, agent learnings)
- Performance tracking → `skill-effectiveness-auditor` (fitness scores for all 142+ skills)
- Cross-domain transfer → `cross-domain-transfer` (propagate improvements across 11 verticals)
- Evolutionary archive → `hyperagent-archive` (keep ALL variants, failures are stepping stones)
- Self-improving routing → `self-improving-delegation` (CEO Rule learns from its own mistakes)
- Self-improving curriculum → `meta-curriculum-evolution` (nightly training rewrites itself)
AgentMind is where these skills get REAL DATA. Every Envelope from AgentMind feeds the auditor, which feeds the archive, which feeds the transfer system, which feeds the curriculum, which feeds the routing — a complete self-improving intelligence loop.
The paper's biggest finding: meta-improvements transfer across domains with zero customization. Our `cross-domain-transfer` skill implements this. When the research vertical discovers a better prompt pattern, it propagates to all 11 verticals automatically.
- This week: Build Phase 1 (classifier + Groq backend + Anthropic backend + single endpoint)
- Next week: NATS integration, Envelope production, budget controls
- Week 3-4: Fleet deployment, HUB graph integration, AGO dashboard
- When ready: Local inference (Ollama T0), client self-hosting
- SPEC.md — Full architectural specification
- Hyperagents paper — arxiv.org/abs/2603.19461
- APS Protocol — `projects/aiciv-hub/PROTOCOL.md` (the protocol AgentMind extends)
- AgentAUTH — `projects/agentauth/` (JWT auth that AgentMind uses)
- Berman/OpenClaw teardown — `memories/knowledge/competitive/berman-openclaw-teardown-20260324.md`
"The fleet doesn't need to know who answered. It needs to know the answer was good enough."
AgentMind v0.1.0-draft — authored by Corey Cottrell & A-C-Gee, March 2026