Binary file removed .claude/.DS_Store
Binary file not shown.
7 changes: 7 additions & 0 deletions .gitignore
@@ -38,3 +38,10 @@ docs/source-design/
# Local client-identifier blocklist for validate-docs.sh (NEVER commit).
# See CONTRIBUTING.md → Confidentiality for the rationale.
scripts/.client-blocklist

# macOS Finder metadata + local agent-tooling artifacts (not part of ZO)
.DS_Store
**/.DS_Store
.agents/
.codex/
AGENTS.md
8 changes: 4 additions & 4 deletions README.md
@@ -10,7 +10,7 @@
<br/>

[![Status](https://img.shields.io/badge/status-validated-D87A57?style=flat-square&labelColor=12110F)](#status)
[![Tests](https://img.shields.io/badge/tests-780_passing-D87A57?style=flat-square&labelColor=12110F)](#status)
[![Tests](https://img.shields.io/badge/tests-812_passing-D87A57?style=flat-square&labelColor=12110F)](#status)
[![Agents](https://img.shields.io/badge/agents-21_defined-D87A57?style=flat-square&labelColor=12110F)](#agent-teams)
[![Docs](https://img.shields.io/badge/docs-zerooperators.com-D87A57?style=flat-square&labelColor=12110F)](https://docs.zerooperators.com)

@@ -356,7 +356,7 @@ Adds **Phase 0: Literature Review** (prior art survey, baseline definition). Pha
│ ├── Agent(name="oracle-qa", team_name="project") │
│ └── Agents communicate peer-to-peer via SendMessage │
│ │
│ The Lead knows all 20 agents and creates new ones on the │
│ The Lead knows all 21 agents and creates new ones on the │
│ fly if the project needs expertise not in the roster. │
│ │
├─────────────────────────────────────────────────────────────┤
@@ -444,7 +444,7 @@ zero-operators/
│ ├── semantic.py # fastembed + SQLite semantic search
│ ├── comms.py # JSONL event logger (5 event types)
│ └── evolution.py # Self-evolving post-mortem protocol
├── .claude/agents/ # 20 agent definitions
├── .claude/agents/ # 21 agent definitions
├── specs/ # 8 specification documents
├── plans/ # Project plan files
├── memory/ # Per-project state (STATE.md, DECISION_LOG, PRIORS)
@@ -495,7 +495,7 @@ delivery-repo/

| Phase | What | Status |
|-------|------|--------|
| 0 | Agent definitions (20) + Claude Code setup | Done |
| 0 | Agent definitions (21) + Claude Code setup | Done |
| 1 | Plan parser, target parser, comms logger, setup | Done |
| 2 | Memory layer, semantic index | Done |
| 3 | Orchestration engine + lifecycle wrapper | Done |
10 changes: 10 additions & 0 deletions docs/COMMANDS.md
@@ -182,6 +182,16 @@ zo experiments diff EXP_A EXP_B --project NAME [--repo PATH]
- **show**: full details for one experiment including the content of every authored markdown artefact.
- **diff**: side-by-side comparison of two experiments' metrics and shortfalls. Useful for sibling comparisons (two parallel variants) and parent-child comparisons (did the iteration actually improve?).

### zo learnings promote

Promote generic, client-sanitised learnings from a project's `.zo/memory/PRIORS.md` to the platform `memory/zo-platform/PRIORS.md`.

```
zo learnings promote --project NAME [--repo PATH] [--dry-run]
```

**Fail-closed by design** (the platform repo is public): only priors in generic categories (`auto-learning`, `evolution`) that clear the client blocklist (`scripts/.client-blocklist`) are promoted. A prior that is plan-seeded / `domain`, or that matches a client identifier, is **blocked** — reported for manual review, never auto-rewritten. With no blocklist file configured, **nothing** is promoted. `--dry-run` screens and reports without writing. Every run prints an auditable promoted / blocked / duplicate report.
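
The fail-closed screening described above can be sketched roughly as follows. This is an illustration only, not the code in `src/zo/promote.py`: the names `screen`, `load_terms`, and `Report`, the `(category, statement)` tuple shape, the case-insensitive substring match, the `#` comment convention, and treating an empty blocklist the same as a missing one are all assumptions.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Categories the docs name as generic enough to promote.
GENERIC_CATEGORIES = {"auto-learning", "evolution"}


@dataclass
class Report:
    """Hypothetical audit report: every prior lands in exactly one bucket."""
    promoted: list = field(default_factory=list)
    blocked: list = field(default_factory=list)    # (statement, reason) pairs
    duplicate: list = field(default_factory=list)


def load_terms(path: Path) -> list:
    """Read blocklist terms; a missing file yields an empty list."""
    if not path.is_file():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    # Assumption: one term per line, and '#' starts a comment line.
    return [ln.strip().lower() for ln in lines
            if ln.strip() and not ln.strip().startswith("#")]


def screen(priors, existing, terms) -> Report:
    """Sort (category, statement) pairs into promoted / blocked / duplicate."""
    report = Report()
    for category, statement in priors:
        if not terms:
            # Fail closed: without a blocklist nothing can be cleared.
            report.blocked.append((statement, "no blocklist configured"))
        elif category not in GENERIC_CATEGORIES:
            report.blocked.append((statement, f"category {category!r} is not generic"))
        elif any(term in statement.lower() for term in terms):
            report.blocked.append((statement, "matches a client identifier"))
        elif statement in existing:
            report.duplicate.append(statement)
        else:
            report.promoted.append(statement)
    return report
```

In this sketch a blocked prior is only reported with a reason and its text is never edited, which mirrors the "never auto-rewritten" rule; a `--dry-run` would presumably build the same report and skip the write.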

---

## Slash Commands
15 changes: 13 additions & 2 deletions docs/concepts/memory-and-continuity.mdx
@@ -17,7 +17,7 @@ Every project gets a `memory/` directory with four canonical files (or `.zo/memo
Append-only audit trail. Every architectural decision, gate passage, scope change. Each entry has a timestamp, type, title, decision, rationale, alternatives considered, outcome.
</Card>
<Card title="PRIORS.md" icon="lightbulb">
Domain knowledge accumulated through running ZO. Each prior references the failure that triggered it. After 23 sessions, ZO has 34 documented priors.
Domain knowledge accumulated through running ZO. Each prior references the failure that triggered it. ZO has 40+ documented priors, each tracing to a real failure.
</Card>
<Card title="sessions/" icon="folder-open">
Per-session summary files (`session-NNN-YYYY-MM-DD.md`). Written at session end. Captures what was attempted, what shipped, what's next.
@@ -122,7 +122,7 @@ This prevents accumulation of irrelevant reasoning and keeps token costs predict

## Self-evolution in practice

The 34 priors in [`memory/zo-platform/PRIORS.md`](https://github.com/SamPlvs/zero-operators/blob/main/memory/zo-platform/PRIORS.md) are the cumulative output of this protocol. A few examples:
The 40+ priors in [`memory/zo-platform/PRIORS.md`](https://github.com/SamPlvs/zero-operators/blob/main/memory/zo-platform/PRIORS.md) are the cumulative output of this protocol. A few examples:

- **PR-001**: `claude --print --dangerously-skip-permissions` exits immediately. Captured after a tmux pane stayed blank during MNIST testing.
- **PR-005**: Aspirational rules without enforcement are dead letter. Captured after a documentation cascade was repeatedly ignored despite being written in CLAUDE.md.
@@ -131,6 +131,17 @@ The 34 priors in [`memory/zo-platform/PRIORS.md`](https://github.com/SamPlvs/zer

Each prior was earned by a real failure. The same mistake never happens twice.

## Per-project priors: seed → load → learn → promote

Priors aren't only hand-written. ZO maintains a project's `PRIORS.md` automatically across a run:

- **Seed** — at first session the plan's `## Domain Context and Priors` are written into the project `PRIORS.md`, so the team starts with the human's domain knowledge instead of a blank slate.
- **Load** — every Lead prompt injects the current project priors ("accumulated learnings — honor these before repeating past mistakes"), so agents see them before acting.
- **Learn** — when the autonomous Phase-4 loop hits a dead-end or plateau, the orchestrator records the failure and appends a durable `auto-learning` prior, so the next iteration (or a later session) doesn't repeat it.
- **Promote** — generic learnings can graduate to the platform with `zo learnings promote --project NAME --repo PATH`. It is **fail-closed**: only generic-category priors that clear the client blocklist are promoted; anything project-specific or matching a client identifier is blocked and reported, never auto-rewritten. With no blocklist configured, nothing is promoted.

This is how one project's experience compounds — within the project, and (sanitised) across the platform.
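
As a rough sketch of the seed and learn steps, assuming a flat bullet-list `PRIORS.md` (the real file format is not shown here): `seed_if_empty` and `append_learning` are hypothetical helpers, and the `[domain]` / `[auto-learning]` tags are an assumed way of recording the category.

```python
from pathlib import Path


def seed_if_empty(priors_path: Path, plan_priors: list) -> bool:
    """Seed step: write plan priors only if the file is missing or blank."""
    if priors_path.exists() and priors_path.read_text(encoding="utf-8").strip():
        return False  # accumulated learnings are never overwritten
    priors_path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"- [domain] {p}" for p in plan_priors]
    priors_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True


def append_learning(priors_path: Path, statement: str, evidence: str) -> None:
    """Learn step: append one auto-learning prior with its triggering evidence."""
    with priors_path.open("a", encoding="utf-8") as fh:
        fh.write(f"- [auto-learning] {statement} (evidence: {evidence})\n")
```

Calling `seed_if_empty` a second time is a no-op once the file has content, so later sessions keep whatever the learn step has appended.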

## Next

<CardGroup cols={2}>
2 changes: 1 addition & 1 deletion docs/concepts/overview.mdx
@@ -13,7 +13,7 @@ ZO has five primitives. Master these and the rest of the system follows.
Every project must define a hard, verifiable success metric. Without a measurable criterion, autonomous agents become hallucinating cost centers.
</Card>
<Card title="The team" icon="people-group" href="/concepts/the-team">
20 specialised personas, orchestrator, data engineer, model builder, oracle, XAI, code reviewer, and more, communicate peer-to-peer through Claude Code's native team APIs.
21 specialised personas, orchestrator, data engineer, model builder, oracle, XAI, code reviewer, and more, communicate peer-to-peer through Claude Code's native team APIs.
</Card>
<Card title="Phases & gates" icon="diagram-next" href="/concepts/phases-and-gates">
Six sequential phases, each separated by a gate. Automated gates run validation; blocking gates pause for a human.
9 changes: 6 additions & 3 deletions docs/concepts/the-team.mdx
@@ -1,17 +1,17 @@
---
title: "The team"
description: "20 specialised AI personas, each with a defined role, tier, and contract. The team coordinates peer-to-peer through Claude Code's native APIs."
description: "21 specialised AI personas, each with a defined role, tier, and contract. The team coordinates peer-to-peer through Claude Code's native APIs."
---

ZO ships with **20 agent definitions** in `.claude/agents/`. Each is a Markdown file with YAML frontmatter (name, model tier, role, team) plus a structured prompt body covering ownership, off-limits, contracts, coordination protocol, and a self-validation checklist.
ZO ships with **21 agent definitions** in `.claude/agents/`. Each is a Markdown file with YAML frontmatter (name, model tier, role, team) plus a structured prompt body covering ownership, off-limits, contracts, coordination protocol, and a self-validation checklist.

## The two teams

ZO uses two distinct team configurations:

<CardGroup cols={2}>
<Card title="Project Delivery Team" icon="users-rays">
Executes the projects defined in `plan.md`. **11 launch + phase-in agents** covering data, model, oracle, code review, testing, XAI, domain evaluation, ML engineering, and infrastructure.
Executes the projects defined in `plan.md`. **12 launch + phase-in agents** covering data, model, oracle, code review, testing, XAI, domain evaluation, ML engineering, infrastructure, and live training monitoring.
</Card>
<Card title="Platform Build Team" icon="hammer">
Used to build and maintain ZO itself. **6 specialised agents** including software architect, backend engineer, frontend engineer, platform code reviewer, platform test engineer, and documentation agent.
@@ -75,6 +75,9 @@ Activated for specific phases or by plan opt-in:
<Accordion title="Infra Engineer (Haiku)" icon="hard-drive">
Compute resource allocation, experiment tracking setup, artifact storage, logging. Format-following work, appropriate for the smaller model.
</Accordion>
<Accordion title="Training Checker (Sonnet)" icon="heart-pulse">
Phase 4 live training monitor — the Lead spawns one per model run as `training-{modelname}-checker`. Tails the active experiment's `metrics.jsonl` / `training_status.json`, alerts on NaN/divergence/gradient-blowup/overfit/stall so a broken run dies early, and writes a mechanistic `diagnosis.md` plus next-round suggestions (pairs with Research Scout's general-AI literature track).
</Accordion>
</AccordionGroup>
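
To make the Training Checker's alerting concrete, here is a minimal sketch of the kind of check it might run over `metrics.jsonl`. The agent itself is a prompt-driven persona, so none of this is its actual logic: the `loss` field name, the thresholds, and the `check_losses` function are all assumptions for illustration.

```python
import json
import math


def check_losses(lines, diverge_factor=10.0, stall_window=5, stall_eps=1e-4):
    """Return alert names for a sequence of JSONL metric lines.

    Assumes each non-blank line is a JSON object with a numeric "loss" key.
    """
    losses = [float(json.loads(ln)["loss"]) for ln in lines if ln.strip()]
    if any(math.isnan(x) or math.isinf(x) for x in losses):
        return ["nan"]  # a non-finite loss makes the other checks meaningless

    alerts = []
    best = min(losses) if losses else 0.0
    # Divergence: the latest loss is far above the best loss seen so far.
    if best > 0 and losses[-1] > diverge_factor * best:
        alerts.append("divergence")
    # Stall: the last `stall_window` losses barely moved.
    recent = losses[-stall_window:]
    if len(recent) == stall_window and max(recent) - min(recent) < stall_eps:
        alerts.append("stall")
    return alerts
```

Gradient blow-up and overfit detection would need extra fields (for example a gradient norm and a validation loss), which this sketch leaves out.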

## Platform Build Team
4 changes: 2 additions & 2 deletions docs/demo.html
@@ -997,7 +997,7 @@ <h1>ZERO OPERATORS</h1>

<div class="stat-block">
<div class="stat-item scanlines">
<div class="stat-num">20<span class="stat-unit">agents</span></div>
<div class="stat-num">21<span class="stat-unit">agents</span></div>
<div class="stat-label">Defined</div>
</div>
<div class="stat-item scanlines">
@@ -1255,7 +1255,7 @@ <h1>USER WORKFLOW</h1>
</div>

<h1>AGENT TEAMS</h1>
<div class="subtitle">20 agents across project delivery, platform build, draft scouts, and init</div>
<div class="subtitle">21 agents across project delivery, platform build, draft scouts, and init</div>

<div class="section-label">Project Delivery Team -- Launch Agents</div>

2 changes: 1 addition & 1 deletion docs/installation.mdx
@@ -41,7 +41,7 @@ cd zero-operators
Resolves and installs `pyproject.toml` dependencies, pydantic, fastembed, click, rich, pyyaml.
</Step>
<Step title="Agent definitions">
Confirms 20 agent `.md` files in `.claude/agents/`.
Confirms 21 agent `.md` files in `.claude/agents/`.
</Step>
<Step title="Slash commands">
Confirms 24 commands in `.claude/commands/`.
2 changes: 1 addition & 1 deletion docs/introduction.mdx
@@ -17,7 +17,7 @@ The human is the research director. The plan is the only communication medium. A
Every project defines a hard, verifiable success metric. No deliverable is complete until the oracle confirms it.
</Card>
<Card title="The team" icon="people-group" href="/concepts/the-team">
A team of 20 specialised agents, orchestrator, data, model, oracle, XAI, and more, coordinates over a contract-first protocol.
A team of 21 specialised agents, orchestrator, data, model, oracle, XAI, and more, coordinates over a contract-first protocol.
</Card>
<Card title="The memory" icon="brain" href="/concepts/memory-and-continuity">
`STATE.md`, `DECISION_LOG.md`, `PRIORS.md`, and a semantic index give every session continuity with the last.
24 changes: 24 additions & 0 deletions memory/zo-platform/DECISION_LOG.md
@@ -1149,3 +1149,27 @@ The `--no-headlines` flag is preserved (not removed) for backwards compatibility
- Opus for the checker — deferred: it both tails and reasons, but Sonnet matches oracle-qa's tier and is cost-appropriate for a long-running monitor; revisit if diagnosis depth is insufficient.

**Outcome:** Agent roster 20 → 21; full doc cascade (setup.sh EXPECTED_AGENTS + count + pass-msg, README badge + roster + status line, lead-orchestrator count + roster, specs/agents.md counts + entry, plans, PRD §9 + tree). +20 tests (760 → 780 + 7 skipped), pytest green on Python 3.11 AND 3.12 (PR-039 matrix), ruff `src/` clean; test-file additions ruff-clean (2 pre-existing warnings in test_training_metrics.py left out of scope per PR-039). validate-docs 9 pass / 0 fail / 2 warn (client-blocklist skip + known grep-768-vs-pytest-780 test-badge parameterization gap; README badge updated 743 → 780 to reflect the true pytest count). PR-040 added to PRIORS. **Next:** Batch A (per-project self-evolution: seed plan priors → load full project priors into prompts → wire EvolutionEngine into failure paths → automated sanitized promotion to platform PRIORS per the user's choice), then D (optimization audit: broaden ml-engineer + new software-engineer agent), then E (swarm reinforcement + idle-agent shutdown). Branch `claude/training-intelligence`.

---

## Decision: 2026-05-30T16:00:00Z
**Type:** FEATURE + EVOLUTION
**Title:** Batch A — per-project self-evolution (seed/load/write/promote): wiring the dead EvolutionEngine + seed_priors; plus audit-driven cleanup (4 latent log_error bugs, docs-site drift, hygiene)

**Decision:** Second process-hardening batch from the user's "make ZO learn so I stop re-telling it." Investigation + the `repo-cleanup-audit` swarm confirmed the self-evolution machinery was fully implemented but DEAD — `EvolutionEngine` never imported/called, `seed_priors` never invoked, `_prompt_memory` injected only decision summaries. Wired end-to-end:

1. **Seed** — `Orchestrator._maybe_seed_priors()` (from `start_session`) writes the plan's `domain_priors` into the project `.zo/memory/PRIORS.md` when PRIORS is empty (idempotent; never clobbers accumulated learnings).
2. **Load** — `_prompt_memory` injects up to 8 non-superseded project priors (statement + evidence) under "Project priors (accumulated learnings — honor these…)", distinct from semantic decision matches.
3. **Write** — `EvolutionEngine(memory, comms, zo_root)` instantiated in `__init__`; new `_record_learning()` (uses `evolution.record_failure` + `memory.append_prior`) fires on the autonomous loop's DEAD_END/PLATEAU verdicts in `_auto_iterate_if_needed`, persisting a durable `auto-learning` prior.
4. **Promote** — new `src/zo/promote.py` (`screen_prior`, `promote_learnings`, `load_blocklist`, `PromotionReport`) + `zo learnings promote --project --repo [--dry-run]`. **Fail-closed** automated sanitized promotion: only `auto-learning`/`evolution` categories that clear `scripts/.client-blocklist` are promoted; `domain`/plan-seeded/blocklist-hit priors are BLOCKED + reported (never auto-rewritten); a missing blocklist promotes nothing. Returns an auditable promoted/blocked/duplicate report.
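
The Load step (item 2) might look roughly like this. It is a sketch under assumptions, not the body of `_prompt_memory`: the `Prior` dataclass, `render_priors_block`, the shortened header text, and the choice to keep the first eight active priors in file order are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prior:
    """Hypothetical in-memory view of one PRIORS.md entry."""
    statement: str
    evidence: str
    superseded: bool = False


def render_priors_block(priors, limit: int = 8) -> str:
    """Format up to `limit` non-superseded priors for injection into a prompt."""
    active = [p for p in priors if not p.superseded][:limit]
    if not active:
        return ""  # nothing to inject; the prompt section is omitted
    header = "Project priors (accumulated learnings):"
    body = [f"- {p.statement} (evidence: {p.evidence})" for p in active]
    return "\n".join([header, *body])
```

Capping the list keeps the prompt size bounded however long a project runs; whether the cap should favour the oldest or the newest priors is a design choice this sketch does not settle.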

**Rationale:** The user chose "automated sanitized promotion" (auto-strip + promote, no per-item approval). On the legal-critical public-repo path I implemented **block-not-strip**: stripping a client term from a sentence leaves garbled/misleading text and can miss adjacent project-specific words, so refusing (and reporting for manual rewrite) is safer than guessing — while still automated and approval-free for clean priors. The seed/load/write wiring is the PR-009 lesson ("built ≠ wired") applied to the evolution engine: all machinery existed and was unit-tested, it was simply never connected to the run.

**Audit-driven (repo-cleanup-audit swarm, 6 agents, 27 findings):** confidentiality CLEARED. Fixed **4 latent `log_error(message=)` bugs** in `orchestrator.py` failure branches (`CommsLogger.log_error` has no `message` param and requires `description` → `TypeError` if the branch fired; never caught because rarely executed) — surfaced by wiring the dead engine; added a regression test forcing the `_generate_test_report` failure branch. Checklist auto-refresh made **best-effort** (`_safe_refresh_checklist` + `contextlib.suppress(OSError)`). Fixed **docs-site drift** (docs/*.mdx + demo.html + README → 21 agents; training-checker accordion in `the-team.mdx`; self-evolution section + `zo learnings` in `memory-and-continuity.mdx`/`COMMANDS.md`). Hygiene: untracked + gitignored `.claude/.DS_Store`, gitignored `.agents/`/`.codex/`/`AGENTS.md` (per user), fixed deprecated `source_dir` example in `draft.py`.
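
The `log_error(message=)` bug class and the best-effort refresh can be illustrated with a small stand-in. `DemoLogger` is not `CommsLogger` (its real signature has more to it than shown here), and `safe_refresh` is a hypothetical analogue of `_safe_refresh_checklist`; the sketch only shows why a wrong keyword stays hidden until the failure branch actually runs.

```python
import contextlib


class DemoLogger:
    """Stand-in for a logger whose log_error requires `description`."""

    def log_error(self, description: str) -> dict:
        return {"type": "error", "description": description}


def report_failure_buggy(logger: DemoLogger) -> dict:
    # Bug pattern: `message` is not a parameter, so this raises TypeError,
    # but only when this rarely executed branch is reached.
    return logger.log_error(message="test report failed")


def report_failure_fixed(logger: DemoLogger) -> dict:
    return logger.log_error(description="test report failed")


def safe_refresh(refresh) -> bool:
    """Best-effort call: swallow OSError so a refresh never aborts the run."""
    with contextlib.suppress(OSError):
        refresh()
        return True
    return False  # reached only when refresh() raised OSError
```

A regression test that forces the failure branch, as the entry describes, is what turns this latent `TypeError` into something CI can catch.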

**Alternatives considered:**
- Strip-and-promote (user's literal wording) — rejected on the legal path; block-not-strip is safer and the sanitizer design was delegated to me. Trivial to switch to strip later.
- Full `run_postmortem` on every dead-end — deferred; `record_failure` + clean `append_prior` avoids the engine's auto-phrasing ("Add new prior: …") and keeps prior statements readable.
- Category-allowlist-only (skip blocklist) — rejected; defence-in-depth wants generic-category AND blocklist-clear AND no-blocklist-refusal.

**Outcome:** modified `src/zo/{orchestrator,experiments,cli,draft}.py`; new `src/zo/promote.py`; docs (`the-team`, `memory-and-continuity`, `overview`, `introduction`, `installation`, `demo.html`, `COMMANDS.md`, `README.md`); `.gitignore`; new `tests/unit/test_promote.py` (15) + additions to test_orchestrator/test_experiments/test_training_metrics/test_cli. **+32 tests (780 → 812 + 7 skipped), green on Python 3.11 AND 3.12, ruff `src/` clean, validate-docs 0 failures.** PR-041 added. **Deferred (audit #13/#15/#16/#17):** semantic reindex at session-end, agent failure-reporting protocol, `end_session` DECISION_LOG/PRIORS integration, `zo retrospective` CLI. Branch `claude/self-evolution`, stacked on #95.