From a034d109db2ecad522c3b627559a1fea22502930 Mon Sep 17 00:00:00 2001
From: SamPlvs
Date: Fri, 10 Apr 2026 14:36:13 +0100
Subject: [PATCH] feat(evolution): three-layer defense against doc-codebase drift
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add automated documentation consistency validation with enforcement hooks.
When the 17th agent (Research Scout) was added, 10+ files had stale counts,
version numbers, and model tiers. Root cause: CLAUDE.md cascade protocol
existed as text but had zero enforcement.

Layer 1 — scripts/validate-docs.sh:
  7 programmatic checks (agent count, names, commands, version, tiers,
  tests, setup.sh literal). Runs in <2s. Exits non-zero on failure.

Layer 2 — Claude Code hooks (.claude/settings.json):
  PreToolUse on git commit: blocks if validation fails
  PostToolUse on Write|Edit: cascade reminders for trigger files
  Stop: checks for uncommitted changes in trigger paths

Layer 3 — Documentation:
  CLAUDE.md cascade protocol replaced with file-to-file mappings
  PR-005 added to PRIORS.md (missing_rule → enforcement)
  commit command updated with validation step

Also fixes:
- Agent count 16→17 across 10 files (Research Scout was undocumented)
- Version 1.0.0→1.0.1 in pyproject.toml and __init__.py
- Test badge 295→298, setup checks 11→10
- Command count 23→24 in STATE.md
- Model Builder and Backend Engineer tiers: "Sonnet/Opus"→"Opus"
- Research Scout added to specs/agents.md roster (#7), phase-in renumbered
- PRD.md: 6 launch→7 launch, 10→11 project delivery agents

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .claude/agents/lead-orchestrator.md  |   5 +-
 .claude/commands/commit.md           |  15 +-
 .claude/hooks/cascade-reminder.sh    |  53 ++++++
 .claude/hooks/pre-commit-validate.sh |  42 +++++
 .claude/hooks/stop-check.sh          |  49 ++++++
 .claude/settings.json                |  40 +++++
 CLAUDE.md                            |  29 +++-
 PRD.md                               |   4 +-
 README.md                            |  10 +-
 memory/zo-platform/DECISION_LOG.md   |  24 ++-
 memory/zo-platform/PRIORS.md         |  48 ++++++
 memory/zo-platform/STATE.md          |   4 +-
 plans/zero-operators-build.md        |  34 ++--
 pyproject.toml                       |   2 +-
 scripts/validate-docs.sh             | 242 +++++++++++++++++++++++++++
 setup.sh                             |   8 +-
 specs/agents.md                      |  49 +++++-
 src/zo/__init__.py                   |   2 +-
 18 files changed, 608 insertions(+), 52 deletions(-)
 create mode 100755 .claude/hooks/cascade-reminder.sh
 create mode 100755 .claude/hooks/pre-commit-validate.sh
 create mode 100755 .claude/hooks/stop-check.sh
 create mode 100755 scripts/validate-docs.sh

diff --git a/.claude/agents/lead-orchestrator.md b/.claude/agents/lead-orchestrator.md
index 4352bb3..1da75ec 100644
--- a/.claude/agents/lead-orchestrator.md
+++ b/.claude/agents/lead-orchestrator.md
@@ -12,12 +12,13 @@ You do NOT write code, train models, or compute metrics. You plan, coordinate, g
 
 ## Agent Roster
 
-You have 16 pre-defined agents available in `.claude/agents/`. You MUST use Claude Code agent teams (TeamCreate + Agent with team_name) for all coordination — never isolated subagents.
+You have 17 pre-defined agents available in `.claude/agents/`. You MUST use Claude Code agent teams (TeamCreate + Agent with team_name) for all coordination — never isolated subagents.
 
 **Project Delivery Team (spawn for project work):**
 
 | Agent | Model | Expertise | When to spawn |
 |-------|-------|-----------|---------------|
+| research-scout | Opus | Literature survey, SOTA, open-source code, experiment plans | Phase 0: before model design begins |
 | data-engineer | Sonnet | Data pipeline, cleaning, EDA, DataLoaders | Phase 1-2: data review and feature engineering |
 | model-builder | Opus | Architecture selection, training, iteration | Phase 3-4: model design and training |
 | oracle-qa | Sonnet | Hard metric evaluation, gating, drift detection | Phase 3-5: after every training iteration |
@@ -41,7 +42,7 @@ You have 16 pre-defined agents available in `.claude/agents/`. You MUST use Clau
 
 ## Dynamic Agent Creation
 
-If a project requires expertise not covered by the 16 pre-defined agents, you MUST create a new agent definition before spawning:
+If a project requires expertise not covered by the 17 pre-defined agents, you MUST create a new agent definition before spawning:
 
 1. **Identify the gap** — what expertise is missing? (e.g., "NLP specialist", "time-series expert", "security auditor")
 2. **Write the agent .md file** to `.claude/agents/{new-agent-name}.md` following the same format:
diff --git a/.claude/commands/commit.md b/.claude/commands/commit.md
index 09f316b..9b13a13 100644
--- a/.claude/commands/commit.md
+++ b/.claude/commands/commit.md
@@ -36,7 +36,16 @@ You are creating a git commit for the current changes following Zero Operators c
    - What scope is affected? (e.g., `orchestrator`, `memory`, `oracle`, `agents`)
    - What is the concise subject? (imperative mood, lowercase, no period)
 
-5. **Stage relevant files**. Do NOT stage:
+5. **Run documentation validation** to catch doc-codebase drift:
+   ```bash
+   ./scripts/validate-docs.sh
+   ```
+   If any checks fail, fix the inconsistencies before committing. Common cascade fixes:
+   - Agent added → update count in setup.sh, README.md, specs/agents.md, lead-orchestrator.md
+   - Command added → update count in README.md, docs/COMMANDS.md, STATE.md
+   - Version bumped → update pyproject.toml, src/zo/__init__.py, src/zo/cli.py
+
+6. **Stage relevant files**. Do NOT stage:
    - `.env` files
    - Credential files, API keys, tokens
    - Large binary files (unless intentional)
@@ -48,14 +57,14 @@ You are creating a git commit for the current changes following Zero Operators c
    git add {specific files}
    ```
 
-6. **Create the commit** with the conventional format:
+7. **Create the commit** with the conventional format:
    ```
    type(scope): subject
 
    Body explaining what changed and why (if non-obvious).
    ```
 
-7. **Report** to the user:
+8. **Report** to the user:
    - The commit message used
    - Files committed
    - The commit hash
diff --git a/.claude/hooks/cascade-reminder.sh b/.claude/hooks/cascade-reminder.sh
new file mode 100755
index 0000000..a7cb210
--- /dev/null
+++ b/.claude/hooks/cascade-reminder.sh
@@ -0,0 +1,53 @@
+#!/bin/bash
+# Zero Operators — Cascade reminder hook
+# Triggered by PostToolUse on Write|Edit
+# Checks if modified file is a cascade trigger and reminds about updates.
+
+set -uo pipefail
+
+# Read stdin (Claude Code hook JSON)
+INPUT=$(cat)
+
+# Extract the file path from tool_input
+# Write tool: tool_input.file_path
+# Edit tool: tool_input.file_path
+FILE_PATH=$(echo "$INPUT" | grep -oE '"file_path"[[:space:]]*:[[:space:]]*"[^"]*"' | head -1 | sed 's/.*: *"//;s/"//')
+
+# If we couldn't extract a file path, exit silently
+if [[ -z "$FILE_PATH" ]]; then
+    exit 0
+fi
+
+# Check against cascade trigger patterns
+REMINDER=""
+
+case "$FILE_PATH" in
+    *.claude/agents/*)
+        REMINDER="Agent file modified. Cascade check: update setup.sh (EXPECTED_AGENTS + count), README.md (badge + roster), specs/agents.md (team counts), lead-orchestrator.md (agent count + roster), plans/zero-operators-build.md (counts). Run: ./scripts/validate-docs.sh"
+        ;;
+    *.claude/commands/*)
+        REMINDER="Command file modified. Cascade check: update README.md (command count), docs/COMMANDS.md (add/remove entry), memory/zo-platform/STATE.md (count). Run: ./scripts/validate-docs.sh"
+        ;;
+    */pyproject.toml)
+        REMINDER="pyproject.toml modified. If version changed, also update: src/zo/__init__.py (__version__), src/zo/cli.py (_VERSION). Run: ./scripts/validate-docs.sh"
+        ;;
+    */src/zo/__init__.py)
+        REMINDER="__init__.py modified. If version changed, also update: pyproject.toml (version), src/zo/cli.py (_VERSION). Run: ./scripts/validate-docs.sh"
+        ;;
+    */src/zo/cli.py)
+        REMINDER="cli.py modified. If version or commands changed, also update: pyproject.toml (version), src/zo/__init__.py (__version__), README.md, docs/COMMANDS.md. Run: ./scripts/validate-docs.sh"
+        ;;
+esac
+
+if [[ -n "$REMINDER" ]]; then
+    cat <&1)
+VALIDATION_EXIT=$?
+
+if [[ $VALIDATION_EXIT -ne 0 ]]; then
+    # Validation failed — block the commit
+    # Strip ANSI color codes for clean JSON
+    CLEAN_OUTPUT=$(echo "$VALIDATION_OUTPUT" | sed $'s/\x1b\\[[0-9;]*m//g')
+
+    cat </dev/null | grep -q .; then
+        DIRTY_TRIGGERS="$DIRTY_TRIGGERS $path"
+    fi
+    if git diff --name-only -- "$path" 2>/dev/null | grep -q .; then
+        DIRTY_TRIGGERS="$DIRTY_TRIGGERS $path"
+    fi
+done
+
+# Also check for untracked agent/command files
+UNTRACKED=$(git ls-files --others --exclude-standard .claude/agents/ .claude/commands/ 2>/dev/null)
+if [[ -n "$UNTRACKED" ]]; then
+    DIRTY_TRIGGERS="$DIRTY_TRIGGERS (untracked: $UNTRACKED)"
+fi
+
+if [[ -n "$DIRTY_TRIGGERS" ]]; then
+    cat <
 [![Status](https://img.shields.io/badge/status-validated-F0C040?style=flat-square&labelColor=080808)](#status)
-[![Tests](https://img.shields.io/badge/tests-295_passing-F0C040?style=flat-square&labelColor=080808)](#status)
+[![Tests](https://img.shields.io/badge/tests-298_passing-F0C040?style=flat-square&labelColor=080808)](#status)
 [![Agents](https://img.shields.io/badge/agents-17_defined-F0C040?style=flat-square&labelColor=080808)](#agent-teams)
 [![E2E](https://img.shields.io/badge/MNIST-99%25_accuracy-F0C040?style=flat-square&labelColor=080808)](#e2e-validation)
@@ -306,7 +306,7 @@ Adds **Phase 0: Literature Review** (prior art survey, baseline definition). Pha
 
 ## Agent Teams
 
-**Project Delivery Team** — 10 agents that execute ML/research projects:
+**Project Delivery Team** — 11 agents that execute ML/research projects:
 
 | Agent | Model | When Active | What They Do |
 |-------|-------|-------------|-------------|
@@ -366,8 +366,8 @@ zero-operators/
 ├── memory/ # Per-project state (STATE.md, DECISION_LOG, PRIORS)
 ├── logs/ # JSONL audit trails
 ├── targets/ # Delivery repo configuration
-├── tests/ # 296 tests (unit + integration)
-├── setup.sh # Environment validation (11 checks)
+├── tests/ # 298 tests (unit + integration)
+├── setup.sh # Environment validation (10 checks)
 └── pyproject.toml # Python package config
 ```
 
@@ -418,7 +418,7 @@ mnist-delivery/ ← delivery repo (clean)
 | 5 | E2E validation (MNIST: 99% accuracy) | Done |
 | 1.0.1 | Interactive tmux, brand panel, smart build, Research Scout, self-evolution | Done |
 
-295 platform tests. ruff clean. 17 agents. 24 slash commands.
+298 platform tests. ruff clean. 17 agents. 24 slash commands.
 
 ---
 
diff --git a/memory/zo-platform/DECISION_LOG.md b/memory/zo-platform/DECISION_LOG.md
index a601884..0cecd2a 100644
--- a/memory/zo-platform/DECISION_LOG.md
+++ b/memory/zo-platform/DECISION_LOG.md
@@ -73,7 +73,7 @@ Append-only. Every orchestration decision with timestamp, rationale, and outcome
 ## Decision: 2026-04-09T16:35:00Z
 **Type:** SEQUENCING
 **Title:** Agent definitions as Step 0
-**Decision:** Write all 16 agent .md files before any Python code.
+**Decision:** Write all 17 agent .md files before any Python code.
 **Rationale:** (1) Immediately usable by Claude Code. (2) Define contracts all modules implement against. (3) Platform build agents needed to build ZO itself. (4) Forces finalization of every agent interface.
 **Outcome:** Phase 0 added to build sequence. Documented in build plan v2.0 as RD8.
 
@@ -81,9 +81,9 @@ Append-only. Every orchestration decision with timestamp, rationale, and outcome
 
 ## Decision: 2026-04-09T17:15:00Z
 **Type:** MILESTONE
-**Title:** Phase 0 complete — 16 agents written
-**Decision:** All 16 agent definition files written to .claude/agents/. Settings.json created. Build plan updated to v2.0.
-**Rationale:** Phase 0 deliverables met: 10 project delivery agents (6 launch + 4 phase-in) + 6 platform build agents. Each has YAML frontmatter, role description, ownership, off-limits, contract produced/consumed with inline examples, pointer to specs/agents.md, coordination rules, validation checklist.
+**Title:** Phase 0 complete — 17 agents written
+**Decision:** All 17 agent definition files written to .claude/agents/. Settings.json created. Build plan updated to v2.0.
+**Rationale:** Phase 0 deliverables met: 11 project delivery agents (7 launch + 4 phase-in) + 6 platform build agents. Each has YAML frontmatter, role description, ownership, off-limits, contract produced/consumed with inline examples, pointer to specs/agents.md, coordination rules, validation checklist.
 **Outcome:** Gate 0 ready for human verification. Phase 1 unblocked.
 
 ---
@@ -92,7 +92,7 @@ Append-only. Every orchestration decision with timestamp, rationale, and outcome
 **Type:** GATE
 **Title:** Gate 0 passed — Human approved agent definitions
 **Decision:** PROCEED to Phase 1
-**Rationale:** Sam reviewed and approved all 16 agent definitions and .claude/settings.json.
+**Rationale:** Sam reviewed and approved all 17 agent definitions and .claude/settings.json.
 **Outcome:** Phase 1 (Scaffolding) unblocked.
 
 ---
@@ -139,14 +139,14 @@ Append-only. Every orchestration decision with timestamp, rationale, and outcome
 **Decision:** wrapper.py launches ONE Claude Code session (the Lead Orchestrator), does NOT spawn individual agents via subprocess. The Lead Orchestrator creates the team internally using TeamCreate + Agent(team_name=...). Wrapper monitors via file system (tasks, logs, tmux).
 **Rationale:** Research confirmed that Claude Code agent teams with peer-to-peer comms (SendMessage) can only be created from WITHIN a running Claude Code session, not via external CLI calls. Previous design of wrapper.py spawning N individual agents would have produced isolated sessions without peer-to-peer messaging — defeating the core requirement.
 **Alternatives considered:** (1) N subprocess calls per agent (no peer-to-peer). (2) Custom file-based messaging (fragile, not native). (3) Single session with TeamCreate (chosen — leverages native peer-to-peer).
-**Outcome:** wrapper.py redesigned as observer/launcher. orchestrator.py builds the lead prompt. Lead Orchestrator agent definition updated with 16-agent roster and dynamic agent creation capability.
+**Outcome:** wrapper.py redesigned as observer/launcher. orchestrator.py builds the lead prompt. Lead Orchestrator agent definition updated with 17-agent roster and dynamic agent creation capability.
 
 ---
 
 ## Decision: 2026-04-09T19:10:00Z
 **Type:** ARCHITECTURE
 **Title:** Lead Orchestrator — dynamic agent creation
-**Decision:** Lead Orchestrator can create new agent definition files (.claude/agents/*.md) on the fly if a project requires expertise not covered by the 16 pre-defined agents.
+**Decision:** Lead Orchestrator can create new agent definition files (.claude/agents/*.md) on the fly if a project requires expertise not covered by the 17 pre-defined agents.
 **Rationale:** The agent roster is a starting point, not a ceiling. Real projects will need domain-specific experts (NLP specialist, time-series expert, security auditor). The Lead Orchestrator has the context to identify gaps and write appropriate agent definitions following the established template.
 **Outcome:** lead-orchestrator.md updated with agent roster table and dynamic creation protocol.
 
@@ -225,3 +225,13 @@ Append-only. Every orchestration decision with timestamp, rationale, and outcome
 **Decision:** (1) `zo build` auto-detects mode (fresh/continue/plan-edited). (2) `zo continue` becomes a thin alias. (3) `zo maintain` removed entirely. (4) Phase review shows in ALL modes. (5) `zo draft` accepts multiple paths. (6) ZO brand panel at startup.
 **Rationale:** `zo continue` and `zo maintain` were doing almost the same thing as `zo build` — parsing plan, decomposing, launching agents. Having three commands confused the user. Smart detection in `zo build` handles all cases. Brand panel gives professional identity matching Claude Code's startup experience.
 **Outcome:** Simplified CLI: build (primary), continue (alias), draft, init, status. 295 tests pass.
+
+---
+
+## Decision: 2026-04-10T14:00:00Z
+**Type:** EVOLUTION
+**Title:** Three-layer defense against doc-codebase drift
+**Decision:** Implement automated documentation consistency validation with enforcement hooks. (1) `scripts/validate-docs.sh` — 7 programmatic checks (agent count, agent names, command count, version, model tiers, test badge, setup.sh literal). (2) PreToolUse hook in `.claude/settings.json` blocks `git commit` if validation fails. (3) PostToolUse hook on Write|Edit injects cascade reminders when trigger files are modified. (4) CLAUDE.md updated with explicit file-to-file cascade mappings. (5) PR-005 added to PRIORS.md.
+**Rationale:** Adding Research Scout (17th agent) left 10+ files with stale counts. CLAUDE.md "Cascade doc updates" protocol existed but had zero enforcement. PR-003 recommended hooks but they were never implemented. Root cause: `missing_rule` — aspirational text without enforcement degrades to suggestion. Self-evolution protocol requires fixing both the symptom and the rule.
+**Alternatives considered:** (1) Manual discipline only — already failed. (2) CI pipeline — too heavy for v1, not available in local dev. (3) Automated validation with hooks (chosen) — catches drift at commit time, lightweight, works in all environments.
+**Outcome:** Four new files created (validate-docs.sh, pre-commit-validate.sh, cascade-reminder.sh, stop-check.sh). settings.json updated with hooks. CLAUDE.md cascade protocol strengthened. PR-005 documents the failure and fix. Validation runs 7 checks (10 individual results) in <2 seconds.
diff --git a/memory/zo-platform/PRIORS.md b/memory/zo-platform/PRIORS.md
index 5e016c0..a3bb2a9 100644
--- a/memory/zo-platform/PRIORS.md
+++ b/memory/zo-platform/PRIORS.md
@@ -125,3 +125,51 @@ file changes to src/zo/cli.py or .claude/agents/.
 3. **Open-source code saves iteration cycles.**
    If a working implementation exists for a similar problem, adapting it is
    faster than building from scratch. Research Scout catalogs these.
+
+---
+
+## PR-005: Aspirational Rules Without Enforcement Are Dead Letter
+**Source:** Session 009 (2026-04-10), doc-codebase drift audit
+**Root cause category:** missing_rule
+**Failure:** CLAUDE.md "Cascade doc updates" protocol existed as text instructions
+but had ZERO enforcement. Adding the 17th agent (Research Scout) left 10+ files
+with stale agent counts (16 instead of 17), stale version (1.0.0 instead of 1.0.1),
+stale test counts, and incorrect model tiers. PR-003 recommended a PostToolUse hook
+but it was never implemented — deferred prevention is no prevention.
+
+### Rules
+
+1. **Every protocol that says MUST needs a corresponding enforcement mechanism.**
+   Text-only rules degrade to suggestions within one session. If CLAUDE.md says
+   "Claude MUST update X before commit", there must be a hook, script, or CI check
+   that blocks the commit if X is not updated.
+   - *Failure ref:* CLAUDE.md cascade protocol was ignored across 3+ sessions.
+
+2. **When a PRIOR recommends a preventive action, implement it immediately.**
+   PR-003 recommended hooks. They were never built. The exact failure PR-003
+   warned about then happened. Deferred prevention is no prevention.
+   - *Failure ref:* PR-003 recommended hooks (2026-04-10). Same session ended
+     without implementing them. Next session repeated the drift failure.
+
+3. **Documentation consistency is verified programmatically before every commit.**
+   `scripts/validate-docs.sh` checks agent count, command count, version,
+   model tiers, and name registry. A PreToolUse hook in `.claude/settings.json`
+   blocks `git commit` if validation fails.
+   - *Failure ref:* Manual doc updates across 10+ files are error-prone.
+     Automated validation catches drift before it reaches git history.
+
+### Verified Solution
+
+Three-layer defense against doc-codebase drift:
+
+1. **Layer 1 — Validation script** (`scripts/validate-docs.sh`):
+   7 checks, runs in <2 seconds, exits non-zero on failure. Checks agent
+   count, agent names, command count, version, model tiers, test badge, setup.sh literal.
+
+2. **Layer 2 — Claude Code hooks** (`.claude/settings.json`):
+   - PreToolUse hook on `Bash(git commit *)` runs validation script, blocks commit on failure
+   - PostToolUse hook on `Write|Edit` injects cascade reminders when trigger files are modified
+
+3. **Layer 3 — Explicit cascade mappings** (CLAUDE.md):
+   File-to-file cascade chains for agent, command, version, and tier changes.
+   No ambiguity about which files to update.
diff --git a/memory/zo-platform/STATE.md b/memory/zo-platform/STATE.md
index 1ee1230..0b06412 100644
--- a/memory/zo-platform/STATE.md
+++ b/memory/zo-platform/STATE.md
@@ -12,13 +12,13 @@ ZO v1.0.1 — **complete, validated, and user-tested**. Interactive tmux agent s
 
 ## Completed
 
-- [x] Phase 0: Agent definitions (16 files), settings.json, build plan v2.0
+- [x] Phase 0: Agent definitions (17 files), settings.json, build plan v2.0
 - [x] Phase 1: Plan Parser, Target Parser, Comms Logger, setup.sh
 - [x] Phase 2: Memory Layer, Semantic Index
 - [x] Phase 3: Orchestration Engine + Lifecycle Wrapper + gate mode toggle
 - [x] Phase 4: Evolution Engine, CLI, integration tests
 - [x] Phase 5: End-to-end validation (MNIST: 99% accuracy, 98 tests, ~$11)
-- [x] Slash commands: 23 commands across 6 categories
+- [x] Slash commands: 24 commands across 8 categories
 - [x] Documentation: COMMANDS.md reference, interactive HTML demo
 - [x] README: full user workflow, architecture, commands, e2e results
 - [x] v1.0.1: Interactive tmux agent sessions (send-keys + paste-buffer)
diff --git a/plans/zero-operators-build.md b/plans/zero-operators-build.md
index 8b341f7..f94ad5d 100644
--- a/plans/zero-operators-build.md
+++ b/plans/zero-operators-build.md
@@ -160,7 +160,7 @@ Deliverables: a working ZO platform deployed as a Python package in the `zero-op
 ### Module 0: Agent Definitions + Claude Code Setup ✅
 **Spec source:** specs/agents.md
 **Responsibility:** Write all 17 agent `.md` files to `.claude/agents/` with YAML frontmatter and full spawn prompts. Create `.claude/settings.json`. Validate agents can be spawned.
-**Outputs:** 16 `.md` files, `.claude/settings.json`, validation report.
+**Outputs:** 17 `.md` files, `.claude/settings.json`, validation report.
 **Status:** COMPLETE
 
 ### Module 1: Plan Parser and Validator ✅
@@ -203,9 +203,9 @@ Deliverables: a working ZO platform deployed as a Python package in the `zero-op
 **Responsibility:** Three-layer architecture:
 1. `orchestrator.py` — Parses plan, decomposes phases (classical_ml/deep_learning/research), generates agent contracts, builds lead prompt with full context (plan + phases + agent roster + memory + coordination instructions), manages gates, detects plan edits
 2. `wrapper.py` — Launches ONE Claude Code session as Lead Orchestrator (`claude --teammate-mode tmux`), monitors team via `~/.claude/tasks/` and session logs, captures tmux pane output, handles rate limits, pipes events to CommsLogger
-3. Lead Orchestrator agent (inside Claude Code) uses `TeamCreate` + `Agent(team_name=...)` for native peer-to-peer messaging. Can dynamically create new agent definitions if project needs expertise beyond 16 pre-defined agents.
+3. Lead Orchestrator agent (inside Claude Code) uses `TeamCreate` + `Agent(team_name=...)` for native peer-to-peer messaging. Can dynamically create new agent definitions if project needs expertise beyond 17 pre-defined agents.
 **Outputs:** Orchestrator class + LifecycleWrapper class (73 tests).
-**Files:** `src/zo/orchestrator.py` (532 lines), `src/zo/_orchestrator_models.py`, `src/zo/_orchestrator_phases.py`, `src/zo/wrapper.py` (382 lines), `src/zo/_wrapper_models.py`
+**Files:** `src/zo/orchestrator.py` (565 lines), `src/zo/_orchestrator_models.py`, `src/zo/_orchestrator_phases.py`, `src/zo/wrapper.py` (601 lines), `src/zo/_wrapper_models.py`
 **Architecture note:** Python layer does NOT spawn agents directly. It builds context and launches one session. Agent coordination is native Claude Code with peer-to-peer comms.
 **Status:** COMPLETE
@@ -294,19 +294,25 @@ Gate 5: Test project passes 18 oracle verification checks
 ```
 src/zo/
 ├── __init__.py
-├── plan.py         # Module 1
-├── target.py       # Module 2
-├── memory.py       # Module 3
-├── semantic.py     # Module 4
-├── comms.py        # Module 5
-├── orchestrator.py # Module 6
-├── wrapper.py      # Module 6
-├── evolution.py    # Module 7
-├── cli.py          # Module 9
-└── draft.py        # Module 9
+├── plan.py                 # Module 1
+├── target.py               # Module 2
+├── memory.py               # Module 3
+├── _memory_models.py       # Module 3 (models)
+├── _memory_formats.py      # Module 3 (markdown I/O)
+├── semantic.py             # Module 4
+├── comms.py                # Module 5
+├── orchestrator.py         # Module 6
+├── _orchestrator_models.py # Module 6 (models)
+├── _orchestrator_phases.py # Module 6 (phase definitions)
+├── wrapper.py              # Module 6
+├── _wrapper_models.py      # Module 6 (models)
+├── evolution.py            # Module 7
+├── _evolution_models.py    # Module 7 (models)
+├── cli.py                  # Module 9
+└── draft.py                # Module 9
 
 .claude/
-├── agents/         # Module 0 (16 files) ✅
+├── agents/         # Module 0 (17 files) ✅
 └── settings.json   # Module 0 ✅
 
 setup.sh            # Module 8
diff --git a/pyproject.toml b/pyproject.toml
index 71f2e96..002f18c 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "zero-operators"
-version = "1.0.0"
+version = "1.0.1"
 description = "Autonomous AI research and engineering team system"
 readme = "README.md"
 requires-python = ">=3.11"
diff --git a/scripts/validate-docs.sh b/scripts/validate-docs.sh
new file mode 100755
index 0000000..9dc4819
--- /dev/null
+++ b/scripts/validate-docs.sh
@@ -0,0 +1,242 @@
+#!/bin/bash
+# Zero Operators — Documentation Consistency Validator
+# Checks that documentation claims match codebase reality.
+#
+# Run: ./scripts/validate-docs.sh
+#
+# Exit 0 = all pass, Exit 1 = any fail
+# Uses same format as setup.sh
+# macOS + Linux compatible (no grep -P)
+
+set -uo pipefail
+
+# Navigate to repo root (script may be called from anywhere)
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+cd "$SCRIPT_DIR/.." || exit 1
+
+# Colors (matches setup.sh)
+AMBER='\033[38;2;240;192;64m'
+GREEN='\033[38;2;80;200;80m'
+RED='\033[38;2;240;80;80m'
+DIM='\033[38;2;100;100;100m'
+RESET='\033[0m'
+
+PASS_COUNT=0
+FAIL_COUNT=0
+WARN_COUNT=0
+
+pass() { echo -e " ${GREEN}✓${RESET} $1"; ((PASS_COUNT++)); }
+fail() { echo -e " ${RED}✗${RESET} $1"; ((FAIL_COUNT++)); }
+warn() { echo -e " ${AMBER}⚠${RESET} $1"; ((WARN_COUNT++)); }
+
+# Extract a number from a pattern in a file. Usage: extract_num "file" "sed-pattern"
+extract_num() {
+    local result
+    result=$(sed -n "$2" "$1" 2>/dev/null | head -1)
+    echo "${result:-?}"
+}
+
+echo ""
+echo -e "${AMBER}Zero Operators — Documentation Consistency Validator${RESET}"
+echo -e "${DIM}─────────────────────────────────────────────────────${RESET}"
+echo ""
+
+# ─────────────────────────────────────────────────────
+# Check 1: Agent file count vs documented counts
+# ─────────────────────────────────────────────────────
+echo -e "${DIM}Check 1: Agent file count...${RESET}"
+ACTUAL_AGENTS=$(find .claude/agents -maxdepth 1 -name "*.md" -type f | wc -l | tr -d ' ')
+
+# README badge: agents-17_defined
+README_AGENTS=$(grep -oE 'agents-[0-9]+' README.md | head -1 | grep -oE '[0-9]+')
+README_AGENTS="${README_AGENTS:-?}"
+if [[ "$README_AGENTS" == "$ACTUAL_AGENTS" ]]; then
+    pass "README.md badge: ${ACTUAL_AGENTS} agents"
+else
+    fail "README.md badge says ${README_AGENTS}, actual is ${ACTUAL_AGENTS}"
+fi
+
+# setup.sh hardcoded count: AGENT_COUNT -eq 17
+SETUP_COUNT=$(grep -oE 'AGENT_COUNT -eq [0-9]+' setup.sh | head -1 | grep -oE '[0-9]+')
+SETUP_COUNT="${SETUP_COUNT:-?}"
+if [[ "$SETUP_COUNT" == "$ACTUAL_AGENTS" ]]; then
+    pass "setup.sh count: ${ACTUAL_AGENTS}"
+else
+    fail "setup.sh expects ${SETUP_COUNT}, actual is ${ACTUAL_AGENTS}"
+fi
+
+# lead-orchestrator.md: "17 pre-defined agents"
+LO_COUNT=$(grep -oE '[0-9]+ pre-defined agents' .claude/agents/lead-orchestrator.md | head -1 | grep -oE '[0-9]+')
+LO_COUNT="${LO_COUNT:-?}"
+if [[ "$LO_COUNT" == "$ACTUAL_AGENTS" ]]; then
+    pass "lead-orchestrator.md: ${ACTUAL_AGENTS} pre-defined"
+else
+    fail "lead-orchestrator.md says ${LO_COUNT} pre-defined, actual is ${ACTUAL_AGENTS}"
+fi
+
+echo ""
+
+# ─────────────────────────────────────────────────────
+# Check 2: Agent name registry (setup.sh vs files)
+# ─────────────────────────────────────────────────────
+echo -e "${DIM}Check 2: Agent name registry...${RESET}"
+
+# Extract agent names from files
+ACTUAL_NAMES=$(find .claude/agents -maxdepth 1 -name "*.md" -type f -exec basename {} .md \; | sort)
+
+# Extract names from setup.sh EXPECTED_AGENTS array
+# Grab lines between ( and ), extract hyphenated words
+SETUP_NAMES=$(sed -n '/EXPECTED_AGENTS=(/,/)/p' setup.sh | grep -oE '[a-z][-a-z]*' | sort)
+
+# Compare both directions
+MISSING_FROM_SETUP=$(comm -23 <(echo "$ACTUAL_NAMES") <(echo "$SETUP_NAMES"))
+MISSING_FROM_DIR=$(comm -13 <(echo "$ACTUAL_NAMES") <(echo "$SETUP_NAMES"))
+
+if [[ -z "$MISSING_FROM_SETUP" && -z "$MISSING_FROM_DIR" ]]; then
+    pass "All agent files match setup.sh EXPECTED_AGENTS"
+else
+    if [[ -n "$MISSING_FROM_SETUP" ]]; then
+        fail "Agents in .claude/agents/ but NOT in setup.sh: $(echo "$MISSING_FROM_SETUP" | paste -sd ',' -)"
+    fi
+    if [[ -n "$MISSING_FROM_DIR" ]]; then
+        fail "Agents in setup.sh but NO .md file: $(echo "$MISSING_FROM_DIR" | paste -sd ',' -)"
+    fi
+fi
+
+echo ""
+
+# ─────────────────────────────────────────────────────
+# Check 3: Command file count vs documented counts
+# ─────────────────────────────────────────────────────
+echo -e "${DIM}Check 3: Command file count...${RESET}"
+ACTUAL_COMMANDS=$(find .claude/commands -name "*.md" -type f | wc -l | tr -d ' ')
+
+# README: "24 slash commands"
+README_COMMANDS=$(grep -oE '[0-9]+ slash commands' README.md | head -1 | grep -oE '[0-9]+')
+README_COMMANDS="${README_COMMANDS:-?}"
+if [[ "$README_COMMANDS" == "$ACTUAL_COMMANDS" ]]; then
+    pass "README.md: ${ACTUAL_COMMANDS} slash commands"
+else
+    fail "README.md says ${README_COMMANDS} commands, actual is ${ACTUAL_COMMANDS}"
+fi
+
+# STATE.md: "24 commands across"
+STATE_COMMANDS=$(grep -oE '[0-9]+ commands across' memory/zo-platform/STATE.md | head -1 | grep -oE '[0-9]+')
+STATE_COMMANDS="${STATE_COMMANDS:-?}"
+if [[ "$STATE_COMMANDS" == "$ACTUAL_COMMANDS" ]]; then
+    pass "STATE.md: ${ACTUAL_COMMANDS} commands"
+else
+    fail "STATE.md says ${STATE_COMMANDS} commands, actual is ${ACTUAL_COMMANDS}"
+fi
+
+echo ""
+
+# ─────────────────────────────────────────────────────
+# Check 4: Version consistency
+# ─────────────────────────────────────────────────────
+echo -e "${DIM}Check 4: Version consistency...${RESET}"
+VER_TOML=$(sed -n 's/^version = "\(.*\)"/\1/p' pyproject.toml | head -1)
+VER_INIT=$(sed -n 's/^__version__ = "\(.*\)"/\1/p' src/zo/__init__.py | head -1)
+VER_CLI=$(sed -n 's/^_VERSION = "\(.*\)"/\1/p' src/zo/cli.py | head -1)
+VER_TOML="${VER_TOML:-?}"
+VER_INIT="${VER_INIT:-?}"
+VER_CLI="${VER_CLI:-?}"
+
+if [[ "$VER_TOML" == "$VER_INIT" && "$VER_INIT" == "$VER_CLI" ]]; then
+    pass "Version ${VER_TOML} consistent across pyproject.toml, __init__.py, cli.py"
+else
+    fail "Version mismatch: pyproject.toml=${VER_TOML}, __init__.py=${VER_INIT}, cli.py=${VER_CLI}"
+fi
+
+echo ""
+
+# ─────────────────────────────────────────────────────
+# Check 5: Model tier consistency (warn-only)
+# ─────────────────────────────────────────────────────
+echo -e "${DIM}Check 5: Model tier consistency...${RESET}"
+TIER_ISSUES=0
+for agent_file in .claude/agents/*.md; do
+    agent_name=$(basename "$agent_file" .md)
+    # Extract model from YAML frontmatter
+    model_id=$(sed -n '/^---$/,/^---$/{ /^model:/s/model: *//p; }' "$agent_file")
+
+    # Map model ID to tier name
+    case "$model_id" in
+        *opus*) expected_tier="Opus" ;;
+        *sonnet*) expected_tier="Sonnet" ;;
+        *haiku*) expected_tier="Haiku" ;;
+        *) expected_tier="Unknown" ;;
+    esac
+
+    # Check specs/agents.md for this agent heading
+    # Convert hyphens to spaces for heading match (e.g., "model-builder" -> "model builder")
+    search_name="${agent_name//-/ }"
+    spec_tier=$(grep -i -A 2 "### .*${search_name}" specs/agents.md 2>/dev/null | grep -i "model tier" | head -1 || true)
+
+    if [[ -n "$spec_tier" ]]; then
+        if ! echo "$spec_tier" | grep -qi "$expected_tier"; then
+            warn "${agent_name}: agent file says ${expected_tier}, specs/agents.md says: $(echo "$spec_tier" | sed 's/.*: *//')"
+            ((TIER_ISSUES++))
+        fi
+    fi
+done
+
+if [[ $TIER_ISSUES -eq 0 ]]; then
+    pass "All agent tiers match between definitions and specs"
+fi
+
+echo ""
+
+# ─────────────────────────────────────────────────────
+# Check 6: Test count badge (warn-only)
+# ─────────────────────────────────────────────────────
+echo -e "${DIM}Check 6: Test count badge...${RESET}"
+README_TESTS=$(grep -oE 'tests-[0-9]+' README.md | head -1 | grep -oE '[0-9]+')
+README_TESTS="${README_TESTS:-?}"
+
+if [[ "$README_TESTS" == "?" ]]; then
+    warn "Could not parse test badge from README.md"
+else
+    # Count test functions via grep (fast, no pytest dependency)
+    ACTUAL_TESTS=$(grep -r "def test_" tests/ 2>/dev/null | wc -l | tr -d ' ')
+    DIFF=$((ACTUAL_TESTS - README_TESTS))
+    ABS_DIFF=${DIFF#-}
+    if [[ $ABS_DIFF -le 5 ]]; then
+        pass "Test badge: ${README_TESTS} (grep count: ${ACTUAL_TESTS}, within tolerance)"
+    else
+        warn "Test badge says ${README_TESTS}, grep finds ${ACTUAL_TESTS} test functions (diff: ${DIFF})"
+    fi
+fi
+
+echo ""
+
+# ─────────────────────────────────────────────────────
+# Check 7: setup.sh pass message count
+# ─────────────────────────────────────────────────────
+echo -e "${DIM}Check 7: setup.sh agent count literal...${RESET}"
+SETUP_PASS_MSG=$(grep -oE 'All [0-9]+ agent' setup.sh | head -1 | grep -oE '[0-9]+')
+SETUP_PASS_MSG="${SETUP_PASS_MSG:-?}"
+if [[ "$SETUP_PASS_MSG" == "$ACTUAL_AGENTS" ]]; then
+    pass "setup.sh pass message matches: ${ACTUAL_AGENTS}"
+else
+    fail "setup.sh pass message says ${SETUP_PASS_MSG}, actual is ${ACTUAL_AGENTS}"
+fi
+
+echo ""
+
+# ─────────────────────────────────────────────────────
+# Summary
+# ─────────────────────────────────────────────────────
+echo -e "${DIM}─────────────────────────────────────────────────────${RESET}"
+TOTAL=$((PASS_COUNT + FAIL_COUNT + WARN_COUNT))
+echo -e " ${GREEN}${PASS_COUNT} passed${RESET} ${RED}${FAIL_COUNT} failed${RESET} ${AMBER}${WARN_COUNT} warnings${RESET} (${TOTAL} checks)"
+echo ""
+
+if [[ $FAIL_COUNT -gt 0 ]]; then
+    echo -e "${RED}Documentation is out of sync with codebase. Fix before committing.${RESET}"
+    echo ""
+    exit 1
+else
+    echo -e "${GREEN}Documentation is consistent with codebase.${RESET}"
+    echo ""
+    exit 0
+fi
diff --git a/setup.sh b/setup.sh
index 9ea61d6..72ef03d 100755
--- a/setup.sh
+++ b/setup.sh
@@ -79,7 +79,7 @@ fi
 # 6. Agent definition files
 echo -e "${DIM}Checking agent definitions...${RESET}"
 EXPECTED_AGENTS=(
-    lead-orchestrator data-engineer model-builder oracle-qa
+    lead-orchestrator research-scout data-engineer model-builder oracle-qa
     code-reviewer test-engineer xai-agent domain-evaluator
     ml-engineer infra-engineer software-architect backend-engineer
     frontend-engineer platform-test-engineer platform-code-reviewer
@@ -95,10 +95,10 @@ for agent in "${EXPECTED_AGENTS[@]}"; do
     fi
 done
 
-if [[ $AGENT_COUNT -eq 16 ]]; then
-    pass "All 16 agent definitions present"
+if [[ $AGENT_COUNT -eq 17 ]]; then
+    pass "All 17 agent definitions present"
 else
-    fail "$AGENT_COUNT/16 agent definitions found"
+    fail "$AGENT_COUNT/17 agent definitions found"
     for missing in "${MISSING_AGENTS[@]}"; do
         echo -e " ${RED}missing:${RESET} .claude/agents/${missing}.md"
     done
diff --git a/specs/agents.md b/specs/agents.md
index 5e2a2d6..e47e8ad 100644
--- a/specs/agents.md
+++ b/specs/agents.md
@@ -98,7 +98,7 @@ ZO operates two distinct team configurations: a **Project Delivery Team** that e
 
 ### 3. Model Builder
 
-**Model tier**: Sonnet / Opus (combined researcher + engineer for v1)
+**Model tier**: Opus
 **Role**: Selects architecture, trains models, iterates against Oracle feedback, handles regime segmentation and GPU optimization.
 
 **Ownership**:
@@ -249,9 +249,44 @@ ZO operates two distinct team configurations: a **Project Delivery Team** that e
 
 ---
 
+### 7. Research Scout
+
+**Model tier**: Opus
+**Role**: Searches literature, identifies SOTA approaches, finds open-source implementations, and designs experiment plans with informed baselines.
+
+**Ownership**:
+- `research/` (literature reviews, SOTA summaries, open-source catalogs, experiment plans)
+- `research/references.bib` (BibTeX references)
+
+**Off-limits**:
+- `data/` (Data Engineer's responsibility)
+- `models/` (Model Builder's responsibility)
+- `oracle/` (Oracle/QA's responsibility)
+- `plan.md`, `STATE.md`, `DECISION_LOG.md` (Orchestrator's responsibility)
+
+**Key outputs**:
+- Literature review with 3+ relevant approaches and citations
+- SOTA summary with best known results (or analogous ranges)
+- Open-source implementation catalog with licenses
+- Experiment plan with baselines, candidates, and oracle threshold recommendations
+
+**Communication rules**:
+- Runs first: completes Phase 0 before Model Builder starts Phase 3
+- Informs oracle thresholds with literature-backed recommendations
+- Hands off open-source code references to Model Builder
+- Updates research if Phase 4 experiments reveal dead ends
+
+**Validation checklist**:
+- Literature review covers at least 3 relevant approaches
+- Experiment plan has at least 2 baselines and 1-2 candidates
+- Oracle threshold recommendations justified by evidence
+- All artifacts in `research/` — no off-limits files modified
+
+---
+
 ## Project Delivery Team — Phase-In Agents (Deployed When Core Loop Is Proven)
 
-### 7. XAI Agent
+### 8. XAI Agent
 
 **Model tier**: Sonnet
 **Role**: Analyzes model explainability—SHAP values, attention patterns, feature importance—validates interpretability against domain assumptions.
@@ -267,7 +302,7 @@ ZO operates two distinct team configurations: a **Project Delivery Team** that e
 
 ---
 
-### 8. Domain Evaluator
+### 9. Domain Evaluator
 
 **Model tier**: Opus
 **Role**: Performs domain-specific validation—physical plausibility, logical consistency, regulatory compliance—independent of primary metrics.
@@ -282,7 +317,7 @@ ZO operates two distinct team configurations: a **Project Delivery Team** that e
 
 ---
 
-### 9. ML Engineer
+### 10. ML Engineer
 
 **Model tier**: Sonnet
 **Role**: Optimizes inference latency, GPU memory, batch throughput; maintains experiment tracking infrastructure; refines reproducibility.
@@ -298,7 +333,7 @@ ZO operates two distinct team configurations: a **Project Delivery Team** that e
 
 ---
 
-### 10. Infra Engineer
+### 11. Infra Engineer
 
 **Model tier**: Haiku
 **Role**: Sets up environments, manages dependencies, schedules data pipelines, provisions deployment infrastructure.
@@ -458,7 +493,7 @@ The platform build team is used to build and maintain ZO itself as software. It
 
 ### B2. Backend Engineer
 
-**Model tier**: Sonnet / Opus
+**Model tier**: Opus
 **Team**: platform
 **Role**: Implements core ZO infrastructure modules in Python. May be multiple instances for parallel module development.
 
@@ -553,7 +588,7 @@ The platform build team is used to build and maintain ZO itself as software. It
 ### Project Delivery Team
 
 Before session 1 launch, verify:
-- [ ] All 6 launch agents have defined `.md` files with YAML frontmatter
+- [ ] All 7 launch agents have defined `.md` files with YAML frontmatter
 - [ ] Contracts between launch agents are documented and signed off
 - [ ] Data Engineer and Oracle test data split is locked
 - [ ] Orchestrator's plan.md reflects current phase and passes validation
diff --git a/src/zo/__init__.py b/src/zo/__init__.py
index 71fa7f7..9ce0e8f 100644
--- a/src/zo/__init__.py
+++ b/src/zo/__init__.py
@@ -3,4 +3,4 @@ You input a plan. Agents execute. The oracle verifies.
 """
 
 
-__version__ = "1.0.0"
+__version__ = "1.0.1"
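The frontmatter-to-tier mapping that Check 5 of `scripts/validate-docs.sh` performs can be sketched as a standalone helper. This is an illustration only: `model_tier` and `count_mismatches` are hypothetical names that are not part of this patch. The sketch assumes that every agent file opens with a `---`-delimited YAML frontmatter block containing a `model:` key. It reads only that first block, because a `sed` range such as `/^---$/,/^---$/` reopens on any later `---` line (for example, a horizontal rule in the agent body) and can pick up unrelated `model:` lines. The counter is incremented with `$((n + 1))` rather than `((n++))`: under `set -e`, `((n++))` returns status 1 when `n` is 0, which stops the script.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not in the patch): map an agent file's frontmatter
# `model:` value to a tier name, and count files whose tier differs.
set -euo pipefail

model_tier() {
    local file="$1" model_id
    # Read only the leading frontmatter block: give up if line 1 is not '---',
    # stop at the closing '---', and print the first `model:` value found.
    model_id=$(awk '
        NR == 1 { if ($0 == "---") { in_fm = 1; next } else { exit } }
        in_fm && $0 == "---" { exit }
        in_fm && /^model:/ { sub(/^model:[ \t]*/, ""); print; exit }
    ' "$file")
    case "$model_id" in
        *opus*)   echo "Opus" ;;
        *sonnet*) echo "Sonnet" ;;
        *haiku*)  echo "Haiku" ;;
        *)        echo "Unknown" ;;
    esac
}

count_mismatches() {
    local expected="$1" issues=0 f
    shift
    for f in "$@"; do
        if [[ "$(model_tier "$f")" != "$expected" ]]; then
            # Safe under `set -e`, unlike ((issues++)) when issues is 0
            issues=$((issues + 1))
        fi
    done
    echo "$issues"
}
```

In use, a caller would compare `model_tier .claude/agents/model-builder.md` against the tier parsed from `specs/agents.md`, which is the comparison Check 5 makes.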