feat: ZO e2e validated — wrapper fix, README, MNIST project complete by SamPlvs · Pull Request #7 · SamPlvs/zero-operators

SamPlvs · 2026-04-09T21:01:17Z

Summary

Wrapper Fix (critical)

--cwd → --add-dir (correct claude CLI flag for delivery repo access)
--teammate-mode tmux removed (doesn't exist; teams created internally via TeamCreate)
Added --dangerously-skip-permissions for non-interactive agent execution
This fix enabled the first successful live run of ZO
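
The flag changes above can be sketched as a small command builder. This is an illustration only, not the wrapper's actual code: `build_claude_command` is a hypothetical name, and using `-p` (print mode) to run a single prompt non-interactively is an assumption that the PR does not state.

```python
from pathlib import Path


def build_claude_command(prompt: str, delivery_repo: Path) -> list[str]:
    """Assemble an argv list for a non-interactive claude CLI run.

    Hypothetical helper: the name and the use of -p are assumptions.
    """
    return [
        "claude",
        "-p", prompt,  # assumed: print mode, run one prompt and exit
        # --add-dir (not --cwd) grants access to the delivery repo
        "--add-dir", str(delivery_repo.resolve()),
        # required so agents are not blocked on permission prompts
        "--dangerously-skip-permissions",
    ]


cmd = build_claude_command("Run Phase 1", Path("mnist-delivery"))
```

Passing the result as a list to `subprocess.run(cmd)` avoids shell quoting issues. Resolving the delivery repo to an absolute path mirrors the "absolute path for worktree compatibility" change in the first commit.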

README Rewrite

User workflow diagram (ASCII art showing full flow)
All 6 CLI commands with usage examples
Gate modes explained (supervised/auto/full-auto)
ML workflow phases for all 3 modes
4-layer architecture diagram
Self-evolution flow diagram
Quick start guide

MNIST End-to-End Validation

ZO autonomously built a complete MNIST digit classifier:

Phase	Result
Phase 1: Data Pipeline	70k samples, EDA, DataLoaders, 32 tests
Phase 3: Model Architecture	CNN (2 conv + BN + 2 FC), 51 tests
Phase 4: Training	99.00% test accuracy (target: 95%)
Phase 5: Analysis	GradCAM, ablation, significance, reproducibility
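
The Phase 4 result is judged against a fixed accuracy target. A minimal sketch of that kind of gate follows, assuming a plain threshold comparison; `accuracy` and `oracle_gate` are hypothetical names, and the actual logic in the delivery repo's oracle/eval.py is not shown in this PR.

```python
def accuracy(predictions: list[int], labels: list[int]) -> float:
    """Fraction of predictions that match their labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def oracle_gate(test_accuracy: float, threshold: float = 0.95) -> bool:
    """Return True when the measured accuracy meets the target (assumed >=)."""
    return test_accuracy >= threshold


# The reported 99.00% against the 95% target
passed = oracle_gate(0.99, threshold=0.95)  # True
```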

Delivery repo (mnist-delivery/):

Clean code, trained model, inference script, oracle evaluation
XAI: GradCAM visualizations, saliency maps, error analysis
Experiments: ablation study, 3-seed significance testing
98 tests passing, zero ZO artifacts
4 clean git commits (one per phase)
Total cost: ~$11
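
The PR does not say which statistic the 3-seed significance testing used. As one possible illustration, the sketch below summarizes per-seed accuracies with a mean, a sample standard deviation, and a one-sample t-statistic against the target. The function name is hypothetical and the input values are placeholders, not results from this PR.

```python
import math
import statistics


def summarize_seeds(accuracies: list[float], threshold: float) -> dict[str, float]:
    """Summarize per-seed accuracies and compare their mean to a threshold."""
    n = len(accuracies)
    if n < 2:
        raise ValueError("need at least two seeds to estimate spread")
    mean = statistics.mean(accuracies)
    stdev = statistics.stdev(accuracies)  # sample standard deviation (n - 1)
    if stdev == 0:
        raise ValueError("identical accuracies: t-statistic is undefined")
    # One-sample t-statistic: distance of the mean from the threshold,
    # measured in standard errors
    t_stat = (mean - threshold) / (stdev / math.sqrt(n))
    return {"mean": mean, "stdev": stdev, "t_stat": t_stat}


# Placeholder accuracies for three seeds (hypothetical, for illustration only)
summary = summarize_seeds([0.989, 0.990, 0.991], threshold=0.95)
```

With only three seeds the t-statistic has two degrees of freedom, so it is best read as a rough sanity check rather than strong evidence.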

Test plan

296 ZO platform tests passing
ruff clean
zo init mnist-digit-classifier scaffolds correctly
zo build plans/mnist-digit-classifier.md executes successfully
Agent team produces working code in delivery repo
99.00% accuracy exceeds 95% oracle threshold
Zero ZO artifacts in delivery repo
Session logs captured in logs/comms/ and logs/wrapper/

🤖 Generated with Claude Code

- --cwd is not a valid claude CLI flag → use --add-dir for delivery repo access
- --teammate-mode tmux doesn't exist → removed (teams created internally via TeamCreate)
- Added --dangerously-skip-permissions for non-interactive agent execution
- Updated target file to absolute path for worktree compatibility
- MNIST project initialized with memory scaffold

First successful live run: ZO executed Phase 1 of MNIST project. Agent team produced data_loader.py, 32 passing tests, EDA report, data quality report in the delivery repo. Zero ZO artifacts leaked.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

First successful end-to-end run of Zero Operators on a real project. MNIST digit classifier built autonomously across 5 phases:

- Phase 1: Data pipeline, EDA, DataLoaders (32 tests)
- Phase 3: CNN architecture (2 conv + BN + 2 FC), training loop (51 tests)
- Phase 4: Training to 99.00% test accuracy (Tier 1 = 95%), oracle passed
- Phase 5: GradCAM, ablation, significance testing, reproducibility
- 98 tests passing in delivery repo, lint clean

Delivery repo (mnist-delivery/) contains:

- src/model.py, train.py, inference.py, data_loader.py
- models/best_model.pt (trained checkpoint)
- oracle/eval.py + confusion matrix + evaluation report
- xai/gradcam.py + error analysis + saliency/GradCAM plots
- experiments/ablation, significance testing, reproducibility
- 8 test files, pyproject.toml, clean git history (4 commits)
- Zero ZO artifacts: clean delivery

Total cost: ~$11 across all sessions. Session logs preserved in logs/comms/ and logs/wrapper/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add automated documentation consistency validation with enforcement hooks.

When the 17th agent (Research Scout) was added, 10+ files had stale counts, version numbers, and model tiers. Root cause: CLAUDE.md cascade protocol existed as text but had zero enforcement.

Layer 1: scripts/validate-docs.sh

- 7 programmatic checks (agent count, names, commands, version, tiers, tests, setup.sh literal)
- Runs in <2s
- Exits non-zero on failure

Layer 2: Claude Code hooks (.claude/settings.json)

- PreToolUse on git commit: blocks if validation fails
- PostToolUse on Write|Edit: cascade reminders for trigger files
- Stop: checks for uncommitted changes in trigger paths

Layer 3: Documentation

- CLAUDE.md cascade protocol replaced with file-to-file mappings
- PR-005 added to PRIORS.md (missing_rule → enforcement)
- commit command updated with validation step

Also fixes:

- Agent count 16→17 across 10 files (Research Scout was undocumented)
- Version 1.0.0→1.0.1 in pyproject.toml and __init__.py
- Test badge 295→298, setup checks 11→10
- Command count 23→24 in STATE.md
- Model Builder and Backend Engineer tiers: "Sonnet/Opus"→"Opus"
- Research Scout added to specs/agents.md roster (#7), phase-in renumbered
- PRD.md: 6 launch→7 launch, 10→11 project delivery agents

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
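
The agent-count check named in this commit message could look roughly like the sketch below. It is a Python illustration of the idea only: the real check lives in the shell script scripts/validate-docs.sh, whose contents are not shown here, and the function name, regular expression, and sample texts are all hypothetical.

```python
import re

# Matches phrases such as "17 agents" (assumed wording; real docs may differ)
AGENT_COUNT = re.compile(r"\b(\d+)\s+agents\b")


def stale_agent_counts(docs: dict[str, str], expected: int) -> dict[str, list[int]]:
    """Map each file name to the agent counts in it that differ from `expected`."""
    stale: dict[str, list[int]] = {}
    for name, text in docs.items():
        wrong = [int(n) for n in AGENT_COUNT.findall(text) if int(n) != expected]
        if wrong:
            stale[name] = wrong
    return stale


# Hypothetical file contents for illustration
docs = {
    "README.md": "ZO ships 17 agents.",
    "STATE.md": "16 agents, 24 commands",
}
report = stale_agent_counts(docs, expected=17)  # {"STATE.md": [16]}
```

A commit hook can then exit non-zero whenever the report is non-empty, which is the behaviour the commit message describes for the validation script.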


SamPlvs and others added 2 commits April 9, 2026 19:51

SamPlvs merged commit 1e0d9bb into main Apr 9, 2026

SamPlvs deleted the claude/strange-swartz branch April 9, 2026 21:03

SamPlvs mentioned this pull request Apr 10, 2026

feat(evolution): three-layer defense against doc-codebase drift #16

Merged

SamPlvs mentioned this pull request Apr 20, 2026

v1.x polish: phase snapshots, generic domain-evaluator, denylist-first #48

Merged

5 tasks

SamPlvs added a commit that referenced this pull request Apr 30, 2026

Merge pull request #7 from SamPlvs/claude/strange-swartz

888b65e

feat: ZO e2e validated — wrapper fix, README, MNIST project complete
