Zero Operators (ZO) is an autonomous AI research and engineering team. You give it a project — a GitHub repo, some source documents, and success criteria — and it builds, trains, validates, and delivers. A coordinated team of AI agents handles the full ML lifecycle: data engineering, model building, oracle validation, code review, testing, and explainability.
You stay in the loop at human checkpoints. ZO remembers everything across sessions. It learns from its mistakes. And the delivery repo stays clean — zero ZO artifacts leak into your project.
You ZO Delivery Repo
─── ── ─────────────
1. Draft a plan ───────────► zo draft ──► plan.md
│
2. Review & edit plan ◄─────────────────────┘
│
3. Launch ─────────────────► zo build plans/project.md
│
┌──────────────┘
│
Orchestrator
decomposes plan
into phases
│
▼
┌─── Agent Team (tmux) ───┐
│ │
│ Data Engineer │
│ Model Builder │ ──────► src/
│ Oracle / QA │ ──────► models/
│ Code Reviewer │ ──────► reports/
│ Test Engineer │ ──────► tests/
│ │
│ Peer-to-peer comms │
│ via SendMessage │
└─────────┬───────────────┘
│
4. Review at gates ◄────────┤ (supervised mode)
Approve / iterate │
│
5. Session ends ───► STATE.md + DECISION_LOG + PRIORS
│
6. Resume anytime ─► zo continue project
│
7. Delivery ◄───────────────┘ Clean repo, zero ZO artifacts
Step by step:
- Draft a plan: `zo draft --project my-project` opens an interactive Claude session that drafts a `plan.md` conversationally. Optionally provide source docs or a description (`-d`).
- Review the plan: edit `plans/my-project.md` to sharpen the objective, set oracle thresholds, and add domain knowledge.
- Launch: `zo build plans/my-project.md` shows a phase review (subtasks, agents, oracle criteria), prompts for additional instructions, then spawns the agent team in tmux.
- Approve at gates: in supervised mode (default), every phase transition pauses for your review. You can also type directly into the Lead Orchestrator's Claude Code session.
- Session continuity: stop anytime. Run `zo build` again and it auto-detects the current phase and resumes. Or use `zo continue my-project` as shorthand.
- Self-evolution: when something fails, ZO runs a post-mortem that fixes the symptom, updates the rule that allowed it, and verifies the rule prevents recurrence.
- Clean delivery: your project repo contains only code, models, reports, and tests. Zero ZO infrastructure.
zo build plans/my-project.md --gate-mode supervised

Smart mode detection:
- Fresh project (no state) — builds from scratch
- Existing state — continues from the current phase
- Plan edited since last run — re-decomposes and resumes
Shows a brand panel, phase review with subtasks/agents/oracle criteria, and prompts for additional instructions before launching.
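The three detection cases above amount to a small decision table. Here is a minimal sketch, assuming that "no state" is signalled by a missing saved digest and that plan edits are detected by comparing hashes. The names `BuildMode`, `plan_digest`, and `detect_build_mode` are hypothetical and do not describe ZO's actual implementation.

```python
import hashlib
from enum import Enum
from typing import Optional


class BuildMode(Enum):
    FRESH = "build from scratch"
    CONTINUE = "continue from the current phase"
    REPLAN = "re-decompose and resume"


def plan_digest(plan_text: str) -> str:
    """Hash the plan text so edits between runs can be detected."""
    return hashlib.sha256(plan_text.encode("utf-8")).hexdigest()


def detect_build_mode(plan_text: str, saved_digest: Optional[str]) -> BuildMode:
    """Pick a mode from the current plan and the digest stored by the last run.

    Assumption: saved_digest is None when no project state exists yet.
    """
    if saved_digest is None:
        return BuildMode.FRESH
    if plan_digest(plan_text) != saved_digest:
        return BuildMode.REPLAN
    return BuildMode.CONTINUE
```

Hashing the plan is one simple way to notice edits; a modification timestamp would also work but is easier to disturb by accident.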
zo continue my-project

Finds `plans/{project}.md` and runs `zo build` on it. Shorthand for when you don't want to type the plan path.
zo draft -p my-project --docs ~/docs/ --data ~/data/ # docs + data inspection
zo draft -p cifar10 -d "CIFAR-10 CNN, PyTorch, 90%+" # from description
zo draft -p my-project # fully conversational

Launches a Plan Architect (Opus) that drafts `plan.md` conversationally with you. Optionally spawns Data Scout (inspects `--data` paths for schema, distributions, quality flags) and Research Scout (finds prior art and baselines) in the background. Scout findings are woven into the plan as they arrive. All arguments are optional; if nothing is provided, the architect asks you everything conversationally.
The Plan Architect also populates two Agent Configuration knobs based on scout findings: Custom agents (new specialist roles the project needs — signal processing, calibration, NLP specialists, etc. — auto-created as .claude/agents/custom/*.md at build start) and Agent adaptations (domain-specific prompt additions for existing agents like xai-agent and domain-evaluator, appended at spawn time so the agent's .md file stays reusable across projects). Custom agents extend the team; adaptations tailor existing members. Both flow through to the build automatically.
zo init my-project # conversational (default)
zo init my-project --no-tmux --branch main --scaffold-delivery /path/to/delivery-repo
zo init ivl-f5 --no-tmux --branch samtukra --existing-repo ~/code/ivl-f5 --layout-mode adaptive

Default behaviour launches the Init Architect (Opus) in a tmux pane. The agent interviews you (new vs existing repo, branch, training host, data location, layout mode), inspects the target repo, runs `--dry-run` to preview the file tree, and only commits writes after you confirm. For CI/scripts, pass `--no-tmux` plus the flags you need.
Creates: memory/{project}/, targets/{project}.target.md, plans/{project}.md (with auto-populated ## Environment section), and a delivery repo scaffold. With --existing-repo, adds only ZO infrastructure dirs (configs/, experiments/, docker/) without touching existing code. With --layout-mode=adaptive, preserves your code layout entirely. If you need to start over, zo init {project} --reset deletes the ZO artifacts (memory, target, plan) without ever touching the delivery repo. See Delivery Repo Structure and docs/COMMANDS.md for the full flag surface.
zo preflight plans/my-project.md --target-repo /path/to/delivery

Runs local-only validation: Claude CLI, tmux, plan parsing, agent definitions, memory round-trip, Docker, GPU availability. Fix failures before running `zo build`.
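Part of this validation is checking that the required command-line tools are installed. The sketch below shows one way to do such a check with the standard library. It is not the code behind `zo preflight`; the helper names are hypothetical, and the executable names passed in (for example `claude`, `tmux`, `docker`) are assumptions about how the tools appear on `PATH`.

```python
import shutil
from typing import Dict, Iterable, List


def check_tools(tools: Iterable[str]) -> Dict[str, bool]:
    """Map each executable name to whether it can be found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


def failures(results: Dict[str, bool]) -> List[str]:
    """Return the names of the tools that were not found, sorted."""
    return sorted(name for name, ok in results.items() if not ok)
```

Calling `failures(check_tools(["claude", "tmux", "docker"]))` would list whichever of those tools is missing on the current machine.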
zo status my-project

Displays the current STATE.md: active phase, blockers, next steps, agent statuses.
zo watch-training --project my-project

Persistent Rich panel showing epoch progress, metrics table (current/best/target), loss sparkline, and checkpoint history. Auto-launched by `zo build` during Phase 4 via tmux split-pane, so no window switching is needed. Training scripts emit metrics via `ZOTrainingCallback` from `zo.training_metrics`.
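The loss sparkline in the panel can be pictured with a few lines of Python. This is an illustrative sketch, not the code behind `zo watch-training`; `BARS` and `sparkline` are hypothetical names, and the min/max scaling is an assumption.

```python
from typing import Sequence

BARS = "▁▂▃▄▅▆▇█"  # eight block heights, lowest to highest


def sparkline(values: Sequence[float]) -> str:
    """Render one block character per value, scaled to the min/max range."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no range to scale against; draw the lowest bar.
        return BARS[0] * len(values)
    top = len(BARS) - 1
    return "".join(BARS[round((v - lo) / span * top)] for v in values)
```

A falling loss curve renders as bars that step down from left to right, which is why a sparkline works as an at-a-glance convergence check.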
Control how much autonomy ZO has at phase transitions.
| Mode | Flag | Behaviour |
|---|---|---|
| Supervised (default) | `--gate-mode supervised` | Every phase gate pauses for your approval. You review metrics, decisions, and artifacts before proceeding. |
| Auto | `--gate-mode auto` | Only gates marked BLOCKING in the plan require approval. Automated gates proceed if all subtasks pass. |
| Full Auto | `--gate-mode full-auto` | No human gates. ZO runs start to finish autonomously. Use when you trust the pipeline. |
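The table reduces to a single predicate: does this gate pause for a human? Here is a minimal sketch, with hypothetical names (`GateMode`, `needs_human_approval`) that are not taken from ZO's code. The enum values mirror the `--gate-mode` flag values.

```python
from enum import Enum


class GateMode(Enum):
    SUPERVISED = "supervised"
    AUTO = "auto"
    FULL_AUTO = "full-auto"


def needs_human_approval(mode: GateMode, gate_is_blocking: bool) -> bool:
    """Return True when a phase gate should pause for human review."""
    if mode is GateMode.SUPERVISED:
        return True  # every gate pauses
    if mode is GateMode.AUTO:
        return gate_is_blocking  # only gates marked BLOCKING pause
    return False  # full-auto: no human gates
```

For example, `needs_human_approval(GateMode("auto"), gate_is_blocking=True)` returns `True`, while the same gate in full-auto mode does not pause.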
You can switch modes at runtime — start supervised, watch the first few phases, then switch to auto once you trust the flow.
zo gates set auto --project my-project
zo gates set full-auto -p my-project

Writes the new mode to `memory/{project}/gate_mode`. The running orchestrator and wrapper pick it up on the next poll cycle, so no restart is needed.
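A file-based switch like this is easy to picture: one side writes the mode, the other re-reads it on each poll. The sketch below assumes the file holds the mode as plain text and that an unreadable or unknown value falls back to supervised. The file format, the fallback, and all function names are assumptions, not ZO's documented behaviour.

```python
from pathlib import Path

VALID_MODES = ("supervised", "auto", "full-auto")


def gate_mode_path(root: Path, project: str) -> Path:
    """Location of the mode file: memory/{project}/gate_mode under root."""
    return root / "memory" / project / "gate_mode"


def write_gate_mode(root: Path, project: str, mode: str) -> None:
    """Persist the mode so a running process can pick it up later."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown gate mode: {mode!r}")
    path = gate_mode_path(root, project)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(mode + "\n", encoding="utf-8")


def read_gate_mode(root: Path, project: str, default: str = "supervised") -> str:
    """Read the mode on each poll; fall back to the default if absent or invalid."""
    try:
        mode = gate_mode_path(root, project).read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return default
    return mode if mode in VALID_MODES else default
```

Because the reader treats a missing file as the most conservative mode, a half-initialised project never runs with fewer human gates than intended.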
During `zo build`, the main terminal shows a live activity feed: tasks, agent progress, comms events. Every 60 seconds, Claude Haiku generates a one-line headline summarising recent activity, like a news ticker for your build.
# 1. Clone and setup
git clone https://github.com/SamPlvs/zero-operators.git
cd zero-operators
./setup.sh # validates deps, auto-fixes missing ones interactively
# 2. Install
uv sync --extra dev
# 3. Initialize a project
zo init my-project
# 4. Option A: Draft a plan from source documents
zo draft ~/docs/requirements.md ~/data/ --project my-project
# 4. Option B: Write a plan manually
# Edit plans/my-project.md — fill in all 8 sections
# 5. Start tmux (required for agent visibility)
tmux new -s zo
# 6. Launch — you'll see a phase review, then agents in tmux panes
zo build plans/my-project.md
# 7. Navigate tmux panes
# Ctrl-b n → switch to agent window
# Ctrl-b p → back to monitoring window
# Ctrl-b q N → jump to pane N
# Ctrl-b z → zoom current pane
# 8. Approve at human checkpoints (supervised mode)
# 9. Resume if interrupted
zo continue my-project
# 10. Check status anytime
zo status my-project

Starting a new Claude Code session? Use /zo-dev to get full context:
/zo-dev
This loads STATE.md, DECISION_LOG, PRIORS, presents a briefing of where you are, and asks what to work on. No need to explain context manually — ZO remembers everything.
Other session commands:
- `/memory/prime zo-platform`: detailed context briefing with semantic search
- `/memory/recall "query"`: search past decisions for a specific topic
- `/memory/session-summary`: wrap up the current session cleanly
ZO provides 24 slash commands for Claude Code. See docs/COMMANDS.md for the full reference.
| Category | Key Commands |
|---|---|
| Platform | /zo-dev |
| Project | /project/import, /project/connect, /project/plan, /project/launch |
| Memory | /memory/recall, /memory/prime, /memory/priors, /memory/session-summary |
| Gates | /gates/approve, /gates/reject, /gates/gates |
| Observe | /observe/watch, /observe/logs, /observe/decisions, /observe/history |
| Document | /document/code-docs, /document/model-card, /document/retrospective, /document/validation-report |
| Agents | /agents/agents, /agents/spawn, /agents/create-agent |
| Utility | /commit |
ZO follows a structured pipeline defined in specs/workflow.md. Three modes are available:
Phase 1: Data Review & Pipeline → Gate (automated)
13 subtasks covering schema validation, outlier detection, class imbalance, split strategy, and more
Phase 2: Feature Engineering → Gate (BLOCKING — human approves features)
Feature creation, statistical filtering, multicollinearity pruning
Phase 3: Model Design → Gate (automated)
Architecture selection, loss design, training strategy, oracle setup
Phase 4: Training & Iteration → Gate (automated — oracle loop)
Baseline training, iteration protocol, cross-validation, ensemble
Phase 5: Analysis & Validation → Gate (BLOCKING — human approves model)
SHAP/explainability, domain consistency, error analysis, significance testing
Phase 6: Packaging → Gate (automated)
Inference pipeline, model card, validation report, drift detection, test suite
The second mode uses the same phases with three changes: Phase 2 focuses on input representation and transfer learning, Phase 3 adds architecture search and gradient diagnostics, and Phase 4 adds training diagnostics.
The third mode adds Phase 0: Literature Review (prior art survey, baseline definition). Phase 5 expands with ablation studies and reproducibility verification, and Phase 6 adds paper-ready figures.
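The six-phase pipeline and its gates can be written down as plain data, which makes it easy to see where a human is required. This is a sketch with hypothetical names (`Phase`, `PHASES`, `with_literature_review`, `blocking_gates`). The gate type for Phase 0 is not stated above, so it is assumed to be automated here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Phase:
    number: int
    name: str
    blocking: bool  # True when a human must approve the gate


PHASES: List[Phase] = [
    Phase(1, "Data Review & Pipeline", False),
    Phase(2, "Feature Engineering", True),
    Phase(3, "Model Design", False),
    Phase(4, "Training & Iteration", False),
    Phase(5, "Analysis & Validation", True),
    Phase(6, "Packaging", False),
]


def with_literature_review(phases: Sequence[Phase]) -> List[Phase]:
    """Prepend Phase 0 (assumed automated; the text does not say)."""
    return [Phase(0, "Literature Review", False)] + list(phases)


def blocking_gates(phases: Sequence[Phase]) -> List[int]:
    """Phase numbers whose gates need human approval."""
    return [p.number for p in phases if p.blocking]
```

With the phases listed above, `blocking_gates(PHASES)` gives `[2, 5]`: the feature and model approvals.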
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Python CLI │
│ │
│ zo build ──► plan.py ──► orchestrator.py ──► wrapper.py │
│ parse & decompose phases launch ONE │
│ validate build lead prompt claude session│
│ plan.md generate contracts │
│ │
├─────────────────────────────────────────────────────────────┤
│ Layer 2: Claude Code Session │
│ │
│ Lead Orchestrator (native agent team) │
│ ├── TeamCreate("project") │
│ ├── Agent(name="data-engineer", team_name="project") │
│ ├── Agent(name="model-builder", team_name="project") │
│ ├── Agent(name="oracle-qa", team_name="project") │
│ └── Agents communicate peer-to-peer via SendMessage │
│ │
│ The Lead knows all 20 agents and creates new ones on the │
│ fly if the project needs expertise not in the roster. │
│ │
├─────────────────────────────────────────────────────────────┤
│ Layer 3: Persistence │
│ │
│ memory.py ──► STATE.md (session checkpoint) │
│ DECISION_LOG.md (audit trail) │
│ PRIORS.md (domain knowledge) │
│ semantic.py ► index.db (decision search) │
│ comms.py ───► YYYY-MM-DD.jsonl (structured event logs) │
│ evolution.py ► post-mortem → rule updates → verification │
│ │
├─────────────────────────────────────────────────────────────┤
│ Layer 4: Delivery Repo (clean) │
│ │
│ src/ models/ reports/ tests/ — zero ZO artifacts │
│ Isolation enforced via target.py zo_only_paths blocklist │
│ │
└─────────────────────────────────────────────────────────────┘
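Layer 4 mentions a `zo_only_paths` blocklist. One plausible reading is a list of path prefixes that must never appear in the delivery repo. The sketch below illustrates that idea only; the function names and the example entries are assumptions, and the real `target.py` may work differently.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def is_zo_only(path: str, zo_only_paths: Iterable[str]) -> bool:
    """True if path equals a blocked entry or lives underneath one."""
    candidate = PurePosixPath(path)
    for blocked in zo_only_paths:
        blocked_path = PurePosixPath(blocked)
        if candidate == blocked_path or blocked_path in candidate.parents:
            return True
    return False


def leaked_paths(delivery_files: Iterable[str], zo_only_paths: Iterable[str]) -> List[str]:
    """Return the delivery files that fall under any blocked path."""
    blocked = list(zo_only_paths)
    return [f for f in delivery_files if is_zo_only(f, blocked)]
```

Matching on whole path components rather than string prefixes means a legitimate file such as `src/memory/cache.py` is not flagged by a blocklist entry named `memory`.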
Project Delivery Team — 11 agents that execute ML/research projects:
| Agent | Model | When Active | What They Do |
|---|---|---|---|
| Lead Orchestrator | Opus | Always | Creates team, decomposes phases, manages gates, coordinates |
| Research Scout | Opus | All phases | Literature survey, SOTA, open-source code, experiment plan |
| Data Engineer | Sonnet | Phases 1-2 | Data pipeline, cleaning, EDA, DataLoaders |
| Model Builder | Opus | Phases 3-5 | Architecture selection, training, iteration |
| Oracle / QA | Sonnet | Phases 3-5 | Hard metric evaluation, pass/fail gating |
| Code Reviewer | Sonnet | All phases | Code quality, PEP8, security, conventions |
| Test Engineer | Sonnet | All phases | Unit, integration, regression tests |
| XAI Agent | Sonnet | Phase 5 | SHAP, feature importance, explainability |
| Domain Evaluator | Opus | Phase 5 | Domain validation, plausibility checks |
| ML Engineer | Sonnet | Phases 4-6 | Inference optimization, experiment tracking |
| Infra Engineer | Haiku | Phases 1, 6 | Environment setup, packaging, deployment |
| Plan Architect | Opus | zo draft | Leads plan drafting, spawns scouts, converses with human |
| Data Scout | Sonnet | zo draft | Quick data inspection — schema, distributions, quality flags |
Code Reviewer and Research Scout are cross-cutting agents present in all phases by default.
Dynamic agents — if your project needs expertise not covered (NLP, time-series, security), the Lead Orchestrator creates a new agent definition on the fly.
When something fails, ZO doesn't just fix the symptom:
Error detected
│
▼
Step 1: Document failure ──► DECISION_LOG
Step 2: Root cause analysis ──► missing_rule? incomplete_rule? regression?
Step 3: Fix the immediate problem
Step 4: Update the rule ──► PRIORS.md / spec file / agent definition
Step 5: Verify the update would have caught the original failure
Over time, PRIORS.md accumulates domain knowledge. Because each updated rule is checked against the failure that prompted it, the same mistake should not happen twice.
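The five steps can be sketched as a record plus a verification check. Everything here is hypothetical: the `PostMortem` fields, the example rule, and the shape of the failure record are illustrative assumptions, not the interface of `evolution.py`. Only the three root-cause labels come from the diagram above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class RootCause(Enum):
    MISSING_RULE = "missing_rule"
    INCOMPLETE_RULE = "incomplete_rule"
    REGRESSION = "regression"


@dataclass
class PostMortem:
    failure: Dict[str, object]  # Step 1: the documented failure
    root_cause: RootCause       # Step 2: root cause category
    fix_applied: bool           # Step 3: immediate problem fixed
    rule_text: str              # Step 4: the new or updated rule


def is_verified(pm: PostMortem, rule: Callable[[Dict[str, object]], bool]) -> bool:
    """Step 5: the updated rule must flag the original failure."""
    return pm.fix_applied and rule(pm.failure)


def split_overlap_rule(run: Dict[str, object]) -> bool:
    """Hypothetical rule: flag any run where train and test rows overlap."""
    return int(run.get("overlap_rows", 0)) > 0
```

The point of Step 5 is that a rule which would not have caught the original failure has not actually closed the gap, however sensible it reads.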
zero-operators/
├── src/zo/ # Platform code (10 modules)
│ ├── cli.py # CLI: zo build/continue/init/status/draft/gates
│ ├── draft.py # Conversational plan generation (with or without source docs)
│ ├── plan.py # Plan parser and validator (8 sections)
│ ├── target.py # Target file parser, isolation enforcer
│ ├── orchestrator.py # Phase decomposition, gate management, lead prompt
│ ├── wrapper.py # Claude CLI launcher + team observer
│ ├── memory.py # STATE.md, DECISION_LOG, PRIORS, sessions
│ ├── semantic.py # fastembed + SQLite semantic search
│ ├── comms.py # JSONL event logger (5 event types)
│ └── evolution.py # Self-evolving post-mortem protocol
├── .claude/agents/ # 20 agent definitions
├── specs/ # 8 specification documents
├── plans/ # Project plan files
├── memory/ # Per-project state (STATE.md, DECISION_LOG, PRIORS)
├── logs/ # JSONL audit trails
├── targets/ # Delivery repo configuration
├── tests/ # 415 tests (unit + integration)
├── setup.sh # Environment validation (10 checks)
└── pyproject.toml # Python package config
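The tree lists `comms.py` as a JSONL event logger, and the architecture diagram shows logs named `YYYY-MM-DD.jsonl`. A minimal sketch of that pattern follows, with one JSON object per line in a file named after the date. The record fields (`timestamp`, `event_type`, `payload`) and the function names are assumptions. The five actual event types are not listed here, so the sketch accepts any string.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def log_event(log_dir: Path, event_type: str, payload: dict,
              now: Optional[datetime] = None) -> Path:
    """Append one event as a JSON line to the log file for the event's date."""
    now = now or datetime.now(timezone.utc)
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / f"{now:%Y-%m-%d}.jsonl"
    record = {
        "timestamp": now.isoformat(),
        "event_type": event_type,  # placeholder; real type names are not given
        "payload": payload,
    }
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return path


def read_events(path: Path) -> List[dict]:
    """Load every event from a JSONL file, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Append-only JSON lines keep each event independently parseable, so a partially written file still yields every complete record before the break.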
ZO has been validated end-to-end with an MNIST digit classification project.
The agent team autonomously:
- Built a data pipeline with DataLoaders and 32 data tests
- Designed a CNN (2 conv + BN + 2 FC layers)
- Trained to 99.00% test accuracy (oracle threshold: 95%)
- Produced GradCAM visualizations, ablation study, significance testing
- Delivered 98 passing tests in the clean delivery repo
- Zero ZO artifacts leaked — 4 clean git commits
Total cost: ~$11 across all sessions.
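The oracle gate in this run compared test accuracy (99.00%) against the plan's threshold (95%). That check can be sketched as a single comparison. The function name and the `higher_is_better` switch are hypothetical additions for illustration, not the contents of `oracle/eval.py`.

```python
def oracle_verdict(metric: float, threshold: float, higher_is_better: bool = True) -> str:
    """Return "PASS" or "FAIL" for a hard metric threshold."""
    if higher_is_better:
        passed = metric >= threshold
    else:
        passed = metric <= threshold
    return "PASS" if passed else "FAIL"
```

With the MNIST numbers, `oracle_verdict(0.99, 0.95)` evaluates to `"PASS"`.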
mnist-delivery/ ← delivery repo (clean)
├── src/
│ ├── model.py ← CNN architecture
│ ├── train.py ← training loop
│ ├── inference.py ← prediction pipeline
│ └── data_loader.py ← MNIST DataLoader
├── models/best_model.pt ← trained checkpoint (99% accuracy)
├── oracle/eval.py ← oracle evaluation script
├── xai/gradcam.py ← GradCAM visualizations
├── experiments/ ← ablation, significance, reproducibility
├── tests/ ← 98 tests passing
└── pyproject.toml
v1.0.1 — All phases complete. Validated end-to-end. Pre-IVL F5 hardening done.
| Phase | What | Status |
|---|---|---|
| 0 | Agent definitions (17) + Claude Code setup | Done |
| 1 | Plan parser, target parser, comms logger, setup | Done |
| 2 | Memory layer, semantic index | Done |
| 3 | Orchestration engine + lifecycle wrapper | Done |
| 4 | Evolution engine, CLI, integration tests | Done |
| 5 | E2E validation (MNIST: 99% accuracy) | Done |
| 1.0.1 | Interactive tmux, brand panel, smart build, Research Scout, self-evolution | Done |
| pre-F5 | Phase persistence, auto-notebooks, delivery scaffold + Docker, preflight | Done |
476 platform tests. ruff clean. 20 agents. 24 slash commands.