feat: complete v1 — platform-aware scaffold + MNIST/CIFAR-10 demos #54

Merged
SamPlvs merged 1 commit into main from claude/focused-hugle-44c2f3 on Apr 26, 2026

SamPlvs (Owner) commented on Apr 26, 2026

Summary

  • v1 is complete. All 8 PRD §9 acceptance criteria met, all 6 STATE.md Known Issues closed.
  • Platform-aware Docker scaffold: scaffold_delivery(gpu_enabled=...) auto-detects the host GPU and emits the matching compose template (CPU on macOS / no-GPU Linux, GPU otherwise). Fixes the broken deploy.resources.devices: capabilities: [gpu] block on Mac.
  • Two v1 demo deliverables — MNIST 99.66% test accuracy (Tier 3, 64s on MPS) and CIFAR-10 91.62% (Tier 3, 427s on MPS). Full Phase 1-6 deliverables in separate delivery repos.

What changed

Platform — src/zo/scaffold.py + src/zo/cli.py

Split _COMPOSE_TEMPLATE into GPU and CPU variants. A new _resolve_compose_template() helper picks one based on gpu_enabled: bool | None. The CLI's _init_commit_writes probes the host via detect_environment() and passes the result through. The service name stays gpu in both variants for cross-platform README parity.
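
The three-way selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/zo/scaffold.py: the template bodies, the function name resolve_compose_template, and the probe parameter are all assumptions made for the example. The GPU fallback on probe failure follows the behaviour stated in the commit message.

```python
from __future__ import annotations

from typing import Callable

# Illustrative stand-ins for the two compose variants (assumed content;
# the real templates are not reproduced in this PR description).
GPU_COMPOSE = """\
services:
  gpu:
    build: .
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
"""

CPU_COMPOSE = """\
services:
  gpu:  # service name stays "gpu" so both variants share one name
    build: .
"""


def resolve_compose_template(
    gpu_enabled: bool | None,
    probe: Callable[[], bool] | None = None,
) -> str:
    """Pick a compose template.

    True and False are explicit overrides. None means auto-detect through
    `probe`, a hypothetical callable that reports whether the host has a
    usable GPU. If probing fails, the GPU template is the fallback.
    """
    if gpu_enabled is None:
        try:
            if probe is None:
                raise RuntimeError("no probe available")
            gpu_enabled = bool(probe())
        except Exception:
            gpu_enabled = True  # detection failure: fall back to GPU template
    return GPU_COMPOSE if gpu_enabled else CPU_COMPOSE
```

Calling resolve_compose_template(None, probe=lambda: False) returns the CPU variant, which omits the device reservation block that breaks on hosts without an NVIDIA runtime.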

Tests — tests/unit/test_scaffold.py + tests/unit/test_cli.py

6 new TestPlatformAwareCompose tests (explicit modes, auto-detect both ways, detection-failure fallback, CPU service-name parity). One existing test made host-independent.
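
One way to make such tests host-independent is to replace the GPU probe with a mock, so the result no longer depends on the machine running pytest. The sketch below is an assumption about the approach, not the contents of tests/unit/test_scaffold.py: host_has_gpu and compose_flavour are hypothetical names, and the nvidia-smi lookup is only a plausible stand-in for detect_environment().

```python
import shutil
from unittest import mock


def host_has_gpu() -> bool:
    """Hypothetical probe: assume an NVIDIA GPU if nvidia-smi is on PATH."""
    return shutil.which("nvidia-smi") is not None


def compose_flavour(gpu_enabled=None) -> str:
    """Return "gpu" or "cpu"; None defers to the probe, failures give "gpu"."""
    if gpu_enabled is None:
        try:
            gpu_enabled = host_has_gpu()
        except Exception:
            gpu_enabled = True
    return "gpu" if gpu_enabled else "cpu"


def test_auto_detect_is_host_independent() -> None:
    # Each case swaps the probe for a mock, so the real host is never queried.
    cases = [
        (mock.Mock(return_value=False), "cpu"),
        (mock.Mock(return_value=True), "gpu"),
        (mock.Mock(side_effect=OSError("probe failed")), "gpu"),
    ]
    for fake_probe, expected in cases:
        with mock.patch.dict(globals(), {"host_has_gpu": fake_probe}):
            assert compose_flavour() == expected
        fake_probe.assert_called_once()

    # Explicit modes must not consult the probe at all.
    never_called = mock.Mock()
    with mock.patch.dict(globals(), {"host_has_gpu": never_called}):
        assert compose_flavour(True) == "gpu"
        assert compose_flavour(False) == "cpu"
    never_called.assert_not_called()
```

In a real test module the probe would more commonly be patched by its import path with mock.patch("package.module.name"); patching globals() here just keeps the sketch self-contained in one file.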

Demos — plans/cifar10-classifier.md + delivery repos (separate)

CIFAR-10 plan added (public benchmark, no client data). Both demo delivery repos live at ~/Documents/code/personal/{mnist-digit-classifier,cifar10-classifier}-delivery/ — outside this repo, not part of this PR. Each has full Phase 1-6 artifacts (reports/{data_quality,training,analysis,model_card,validation}.md), trained weights in two formats × two locations (PyTorch + ONNX, redundancy against corruption), drift detection scaffold, and oracle + unit + export tests.

Memory

Docs

README test count 669 → 675, demos badge added, status section refreshed.

Verification

| Check | Result |
| --- | --- |
| ZO platform pytest | 675 passed, 7 skipped |
| MNIST delivery pytest | 16/16 passed |
| CIFAR-10 delivery pytest | 19/19 passed |
| ruff check src/zo/ | clean |
| validate-docs.sh | 10/10, 0 failures |
| 14 ZO CLI commands smoke-tested | all working |
| zo preflight on both plans | 6/7 PASS each (only nvidia-smi WARN) |
| PyTorch + ONNX exports load + round-trip | SUCCESS for both demos |
| zo watch-training Rich Live dashboard | renders against ZOTrainingCallback output |

v1 acceptance criteria (PRD §9)

  • First project's oracle criteria fully satisfied — MNIST 99.66%
  • Delivery repo contains zero ZO infrastructure (only .zo/ portable state)
  • DECISION_LOG provides complete audit trail
  • Session recovery works
  • Self-evolving mechanism triggered ≥ 1 time — 34 priors documented
  • All 11+ project delivery agent definitions written (20 total)
  • All 6 platform build agent definitions written
  • Plan.md edit during execution succeeds

Test plan

  • PYTHONPATH=src pytest tests/ -q — 675 passed
  • ruff check src/zo/ — clean
  • bash scripts/validate-docs.sh — 10/10
  • Train MNIST end-to-end → 99.66%, 16 tests pass
  • Train CIFAR-10 end-to-end → 91.62%, 19 tests pass
  • Smoke-test all 14 ZO CLI commands
  • Verify PyTorch weights save + load + forward pass for both demos
  • Verify ONNX onnx.checker.check_model for both
  • Verify zo watch-training Rich Live dashboard renders
  • Confirm scaffold emits CPU compose on Mac (no broken capabilities: [gpu])

🤖 Generated with Claude Code

Close the three remaining Known Issues from STATE.md:

- Device detection (Mac vs Linux): scaffold now emits a CPU compose
  template on macOS / no-GPU Linux instead of a broken
  `deploy.resources.devices: capabilities: [gpu]` block. Auto-detects
  via `detect_environment()`, falls back to GPU template on probe
  failure (safest default for Linux build servers).
- MNIST Phase 6 packaging: full Phase 1-6 deliverables produced in
  `mnist-digit-classifier-delivery/` — 99.66% test accuracy, 16 tests
  (data + model + oracle + ONNX + PyTorch export round-trip).
- Plan Environment section: already shipped in session-013; the issue
  entry is marked stale.

CIFAR-10 added as second-dataset platform validation: 91.62% test
accuracy, 19 tests, in `cifar10-classifier-delivery/`. Validates
the plan's documented domain priors about cat-dog being the
canonical confusion pair.

All 14 ZO CLI commands smoke-tested and working: --version, --help,
init (no-tmux + dry-run + reset), preflight, status, experiments
list/show/diff, gates set, migrate, watch-training (Rich Live
dashboard renders against ZOTrainingCallback output), build,
continue, draft. Both v1 demo plans pass `zo preflight` 6/7.

Two formats × two locations of trained weights for corruption
protection: experiments/exp-001/best.pt (full training checkpoint)
+ models/<proj>_cnn.pt (slim state_dict + config) + ONNX export
with sidecar `.onnx.data`.
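
A consumer could exploit that redundancy by trying each copy in turn and using the first one that loads. The helper below is a hypothetical sketch, not part of the delivery repos; load_first_intact and its loader parameter are names introduced for illustration. Since the two .pt files hold different payloads (full checkpoint vs slim state_dict + config), a real loader would also need to normalise what it returns.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Callable, Iterable


def load_first_intact(
    candidates: Iterable[Path],
    loader: Callable[[Path], Any],
) -> tuple[Path, Any]:
    """Try each copy in order; return (path, loaded object) for the first success."""
    errors: list[str] = []
    for path in candidates:
        try:
            return path, loader(path)
        except Exception as exc:  # missing file, truncated archive, bad payload
            errors.append(f"{path}: {exc!r}")
    raise RuntimeError("no intact copy found:\n" + "\n".join(errors))
```

With PyTorch installed, the loader could be something like lambda p: torch.load(p, map_location="cpu"), with the candidate list holding the experiments/ and models/ paths named above.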

Test count 669 → 675 (6 new TestPlatformAwareCompose tests).
ruff clean (src/zo/), validate-docs 10/10. v1 acceptance criteria
(PRD §9) all met. PR-033 (platform-aware templates) and PR-034
(MPS-pytest tensor extraction) added to PRIORS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SamPlvs merged commit f290040 into main on Apr 26, 2026
1 check passed
SamPlvs deleted the claude/focused-hugle-44c2f3 branch on April 26, 2026 11:14
SamPlvs added a commit that referenced this pull request Apr 30, 2026
feat: complete v1 — platform-aware scaffold + MNIST/CIFAR-10 demos
SamPlvs added a commit that referenced this pull request Apr 30, 2026
First slice of the public docs site — 13 MDX pages organised into
Get-Started, Concepts, and CLI Reference sections. Brand v2 system
applied (coral primary, paper/canvas backgrounds, Geist font, simplified
C mark in logo + favicon).

Pages shipped:
- introduction.mdx — landing
- quickstart.mdx — 5-minute first run
- installation.mdx — prerequisites + setup.sh walkthrough
- concepts/overview.mdx — five primitives + design principles
- concepts/the-plan.mdx — 8 required sections, schema, edit-mid-project
- concepts/the-oracle.mdx — tiered metric, statistical significance, examples
- concepts/agents.mdx — 20 agents, two teams, tier routing, customisation
- concepts/phases-and-gates.mdx — 6 phases, 3 gate types, autonomous loop
- concepts/memory-and-continuity.mdx — STATE/DECISION_LOG/PRIORS, .zo/, semantic search
- cli/overview.mdx — pipeline, modes, all commands at a glance
- cli/init.mdx — full reference with layout modes + dry-run + reset
- cli/draft.mdx — Plan Architect + scouts, custom agents, adaptations
- cli/build.mdx — flagship command, smart mode detection, gate modes

Brand assets:
- docs/logo/light.svg (ink stroke + coral dot for paper bg)
- docs/logo/dark.svg (cream stroke + coral dot for canvas bg)
- docs/favicon.svg (coral simplified C, mid-tone for browser tabs)

mint.json brand config: primary #D87A57, light bg #F4EFE6 (paper),
dark bg #12110F (canvas), Geist headings + body, footer GitHub link,
search prompt, suggest-edit + raise-issue feedback.

Existing docs/*.md files (COMMANDS, SAMPLE_PROJECT, TROUBLESHOOTING,
DELIVERY_STRUCTURE) stay in place during transition — the README still
links to them. docs/README.md documents local dev (mintlify dev), the
file structure, brand tokens, deploy steps, and the migration table
from existing .md sources to MDX destinations.

Framework decision rationale (vs Astro Starlight) recorded in
DECISION_LOG: Mintlify wins on user adoption — pattern recognition
(Anthropic/Vercel/Linear/Cursor all use Mintlify), built-in AI search,
positioning signal, battle-tested mobile + edge-cases, lower
maintenance burden = fresher docs over time. Brand fidelity matters
most on the marketing site; docs prioritise speed-to-answer.

Deploy: connect Mintlify to the GitHub repo (one-click on mintlify.com,
free OSS plan), point at docs/, attach docs.zero-operators.dev custom
domain. Auto-deploys on push to main thereafter.

This commit updates PR #54 (v1 completion) — docs scaffold is part
of the v1 story.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>