chore(readme): remove MNIST/CIFAR benchmark-dataset framing #63

Merged: SamPlvs merged 2 commits into main from claude/readme-remove-benchmark-framing on Apr 27, 2026

@SamPlvs (Owner) commented Apr 27, 2026

Summary

Toy benchmarks in the README signal "amateur ML demo" rather than "production-grade research orchestration". The substantive story is the platform's capabilities — autonomous agent teams, oracle-driven gating, contract-first spawning, hard `ZOTrainingCallback` enforcement, experiment capture + autonomous loop — not which canned dataset we happened to validate against.

Changes

| Where | Before | After |
| --- | --- | --- |
| Top badges | MNIST 99.66% / CIFAR10 91.62% demo badge + tests-704_passing | Demo badge removed; tests badge updated to 735_passing (current) |
| `zo draft` example | `-p cifar10 -d "CIFAR-10 CNN, PyTorch, 90%+"` | `-p churn-forecast -d "Tabular churn model, gradient boosting, 0.85 AUC target"` |
| `--low-token` paragraph | "Measured on the MNIST bench" | "Measured on the canonical reference bench" |
| E2E Validation section | MNIST-specific narrative + `mnist-delivery/` file tree with MNIST-specific filenames | Platform-capability framing + generic representative delivery-repo structure showing `.zo/experiments/`, Docker scaffold, three-tier test split. Detailed measured numbers tracked in `docs/reference/cost-benchmark.mdx` |
| Status table | "E2E validation (MNIST: 99% accuracy)" + "MNIST 99.66% / CIFAR-10 91.62% v1 demos" | "E2E validation on full ML-lifecycle reference projects" + "reference-project end-to-end demos". Added a row for the post-1.0.2 `--low-token` + `ZOTrainingCallback` work shipped in PRs #59-#62 |
| v1 demos line | Links to `mnist-digit-classifier-delivery/` and `cifar10-classifier-delivery/` | Removed; replaced with link to `docs/reference/cost-benchmark.mdx` |
| Footer signature | `v1.0.2` · `validated` · `99% MNIST accuracy` | `v1.0.2` · `validated` · `oracle-driven` |

Why this and not deleting all evidence

The substantive measurement evidence (cost, accuracy, wall-time) lives in `docs/reference/cost-benchmark.mdx` where it belongs — the README points readers there instead of repeating the toy-dataset numbers up front. The platform's claims are still backed by measured runs; the framing just doesn't lead with the dataset name.

Test plan

  • `grep -i -E "mnist|cifar" README.md` returns zero matches
  • `./scripts/validate-docs.sh`: 10/10 passed. One pre-existing warning about the test-count badge's measurement method remains (parametrized tests expand at runtime); it is unrelated to this change.
  • No source code changes — pure README revision
  • All existing links in the README remain valid
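
For reviewers who want to repeat the first check without `grep`, here is a minimal Python sketch of the same case-insensitive `mnist|cifar` search. The function name `find_dataset_mentions` is hypothetical and is not part of the repository or of `validate-docs.sh`; it only mirrors the pattern used in the test plan.

```python
import re

# Same pattern as the grep in the test plan: case-insensitive "mnist" or "cifar".
DATASET_PATTERN = re.compile(r"mnist|cifar", re.IGNORECASE)


def find_dataset_mentions(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for every line that mentions MNIST or CIFAR.

    Hypothetical helper for illustration; line numbers are 1-based, like grep -n.
    """
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if DATASET_PATTERN.search(line)
    ]
```

Passing the contents of `README.md` to this function (for example via `pathlib.Path("README.md").read_text()`) should return an empty list after this change, which corresponds to the "zero matches" result above.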

🤖 Generated with Claude Code

SamPlvs and others added 2 commits April 27, 2026 17:52
Toy benchmarks in the README signal "amateur ML demo" rather than
"production-grade research orchestration". The substantive story is
the platform's capabilities — autonomous agent teams, oracle-driven
gating, contract-first spawning, hard ZOTrainingCallback enforcement,
experiment capture + autonomous loop — not which canned dataset we
happened to validate against.

Removed:
- Demo accuracy badge (MNIST 99.66% / CIFAR-10 91.62%). The test-count
  badge was updated to the current 735 at the same time.
- "Validated end-to-end with an MNIST digit classification project"
  preamble in E2E Validation section — rewritten as platform-capability
  framing with measured numbers tracked in cost-benchmark.mdx.
- mnist-delivery/ file tree — replaced with a generic representative
  delivery-repo structure showing the .zo/ experiment capture layer,
  Docker scaffold, and three-tier test split.
- Status table rows mentioning specific dataset accuracies —
  rewritten as "E2E validation on full ML-lifecycle reference projects"
  + "reference-project end-to-end demos".
- v1 demos line linking to mnist-digit-classifier-delivery and
  cifar10-classifier-delivery — replaced with link to cost-benchmark.mdx.
- "99% MNIST accuracy" in footer signature — replaced with
  "oracle-driven".
- "zo draft -p cifar10 -d 'CIFAR-10 CNN, PyTorch, 90%+'" example —
  replaced with a tabular-churn-forecast example (more representative
  of the kind of project a research team would actually use ZO for).

Also softened: "Measured on the MNIST bench" → "Measured on the
canonical reference bench" in the --low-token paragraph.

The substantive measurement evidence (cost, accuracy, wall-time) lives
in docs/reference/cost-benchmark.mdx where it belongs — the README
points readers there instead of repeating the toy-dataset numbers.

validate-docs 10/10 (1 pre-existing test-count badge measurement-method
warning, parametrized tests expand at runtime).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cloudflare-workers-and-pages Bot commented Apr 27, 2026

Deploying zero-operators with Cloudflare Pages

Latest commit: 0ed495b
Status: ✅  Deploy successful!
Preview URL: https://7f613921.zero-operators.pages.dev
Branch Preview URL: https://claude-readme-remove-benchma.zero-operators.pages.dev

SamPlvs merged commit 59e1815 into main on Apr 27, 2026
1 check passed
SamPlvs deleted the claude/readme-remove-benchmark-framing branch on April 27, 2026 19:23
SamPlvs added a commit that referenced this pull request Apr 30, 2026
…aming

chore(readme): remove MNIST/CIFAR benchmark-dataset framing