chore(readme): remove MNIST/CIFAR benchmark-dataset framing by SamPlvs · Pull Request #63 · SamPlvs/zero-operators

SamPlvs · 2026-04-27T16:53:43Z

Summary

Toy benchmarks in the README signal "amateur ML demo" rather than "production-grade research orchestration". The substantive story is the platform's capabilities — autonomous agent teams, oracle-driven gating, contract-first spawning, hard `ZOTrainingCallback` enforcement, experiment capture + autonomous loop — not which canned dataset we happened to validate against.

Changes

| Where | Before | After |
|---|---|---|
| Top badges | `MNIST 99.66% / CIFAR10 91.62%` demo badge + `tests-704_passing` | Demo badge removed; tests badge updated to `735_passing` (current) |
| `zo draft` example | `-p cifar10 -d "CIFAR-10 CNN, PyTorch, 90%+"` | `-p churn-forecast -d "Tabular churn model, gradient boosting, 0.85 AUC target"` |
| `--low-token` paragraph | "Measured on the MNIST bench" | "Measured on the canonical reference bench" |
| E2E Validation section | MNIST-specific narrative + `mnist-delivery/` file tree with MNIST-specific filenames | Platform-capability framing + generic representative delivery-repo structure showing `.zo/experiments/`, Docker scaffold, three-tier test split. Detailed measured numbers tracked in `docs/reference/cost-benchmark.mdx` |
| Status table | "E2E validation (MNIST: 99% accuracy)" + "MNIST 99.66% / CIFAR-10 91.62% v1 demos" | "E2E validation on full ML-lifecycle reference projects" + "reference-project end-to-end demos". Added a row for the post-1.0.2 `--low-token` + `ZOTrainingCallback` work shipped in PRs #59-#62 |
| v1 demos line | Links to `mnist-digit-classifier-delivery/` and `cifar10-classifier-delivery/` | Removed; replaced with link to `docs/reference/cost-benchmark.mdx` |
| Footer signature | `v1.0.2` · `validated` · `99% MNIST accuracy` | `v1.0.2` · `validated` · `oracle-driven` |

Why this and not deleting all evidence

The substantive measurement evidence (cost, accuracy, wall-time) lives in `docs/reference/cost-benchmark.mdx` where it belongs — the README points readers there instead of repeating the toy-dataset numbers up front. The platform's claims are still backed by measured runs; the framing just doesn't lead with the dataset name.

Test plan

- `grep -i -E "mnist|cifar" README.md` returns zero matches
- `./scripts/validate-docs.sh` 10/10 passed (1 pre-existing test-count badge measurement-method warning, because parametrized tests expand at runtime; unrelated to this change)
- No source code changes; pure README revision
- All existing links in the README remain valid
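
The first check in the test plan can be made self-verifying, so that "zero matches" reads as an explicit pass instead of empty output. The following is a minimal sketch, not something shipped in this PR: the function name `check_no_dataset_terms` is hypothetical, and only the `grep` invocation itself comes from the test plan above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: not part of the zero-operators repository.
# Wraps the PR's grep check so that "zero matches" becomes a pass
# and any match (or an unreadable file) becomes a failure.
check_no_dataset_terms() {
  local file="$1"
  local status=0

  # grep exits 0 when a line matches, 1 when nothing matches, >1 on error.
  grep -i -E -n "mnist|cifar" "$file" || status=$?

  case "$status" in
    0) echo "FAIL: dataset names still present in $file" >&2; return 1 ;;
    1) echo "OK: no dataset names in $file"; return 0 ;;
    *) echo "ERROR: could not read $file" >&2; return 2 ;;
  esac
}
```

Calling `check_no_dataset_terms README.md` from the repository root would then print the offending line numbers on failure. Treating a `grep` error separately from "no match" matters here, because a missing file would otherwise look like a clean result.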

🤖 Generated with Claude Code

Toy benchmarks in the README signal "amateur ML demo" rather than "production-grade research orchestration". The substantive story is the platform's capabilities (autonomous agent teams, oracle-driven gating, contract-first spawning, hard ZOTrainingCallback enforcement, experiment capture + autonomous loop), not which canned dataset we happened to validate against.

Removed:
- Demo accuracy badge (MNIST 99.66% / CIFAR-10 91.62%); replaced test count badge with current 735 instead.
- "Validated end-to-end with an MNIST digit classification project" preamble in E2E Validation section; rewritten as platform-capability framing with measured numbers tracked in cost-benchmark.mdx.
- mnist-delivery/ file tree; replaced with a generic representative delivery-repo structure showing the .zo/ experiment capture layer, Docker scaffold, and three-tier test split.
- Status table rows mentioning specific dataset accuracies; rewritten as "E2E validation on full ML-lifecycle reference projects" + "reference-project end-to-end demos".
- v1 demos line linking to mnist-digit-classifier-delivery and cifar10-classifier-delivery; replaced with link to cost-benchmark.mdx.
- "99% MNIST accuracy" in footer signature; replaced with "oracle-driven".
- "zo draft -p cifar10 -d 'CIFAR-10 CNN, PyTorch, 90%+'" example; replaced with a tabular-churn-forecast example (more representative of the kind of project a research team would actually use ZO for).

Also softened: "Measured on the MNIST bench" → "Measured on the canonical reference bench" in the --low-token paragraph.

The substantive measurement evidence (cost, accuracy, wall-time) lives in docs/reference/cost-benchmark.mdx where it belongs; the README points readers there instead of repeating the toy-dataset numbers.

validate-docs 10/10 (1 pre-existing test-count badge measurement-method warning, parametrized tests expand at runtime).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages · 2026-04-27T18:39:05Z

Deploying zero-operators with Cloudflare Pages

Latest commit:	`0ed495b`
Status:	✅ Deploy successful!
Preview URL:	https://7f613921.zero-operators.pages.dev
Branch Preview URL:	https://claude-readme-remove-benchma.zero-operators.pages.dev


…aming chore(readme): remove MNIST/CIFAR benchmark-dataset framing

SamPlvs and others added 2 commits April 27, 2026 17:52

Merge branch 'main' into claude/readme-remove-benchmark-framing

0ed495b

SamPlvs merged commit 59e1815 into main Apr 27, 2026
1 check passed

SamPlvs deleted the claude/readme-remove-benchmark-framing branch April 27, 2026 19:23

SamPlvs added a commit that referenced this pull request Apr 30, 2026

Merge pull request #63 from SamPlvs/claude/readme-remove-benchmark-fr…

01149d9

…aming chore(readme): remove MNIST/CIFAR benchmark-dataset framing

SamPlvs merged 2 commits into main from claude/readme-remove-benchmark-framing
