Markdown documentation with runnable examples by adamziel · Pull Request #254 · WordPress/php-toolkit

adamziel · 2026-05-02T01:20:24Z

Summary

Adds runnable, browseable documentation for every PHP Toolkit component. Each component gets a reference page on the docs site that walks through the API as a sequence of worked examples — short PHP snippets the reader can edit and run in the browser via WordPress Playground, with the captured output already on the page so the result is visible before Playground finishes booting.

Live preview: https://adamziel.github.io/experiments/php-toolkit/

What ships

- A reference page per component with: lede, install line, a credit callout for upstream-derived code (HTML, BlockParser, Markdown, Polyfill), a minimal example, then a sequence of refinements that each pair a snippet with a captured result, then a Pitfalls block of bold-lead callouts, then See also links to siblings and to the matching learn-path tutorial. 18 components, 87 runnable snippets.
- A short learning path under docs/learn/ (Quickstart → 4 chapters → Recap) that introduces the cursor model, streaming archives, content importing, and network primitives in tutorial form.
- A landing page that groups components into four "starter paths" (content & migration, streams & storage, networked tools, WordPress runtime support) and lets readers pick by job-to-be-done rather than alphabetic order.

How a component is authored

Per-component markdown under bin/_docs_components/<slug>.md is the single source of truth for that component's page. One file owns the lede, the prose, every snippet, every captured stdout, the credit callout, and the see-also list. Editing that file changes:

- the rendered reference page,
- the snippet that runs in CI,
- the captured stdout the page paints before Playground boots,
- the credit callout, and
- the see-also links.

Format:

---
slug: html
title: HTML
install: wp-php-toolkit/html

credit_title: Ported from WordPress core
credit_body: |
  The HTML component is a port of WordPress core's
  <code>WP_HTML_Tag_Processor</code> and <code>WP_HTML_Processor</code>...

see_also: ../learn/01-rewriting-html.html | Tutorial — Rewriting HTML safely | …
see_also: blockparser | BlockParser | …
---

<one-paragraph lede, raw HTML allowed (e.g. <code>...</code>)>

## Section heading

<body HTML — paragraphs separated by blank lines>

<p>Footgun: <strong>Mutations are buffered.</strong> Edits don't appear in the source string until you call <code>get_updated_html()</code>...</p>

<!-- snippet:
filename: lazy-load-images.php
runnable: true
-->
```php
<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

$html = <<<'HTML'
<article>
    <img src="hero.jpg" alt="Hero">
    <p>Intro copy.</p>
</article>
HTML;

$tags = new WP_HTML_Tag_Processor( $html );
…
```

<!-- expected-output -->
```
<article>
    <img decoding="async" loading="lazy" src="hero.jpg" alt="Hero">
    <p>Intro copy.</p>
</article>
```

The format and conventions are documented in bin/_docs_components/README.md. Component order is controlled by _order.txt.
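
Because a section body can itself contain fenced blocks, splitting the file on `## ` lines has to ignore headings that appear inside a fence. The following is a minimal sketch of that idea, not the code in bin/_load_catalog.py; the function name `split_sections` and its return shape are assumptions made for illustration.

```python
import re

# A fence opener or closer: a run of three or more backticks at line start.
FENCE_RE = re.compile(r"^(`{3,})")


def split_sections(body: str) -> list[tuple[str, str]]:
    """Split markdown into (heading, text) pairs on '## ' lines outside fences.

    Text before the first heading is returned with an empty heading.
    """
    sections: list[tuple[str, str]] = []
    heading = None
    lines: list[str] = []
    open_fence = None  # the backtick run that opened the current fence

    for line in body.splitlines():
        match = FENCE_RE.match(line)
        if match:
            ticks = match.group(1)
            if open_fence is None:
                open_fence = ticks  # entering a fenced block
            elif len(ticks) >= len(open_fence) and line.strip() == ticks:
                open_fence = None  # a bare run at least as long closes it

        if open_fence is None and line.startswith("## "):
            # Flush what we collected so far, then start a new section.
            if heading is not None or any(l.strip() for l in lines):
                sections.append((heading or "", "\n".join(lines).strip()))
            heading = line[3:].strip()
            lines = []
        else:
            lines.append(line)

    sections.append((heading or "", "\n".join(lines).strip()))
    return sections
```

With this approach, a `## ` line inside an expected-output fence stays part of the section text instead of starting a new section.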

Build pipeline

- bin/_load_catalog.py parses each .md into a structured component dict: slug, title, install, lede, credit, see-also, and a list of sections (each with heading, body, optional snippet, optional expected output).
- bin/build-reference.py renders docs/reference/<slug>.html. Pitfalls (paragraphs starting with Footgun: / Gotcha:) are lifted out of any section into a unified "Pitfalls" callout block; other body markup (tables, <pre> blocks, lists) is preserved verbatim.
- bin/run-snippets.py executes runnable snippets under PHP 8.3 against the locally-installed toolkit, normalizes noise (tempfile names, hashes, timestamps), and compares stdout to the captured expected-output block alongside it. --update writes new captures back into the markdown file in place.
- bin/_docs_components.py is a thin module that exposes COMPONENTS = load_components() plus the small global metadata that doesn't belong in any single component (STARTER_PATHS, COMPONENT_GUIDES for the landing page).
- bin/build-docs-bundle.sh zips the toolkit source for Playground and rebuilds the reference pages. Local preview: python3 bin/serve-docs.py → http://localhost:8787.
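
The pitfall lifting described for bin/build-reference.py can be sketched with `re.sub` and a replacer callback, so that everything that is not a Footgun: / Gotcha: paragraph passes through untouched. This is an illustrative sketch under assumptions: the regular expression and the return shape are guesses, and only the function name `split_pitfalls` comes from the commit notes.

```python
import re

# A <p> whose text starts with "Footgun:" or "Gotcha:"; group 1 is the rest.
PITFALL_RE = re.compile(
    r"<p>\s*(?:Footgun|Gotcha):\s*(.*?)</p>\s*", re.DOTALL
)


def split_pitfalls(body_html: str) -> tuple[str, list[str]]:
    """Return (body without pitfall paragraphs, list of pitfall bodies)."""
    pitfalls: list[str] = []

    def lift(match: re.Match) -> str:
        # Keep the paragraph content (including any <strong> lead) verbatim,
        # minus the "Footgun: " / "Gotcha: " prefix, and drop it from the body.
        pitfalls.append(match.group(1).strip())
        return ""

    remaining = PITFALL_RE.sub(lift, body_html)
    return remaining.strip(), pitfalls
```

Substituting in place, rather than collecting `<p>` elements and re-joining them, is what lets tables, lists, and `<pre>` blocks survive.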

CI: every snippet is verified on every PR

.github/workflows/snippet-tests.yml runs bin/run-snippets.py --check on every pull request. The runner:

- Iterates every snippet declared in bin/_docs_components/<slug>.md.
- Executes the runnable ones; compares stdout (after normalization) to the captured expected-output block alongside it.
- Fails on any runtime failure, output drift, or new snippet without a captured baseline.

Snippets whose stdout is unstable (real network traffic, timestamps — currently 8 HttpClient examples) live in run-snippets.py's NO_EXPECTED allowlist; they're still required to exit 0, but their output isn't pinned.
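
The decision the runner makes for each snippet can be sketched roughly as follows. This is not the code in bin/run-snippets.py: the normalization patterns, the `<slug>::<filename>` key format (borrowed from the older JSON keys mentioned in the commit notes), the allowlist entry, and the function names are all assumptions for illustration.

```python
import re

# Hypothetical allowlist entry; the real NO_EXPECTED contents are not shown here.
NO_EXPECTED = {"httpclient::fetch-page.php"}


def normalize(text: str) -> str:
    """Replace run-to-run noise with stable placeholders (assumed patterns)."""
    text = re.sub(r"/tmp/[A-Za-z0-9_.-]+", "/tmp/<TMP>", text)
    text = re.sub(r"\b[0-9a-f]{32,64}\b", "<HASH>", text)
    text = re.sub(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}", "<TIMESTAMP>", text)
    return text


def check_snippet(key: str, exit_code: int, stdout: str, expected) -> str:
    """Classify one snippet run as ok, runtime failure, missing baseline, or drift."""
    if exit_code != 0:
        return "runtime failure"  # every snippet must exit 0
    if key in NO_EXPECTED:
        return "ok"  # output is not pinned for unstable snippets
    if expected is None:
        return "missing baseline"
    actual, want = normalize(stdout), normalize(expected)
    # Tolerate exactly one trailing newline of difference in either direction.
    if actual == want or actual == want + "\n" or actual + "\n" == want:
        return "ok"
    return "output drift"
```

Checking the exit code before consulting the allowlist keeps the rule that allowlisted snippets must still run successfully.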

Current state: 87/87 snippets pass.

Rendering details

- A static <pre class="snippet-fallback"> sits inside every <php-snippet> so readers see the code even when the cross-origin Playground module fails to register (CSP, adblock, slow network, no-JS clients). CSS hides the fallback once <php-snippet>:defined fires.
- Body content is passed through verbatim, so entities like &lt;p&gt; stay as entities and the browser shows the literal example inside <code> instead of treating it as real HTML.
- The HTML emitter escapes literal </script> in snippet code as <\/script> so the wrapping <script type="application/x-php"> isn't closed prematurely; page.js reverses that escape on the runtime side after the custom element registers, so PHP runs with the original </script> literal in string contents.
- Pitfall callouts open with a bold lead sentence (<strong>Mutations are buffered.</strong> …).
- HTML and XML literals in PHP snippets use <<<'HTML' / <<<'XML' nowdoc delimiters so the markup keeps its visual structure.
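
The </script> round trip can be illustrated in a few lines. In the PR the escape happens in the Python HTML emitter and the reversal happens in page.js; the sketch below shows both directions in Python purely to demonstrate that the pair is lossless for typical snippets. The function names are hypothetical.

```python
def escape_for_script(code: str) -> str:
    """Escape '</script' so the code can sit inside a <script> wrapper."""
    return code.replace("</script", "<\\/script")


def unescape_from_script(text: str) -> str:
    """Reverse the escape; stands in for what page.js does at runtime."""
    return text.replace("<\\/script", "</script")


php = "<?php echo '<script>alert(1)</script>';"
escaped = escape_for_script(php)

# The escaped form no longer contains a sequence that closes the wrapper.
assert "</script" not in escaped
# Reversing the escape restores the snippet the author wrote.
assert unescape_from_script(escaped) == php
```

One caveat of this simple scheme: a snippet that already contains the literal text `<\/script` would be changed by the reversal, so the sketch assumes authors do not write that sequence themselves.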

Test plan

- Verify docs snippets workflow passes (87/87).
- PHP 7.2-8.3 / macOS / Linux / Windows component tests still pass (no production code changed).
- Spot-check rendered pages on https://adamziel.github.io/experiments/php-toolkit/.
- Confirm GitHub renders bin/_docs_components/<slug>.md files cleanly when browsed on github.com (frontmatter, prose, fenced blocks, expected-output blocks).

Before this change, every snippet, lede, and prose paragraph for the docs site lived as Python string literals inside a 2,871-line bin/_docs_components.py. Editing a snippet meant editing Python; the file was painful to skim and github.com had no way to render it usefully.

This PR makes per-component markdown the source of truth:

    bin/_docs_components/
      <slug>.md     one file per component (lede + sections + snippets)
      _order.txt    the order components appear on the site
      README.md     format documentation for editors

Each .md file is the same shape readers see in the docs/_legacy/ pages: YAML-style frontmatter (slug, title, install), a lede paragraph, ## section headings, and at most one fenced ```php snippet per section. Snippet metadata (filename, runnable) sits in an HTML comment immediately above the fence so it travels with the snippet rather than living in a separate catalog. Body content stays as raw HTML. Markdown allows that, and it keeps this PR a pure refactor: the rendered output is byte-identical.

bin/_load_catalog.py parses these files into the same COMPONENTS data structure that build-docs.py, build-reference.py, and run-snippets.py already consume, so no caller changed. bin/_docs_components.py shrank from 2,871 lines to 354. It now just calls load_components() and keeps the small structural metadata that does not belong in any single component (STARTER_PATHS, COMPONENT_RELATIONS, CREDITS, COMPONENT_GUIDES).

bin/_extract_catalog.py is the one-shot tool that produced the markdown files from the legacy Python catalog. It is kept in the tree as a regression aid: re-extract from any branch, diff the .md files, see what changed. The fence length grows (3 → 4 backticks) when a snippet itself contains a triple-backtick run (the markdown component has one such example with a heredoc'd markdown sample).

Verification:

- Built docs/learn/, docs/reference/, and docs/index.html before and after this change. Diff is empty: rendering is byte-identical.
- bin/run-snippets.py --check: 87/87 runnable snippets pass against the markdown-loaded catalog with the existing _expected_outputs.json.
- The Pages and snippet-tests workflows now also trigger on changes to bin/_docs_components/** and bin/_load_catalog.py.

Followups (not in this PR):

- The legacy docs/_legacy/<slug>/index.html files are stale on trunk (running build-docs.py against trunk produces small diffs against the committed copies). A separate sweep PR can refresh them.
- The per-component README.md files in components/<Name>/ remain hand-written; only the docs-site source moves to markdown. Merging the two sources of truth (so each component has one canonical README.md that feeds both Packagist and the docs site) is an obvious next step but requires a richer markdown→site renderer.
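
The fence-length rule mentioned above (3 → 4 backticks when the snippet contains a triple-backtick run) generalizes to "one more backtick than the longest run in the content". A minimal sketch, assuming a hypothetical helper name `fence_for`; the extractor in the PR may decide this differently:

```python
import re


def fence_for(code: str) -> str:
    """Pick a backtick fence that cannot be closed by the content itself."""
    # Longest run of consecutive backticks anywhere in the snippet.
    longest = max((len(run) for run in re.findall(r"`+", code)), default=0)
    # Never shorter than the usual three; otherwise one longer than any run.
    return "`" * max(3, longest + 1)


def wrap(code: str, info: str = "php") -> str:
    """Wrap code in a fence of a safe length with the given info string."""
    fence = fence_for(code)
    return f"{fence}{info}\n{code}\n{fence}"
```

Counting runs anywhere in the content, not only at line starts, is a deliberately conservative choice: it can produce a longer fence than strictly needed, but never one that is too short.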

Two leftover sources of HTML truth went away in this change:

1. The hand-authored docs/reference/html.html and docs/reference/zip.html pages were skipped by build-reference.py's SKIP set and edited as raw HTML. They are now generated from bin/_docs_components/{html,zip}.md like every other component. The "When to use which" comparison tables that lived only in the hand-authored pages are ported into the markdown source so nothing is lost.

   Visual diff (chromium screenshots at 1280×4000): same chrome, same sidebar, same code-block styling, same section flow. Generated pages now have a superset of the hand-authored content, since the markdown catalog already covered more refinements.

2. The docs/_legacy/ tree (18 component pages plus index.html, written by bin/build-docs.py) was a fully orphaned directory: nothing in the active site linked to it, the active landing page already pointed readers at docs/reference/, and rebuilding from trunk's catalog produced small unexplained diffs against the committed copies. The directory and its generator (bin/build-docs.py) are deleted. The build-docs-bundle.sh helper and .github/workflows/docs.yml workflow now call bin/build-reference.py instead.

Net result: bin/_docs_components/<slug>.md is the single source of truth for every component reference page. Editing a snippet means editing the markdown file. CI runs the snippet against PHP 8.3 on every PR (snippet-tests.yml) and the deploy workflow regenerates the HTML on every push to trunk (docs.yml).

Verification:

- python3 bin/build-reference.py: succeeds for all 18 components (previously failed for html/zip via the SKIP exclusion).
- bin/run-snippets.py --check: 87/87 runnable snippets pass.
- Headless chromium renders of html.html and zip.html (hand-authored vs generated) confirmed visual parity before deletion.

Three things the rendered reference page used to source from sidecar Python/JSON now live alongside the snippet they describe in bin/_docs_components/<slug>.md:

1. Expected outputs. Each snippet's captured stdout previously lived in bin/_expected_outputs.json keyed by `<slug>::<filename>`. It is now an expected-output fenced block sitting directly under the snippet's php fence. The runner reads it, the renderer reads it, and `bin/run-snippets.py --update` writes back to the same fence, so a snippet's code and its captured result travel together. The JSON file is deleted.
2. Credit callouts. The CREDITS dict from bin/_docs_components.py moves into per-component frontmatter as `credit_title` + multi-line `credit_body` (YAML pipe form). Components without a credit (most of them) just omit the keys. The dict is deleted.
3. See-also relations. The COMPONENT_RELATIONS dict moves into per-component frontmatter as repeated `see_also: <slug> | <Title> | <reason>` lines. Each component owns its outgoing edges. The dict is deleted.

Implementation:

- bin/_load_catalog.py grows a richer loader (`load_components_rich`) that returns dict-shaped components with credit/see_also/snippet-with-expected-output. The legacy tuple-shape `load_components` is preserved for any caller that still wants it.
- The frontmatter parser handles three YAML-subset shapes: `key: value`, `key: |` + indented block, and repeated keys → list.
- The H2 section splitter now tracks fenced-block state so a `## ` line inside an expected-output fence (e.g. the markdown component's round-trip output) is not interpreted as a section heading.
- bin/build-reference.py reads credit/see-also/expected-output via the rich loader; no more imports of CREDITS, COMPONENT_RELATIONS, or the expected-outputs JSON.
- bin/run-snippets.py reads expected outputs via the rich loader and writes them back to the markdown file's fence on `--update`. The comparator tolerates a single trailing-newline difference (CommonMark fences don't carry the newline before the closing fence).
- bin/_extract_catalog.py is updated to populate all three new fields when re-extracting from a legacy catalog.

Verification:

- python3 bin/build-reference.py: every page renders byte-identically against the previous baseline (diff -r is empty for docs/reference/).
- bin/run-snippets.py --check: 87/87 snippets match expected output.
- bin/_docs_components.py shrinks from 354 → 228 lines.

Net effect: each <slug>.md is now fully self-describing. Open the file on github.com and you see the lede, prose, every snippet, every captured result, the credit, and the related-components list, all in one place.
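
The three frontmatter shapes named above (`key: value`, `key: |` followed by an indented block, and repeated keys collected into a list) are small enough to parse without a YAML library. The following is a rough sketch under assumptions, not the parser in bin/_load_catalog.py; `parse_frontmatter` is a hypothetical name and the input is assumed to be the text between the two `---` lines.

```python
def parse_frontmatter(text: str) -> dict:
    """Parse a small YAML subset: scalars, pipe blocks, repeated keys."""
    data: dict = {}
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        # Skip blank lines and stray indented lines outside a pipe block.
        if not line.strip() or line.startswith((" ", "\t")):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()

        if value == "|":
            # Pipe form: consume the following indented (or blank) lines.
            block = []
            while i < len(lines) and (
                lines[i].startswith((" ", "\t")) or not lines[i].strip()
            ):
                block.append(lines[i].strip())
                i += 1
            value = "\n".join(block).strip()

        if key in data:
            # Repeated key: promote the existing value to a list and append.
            if not isinstance(data[key], list):
                data[key] = [data[key]]
            data[key].append(value)
        else:
            data[key] = value
    return data
```

In this sketch a key that appears only once stays a plain string, so a caller that always wants a list for `see_also` would need to wrap single values itself.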

Drops a pre-built copy of the WordPress/php-toolkit runnable docs site into php-toolkit/ and extends the unified Pages workflow to include it in the deploy artifact.

The site is built from the WordPress/php-toolkit#254 branch (docs-markdown-source); every reference page is generated from the markdown sources in bin/_docs_components/. All in-page links are relative, so the site works fine under the /experiments/php-toolkit/ prefix without any path rewrites.

Live preview: https://adamziel.github.io/experiments/php-toolkit/

The existing experiments (zfs-wasm, real-zfs, root index) are untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…erals

A round of fixes uncovered while previewing the new docs at adamziel.github.io/experiments/php-toolkit/. Each is small and localized; together they close the visible gaps between the markdown-sourced reference pages and the hand-authored html.html / zip.html they replaced.

* Static fallback for snippets. <php-snippet> relies on a cross-origin module to register the custom element. Until the module loads (or if it never does: adblocker, slow network, no-JS), the only children are <script> blocks, which are display:none. The renderer now emits an inline <pre class="snippet-fallback"><code> alongside every snippet; CSS hides it once :defined fires. Readers always see the code.
* Reverse <\/script> escape on the runtime side. The HTML wrapper escapes literal "</script" to "<\/script" so the outer <script type="application/x-php"> isn't closed prematurely, but the browser preserves the backslash in textContent, so PHP runs with "<\/script" and WP_HTML_Tag_Processor never finds the close tag. patchSnippet in page.js now reverses the escape on el._code once the element registers, so Run produces the same output the author captured.
* Stop unescaping body HTML in build-reference.py. Three html.unescape() calls were decoding entities the author intentionally wrote, so literal-tag examples like <code>&lt;p&gt;one&lt;p&gt;two</code> rendered as real paragraph tags. Removed all three; markdown sources already store the desired output form.
* split_pitfalls() preserves non-paragraph content. The pitfall extractor walked body HTML via re.findall + ''.join, dropping every top-level element that wasn't a <p>...</p>. That destroyed the "When to use which" tables on html and zip pages and the "Run the proxy locally" prose+pre on corsproxy. Now uses re.sub with a replacer so tables, lists, <pre>, and other markup pass through verbatim.
* See-also entries can target any URL, not just sibling component slugs. The frontmatter format gains a tiny extension: any first-field value containing "/" or "." is treated as a verbatim href; bare slugs still get the .html suffix. Added Tutorial entries to html, zip, dataliberation, and httpclient, restoring the bridge from reference back to the learn-path chapters that the hand-authored pages had.
* Pitfalls ported from the hand-authored pages. The hand-authored html.html and zip.html had richer Pitfalls sections (4 each); the initial extraction only carried 1 and 2 respectively. Three HTML pitfalls (tag closers visited too, uppercase tag names, Tag_Processor vs full Processor) and two Zip pitfalls (in-place updates, encrypted archives) are now Footgun: paragraphs in the markdown source.
* sanitize-html.php simplified. The original input produced messy output (stray spaces from remove_attribute) and was busy enough to obscure the lesson. Replaced with the cleaner hand-authored variant.
* HTML / XML literals use nowdoc delimiters. Multi-line markup literals in PHP snippets now use <<<'HTML' or <<<'XML', so the markup keeps its visual structure instead of disappearing inside `'<...>' . '<...>'` chains. 17 string literals across 11 snippets in blockparser, html, xml, and zip migrated.

Asset version bumped (20260429-rewrite → 20260503-script-unescape).

Verification:

- bin/run-snippets.py --check: 87/87 runnable snippets match their captured stdout. Workflow `.github/workflows/snippet-tests.yml` enforces this on every PR.
- python3 bin/build-reference.py: every page renders cleanly.
- Headless chromium screenshots of html.html and zip.html match the hand-authored variants.
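
The see-also extension described above comes down to one branch on the first field. A minimal sketch, assuming hypothetical helper names and a dict return shape; only the "/" or "." rule and the `target | Title | reason` layout come from the notes above:

```python
def see_also_href(target: str) -> str:
    """Resolve the first see_also field to an href."""
    # Anything that looks like a path or URL is used verbatim.
    if "/" in target or "." in target:
        return target
    # A bare component slug links to its sibling reference page.
    return target + ".html"


def parse_see_also(line: str) -> dict:
    """Split 'target | Title | reason' into a link description."""
    target, title, reason = (part.strip() for part in line.split("|", 2))
    return {"href": see_also_href(target), "title": title, "reason": reason}
```

For example, `blockparser` would resolve to `blockparser.html`, while `../learn/01-rewriting-html.html` would be kept exactly as written.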

…fall

Final visual-fidelity pass against the hand-authored html.html / zip.html on trunk found two remaining gaps:

1. Pitfall callouts opened with a lowercase plain-text sentence ("mutations are buffered. Nothing changes…") instead of a bold short summary ("**Mutations are buffered.** Nothing changes…"). The hand-authored editorial treatment used a leading bold sentence to act as the callout title. Footgun: / Gotcha: paragraphs in the markdown source moved their <strong> from wrapping the prefix to wrapping the lead sentence:

       Before: <p><strong>Footgun:</strong> mutations are buffered. …</p>
       After:  <p>Footgun: <strong>Mutations are buffered.</strong> …</p>

   The pitfall extractor strips "Footgun: " (and "Gotcha: ") and emits the rest of the paragraph verbatim, so the bold lead survives into the rendered <aside>. Touched: blockparser, bytestream, html, xml, zip (11 callouts total).

2. Zip was missing the path-traversal pitfall ("Never extract entry paths verbatim. Always run paths through ZipDecoder::sanitize_path()") that the hand-authored page carried as a security note. Ported.

Visual diff: rendered pitfall callouts now match the hand-authored treatment. All 87 snippets still pass.

Markdown is the source of truth for the docs site (bin/_docs_components/<slug>.md), so the rendered docs/reference/*.html and docs/assets/php-toolkit.zip don't need to be tracked. They regenerate cleanly from those sources, and CI already does the regeneration on every push to trunk via .github/workflows/docs.yml.

Removed (19 files, 7,550 lines):

- docs/reference/<slug>.html for all 18 components
- docs/assets/php-toolkit.zip (the Playground bundle, ~9 MB)

Added paths to .gitignore so a forgotten local build doesn't sneak back in. Updated bin/serve-docs.py to detect missing artifacts and print the exact build commands to run, and bin/_docs_components/README.md to flag that the HTML output is not content.

The hand-authored docs/learn/*.html, docs/index.html, and docs/reference/index.html stay tracked; they're not derived from the markdown catalog.

## Summary

The runnable docs site at https://wordpress.github.io/php-toolkit/ shipped in #244 and the markdown sources behind it landed in #254. This PR makes it easy to find from every place a reader is likely to land.

## What changed

**`github.com/WordPress/php-toolkit`**: Repo description and homepage URL are set via API to `https://wordpress.github.io/php-toolkit/`, so the sidebar on the repo page surfaces the docs link.

**Root `README.md`**: A "📚 Live, runnable docs" callout sits at the top, a per-component table replaces the bullet list (each row links to both the README and the matching reference page on the live site), and a "Building the docs site" subsection in the dev workflow points at `bin/build-docs-bundle.sh` + `bin/serve-docs.py`.

**`components/<Name>/README.md`**: All 18 component READMEs gain an idempotent banner immediately under the H1, deep-linking to that component's reference page. The banner is wrapped in HTML-comment markers so future URL/wording updates can be re-applied with one script:

```html
> 📚 **Runnable examples:** [https://wordpress.github.io/php-toolkit/reference/zip.html](...)
> Open the page to edit each snippet in your browser and run it in WordPress Playground.
```

**`components/<Name>/composer.json`**: Each per-component package (17 of them; `ToolkitCodingStandards` has no composer.json) gains:

- `homepage` → the matching reference page
- `support: { issues, source, docs }` → links visible on Packagist
- shared `keywords: ["wordpress", "php-toolkit"]`

The result: someone who lands on https://packagist.org/packages/wp-php-toolkit/zip sees a "Homepage" link to the runnable docs and a "Documentation" link in the support sidebar.

**Root `composer.json`**: The `wp-php-toolkit/php-toolkit` meta-package gets the same homepage + support block, plus a description that names the docs URL ("Browse runnable examples at https://wordpress.github.io/php-toolkit/").

**`examples/create-wp-site/README.md`**: Same docs-site banner under the H1 so readers in the examples folder don't miss the docs site.

**`AGENTS.md`**: A `Docs site:` pointer added next to the upstream/branch lines, with a one-liner that `bin/_docs_components/<slug>.md` is where the runnable examples live.

## Test plan

- [ ] CI passes (no code changes; 87/87 snippets still match).
- [ ] After merge, https://github.com/WordPress/php-toolkit shows the homepage URL on the right sidebar.
- [ ] After Packagist re-sync (auto on next release, or manual via the package page), each `wp-php-toolkit/*` page surfaces the docs URL under "Homepage" and "Documentation".
- [ ] Component READMEs render the banner cleanly on github.com (HTML-comment markers stay invisible).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

adamziel added 3 commits May 2, 2026 03:19

adamziel added 2 commits May 3, 2026 17:53

adamziel changed the title ~~docs: source the catalog from per-component markdown files~~ docs: per-component markdown is the source of truth for the runnable docs site May 3, 2026

adamziel changed the title ~~docs: per-component markdown is the source of truth for the runnable docs site~~ Markdown documentation with runnable examples May 3, 2026

adamziel merged commit 394356b into trunk May 3, 2026
28 of 29 checks passed

adamziel deleted the docs-markdown-source branch May 3, 2026 16:50

adamziel mentioned this pull request May 3, 2026

docs: surface the runnable docs site from every entry point #256

Merged

4 tasks

This was referenced May 3, 2026

docs: link READMEs to the runnable docs site #252

Closed

ci: enforce every docs snippet — no silent skips #253

Closed

adamziel merged 6 commits into trunk from docs-markdown-source
