Markdown documentation with runnable examples#254

Merged
adamziel merged 6 commits into trunk from docs-markdown-source on May 3, 2026

@adamziel adamziel commented May 2, 2026

Summary

Adds runnable, browseable documentation for every PHP Toolkit component. Each component gets a reference page on the docs site that walks through the API as a sequence of worked examples — short PHP snippets the reader can edit and run in the browser via WordPress Playground, with the captured output already on the page so the result is visible before Playground finishes booting.

Live preview: https://adamziel.github.io/experiments/php-toolkit/

What ships

  • A reference page per component with: lede, install line, a credit callout for upstream-derived code (HTML, BlockParser, Markdown, Polyfill), a minimal example, then a sequence of refinements that each pair a snippet with a captured result, then a Pitfalls block of bold-lead callouts, then See also links to siblings and to the matching learn-path tutorial. 18 components, 87 runnable snippets.
  • A short learning path under docs/learn/ (Quickstart → 4 chapters → Recap) that introduces the cursor model, streaming archives, content importing, and network primitives in tutorial form.
  • A landing page that groups components into four "starter paths" (content & migration, streams & storage, networked tools, WordPress runtime support) and lets readers pick by job-to-be-done rather than alphabetic order.

How a component is authored

Per-component markdown under bin/_docs_components/<slug>.md is the single source of truth for that component's page. One file owns the lede, the prose, every snippet, every captured stdout, the credit callout, and the see-also list. Editing that file changes:

  • the rendered reference page,
  • the snippet that runs in CI,
  • the captured stdout the page paints before Playground boots,
  • the credit callout, and
  • the see-also links.

Format:

---
slug: html
title: HTML
install: wp-php-toolkit/html

credit_title: Ported from WordPress core
credit_body: |
  The HTML component is a port of WordPress core's
  <code>WP_HTML_Tag_Processor</code> and <code>WP_HTML_Processor</code>...

see_also: ../learn/01-rewriting-html.html | Tutorial — Rewriting HTML safely | …
see_also: blockparser | BlockParser | …
---

<one-paragraph lede, raw HTML allowed (e.g. <code>...</code>)>

## Section heading

<body HTML paragraphs separated by blank lines>

<p>Footgun: <strong>Mutations are buffered.</strong> Edits don't appear in the source string until you call <code>get_updated_html()</code>...</p>

<!-- snippet:
filename: lazy-load-images.php
runnable: true
-->
```php
<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

$html = <<<'HTML'
<article>
    <img src="hero.jpg" alt="Hero">
    <p>Intro copy.</p>
</article>
HTML;

$tags = new WP_HTML_Tag_Processor( $html );
…
```

<!-- expected-output -->
```
<article>
    <img decoding="async" loading="lazy" src="hero.jpg" alt="Hero">
    <p>Intro copy.</p>
</article>
```

The format and conventions are documented in bin/_docs_components/README.md. Component order is controlled by _order.txt.
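
As a rough illustration of the frontmatter shape shown above, the sketch below parses the three forms the format relies on: `key: value`, `key: |` followed by an indented block, and repeated keys collected into a list. The function name `parse_frontmatter` and its return shape are assumptions made for this example; it is not the loader in bin/_load_catalog.py.

```python
def parse_frontmatter(text):
    """Parse a small YAML subset: scalars, pipe blocks, repeated keys.

    Hypothetical helper for illustration; returns (meta, body).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all

    meta = {}
    i = 1
    while i < len(lines) and lines[i].strip() != "---":
        line = lines[i]
        i += 1
        if not line.strip() or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()

        if value == "|":
            # Pipe form: consume the indented (or blank) lines that follow.
            block = []
            while i < len(lines) and (lines[i].startswith(" ") or not lines[i].strip()):
                block.append(lines[i].strip())
                i += 1
            value = "\n".join(block).strip()

        if key in meta:
            # A repeated key (e.g. see_also) turns into a list of values.
            if not isinstance(meta[key], list):
                meta[key] = [meta[key]]
            meta[key].append(value)
        else:
            meta[key] = value

    body = "\n".join(lines[i + 1:])  # everything after the closing ---
    return meta, body
```

In this sketch a key that appears only once stays a plain string, so a caller that always wants a list for `see_also` would need to wrap single values itself.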

Build pipeline

  • bin/_load_catalog.py parses each .md into a structured component dict: slug, title, install, lede, credit, see-also, and a list of sections (each with heading, body, optional snippet, optional expected output).
  • bin/build-reference.py renders docs/reference/<slug>.html. Pitfalls (paragraphs starting with Footgun: / Gotcha:) are lifted out of any section into a unified "Pitfalls" callout block; other body markup (tables, <pre> blocks, lists) is preserved verbatim.
  • bin/run-snippets.py executes runnable snippets under PHP 8.3 against the locally-installed toolkit, normalizes noise (tempfile names, hashes, timestamps), and compares stdout to the captured <!-- expected-output --> block alongside it. --update writes new captures back into the markdown file in place.
  • bin/_docs_components.py is a thin module that exposes COMPONENTS = load_components() plus the small global metadata that doesn't belong in any single component (STARTER_PATHS, COMPONENT_GUIDES for the landing page).
  • bin/build-docs-bundle.sh zips the toolkit source for Playground and rebuilds the reference pages. Local preview: run python3 bin/serve-docs.py, then open http://localhost:8787.
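
The normalize-then-compare step described for bin/run-snippets.py might look roughly like the sketch below. The regular expressions, placeholder strings, and function names are all assumptions chosen for illustration; the real runner's rules may differ.

```python
import re

# Hypothetical noise patterns; the actual runner may use different rules.
NOISE = [
    (re.compile(r"/tmp/[A-Za-z0-9_.-]+"), "<TMPFILE>"),
    (re.compile(r"\b[0-9a-f]{32,64}\b"), "<HASH>"),
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"), "<TIMESTAMP>"),
]


def normalize(stdout):
    """Replace run-to-run noise with stable placeholders."""
    for pattern, placeholder in NOISE:
        stdout = pattern.sub(placeholder, stdout)
    return stdout


def outputs_match(actual, expected):
    """Compare normalized outputs, ignoring one trailing newline on each side."""
    a, e = normalize(actual), normalize(expected)
    if a.endswith("\n"):
        a = a[:-1]
    if e.endswith("\n"):
        e = e[:-1]
    return a == e
```

Normalizing both sides means a captured block can contain either the raw value or the placeholder and still compare equal.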

CI: every snippet is verified on every PR

.github/workflows/snippet-tests.yml runs bin/run-snippets.py --check on every pull request. The runner:

  • Iterates every snippet declared in bin/_docs_components/<slug>.md.
  • Executes the runnable ones; compares stdout (after normalization) to the captured <!-- expected-output --> block alongside it.
  • Fails on any runtime failure, output drift, or new snippet without a captured baseline.

Snippets whose stdout is unstable (real network traffic, timestamps — currently 8 HttpClient examples) live in run-snippets.py's NO_EXPECTED allowlist; they're still required to exit 0, but their output isn't pinned.

Current state: 87/87 snippets pass.

Rendering details

  • A static <pre class="snippet-fallback"> sits inside every <php-snippet> so readers see the code even when the cross-origin Playground module fails to register (CSP, adblock, slow network, no-JS clients). CSS hides the fallback once <php-snippet>:defined fires.
  • Body content is passed through verbatim, so entities like &lt;p&gt; stay as entities and the browser shows the literal example inside <code> instead of treating it as real HTML.
  • The HTML emitter escapes literal </script> in snippet code as <\/script> so the wrapping <script type="application/x-php"> isn't closed prematurely; page.js reverses that escape on the runtime side after the custom element registers, so PHP runs with the original </script> literal in string contents.
  • Pitfall callouts open with a bold lead sentence (<strong>Mutations are buffered.</strong> …).
  • HTML and XML literals in PHP snippets use <<<'HTML' / <<<'XML' nowdoc delimiters so the markup keeps its visual structure.
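
The `</script>` escape and its reversal can be pictured as a simple round trip. The reversal itself happens in page.js, which is JavaScript; Python is used below only to illustrate the idea, and both function names are hypothetical.

```python
def escape_for_script_tag(code):
    """Build side: keep a literal </script from closing the wrapping <script>."""
    return code.replace("</script", "<\\/script")


def unescape_from_script_tag(text):
    """Runtime side: restore the original literal before the code is run."""
    return text.replace("<\\/script", "</script")


snippet = 'echo "<script>alert(1)</script>";'
embedded = escape_for_script_tag(snippet)
restored = unescape_from_script_tag(embedded)
```

One caveat of this simple form: a snippet that already contains the sequence `<\/script` on purpose would be changed by the reversal, so the round trip is only exact for sources that never use that sequence.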

Test plan

  • Verify docs snippets workflow passes (87/87).
  • PHP 7.2-8.3 / macOS / Linux / Windows component tests still pass (no production code changed).
  • Spot-check rendered pages on https://adamziel.github.io/experiments/php-toolkit/.
  • Confirm GitHub renders bin/_docs_components/<slug>.md files cleanly when browsed on github.com (frontmatter, prose, fenced blocks, expected-output blocks).

adamziel added 3 commits May 2, 2026 03:19
Before this change, every snippet, lede, and prose paragraph for the docs
site lived as Python string literals inside a 2,871-line bin/_docs_components.py.
Editing a snippet meant editing Python; the file was painful to skim and
github.com had no way to render it usefully.

This PR makes per-component markdown the source of truth:

  bin/_docs_components/
    <slug>.md      one file per component (lede + sections + snippets)
    _order.txt     the order components appear on the site
    README.md      format documentation for editors

Each .md file is the same shape readers see in the docs/_legacy/ pages:
YAML-style frontmatter (slug, title, install), a lede paragraph, ## section
headings, and at most one fenced ```php snippet per section. Snippet
metadata (filename, runnable) sits in an HTML comment immediately above the
fence so it travels with the snippet rather than living in a separate
catalog. Body content stays as raw HTML — markdown allows that, and it
keeps this PR a pure refactor: the rendered output is byte-identical.

bin/_load_catalog.py parses these files into the same COMPONENTS data
structure that build-docs.py, build-reference.py, and run-snippets.py
already consume, so no caller changed. bin/_docs_components.py shrank from
2,871 lines to 354 — it now just calls load_components() and keeps the
small structural metadata that does not belong in any single component
(STARTER_PATHS, COMPONENT_RELATIONS, CREDITS, COMPONENT_GUIDES).

bin/_extract_catalog.py is the one-shot tool that produced the markdown
files from the legacy Python catalog. It is kept in the tree as a
regression aid: re-extract from any branch, diff the .md files, see what
changed. The fence length grows (3 → 4 backticks) when a snippet itself
contains a triple-backtick run (the markdown component has one such
example with a heredoc'd markdown sample).
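
A minimal sketch of how an emitter could pick the fence length, assuming the rule is "one backtick longer than the longest backtick run in the snippet, and never shorter than three". The helper name `fence_for` is hypothetical and is not taken from bin/_extract_catalog.py.

```python
import re


def fence_for(code, minimum=3):
    """Return a backtick fence long enough to wrap `code` safely."""
    # Longest run of consecutive backticks anywhere in the snippet.
    longest = max((len(run) for run in re.findall(r"`+", code)), default=0)
    return "`" * max(minimum, longest + 1)


def wrap(code, info="php"):
    """Wrap a snippet in a fence sized by fence_for()."""
    fence = fence_for(code)
    return f"{fence}{info}\n{code}\n{fence}"
```

Counting runs anywhere in the snippet is more conservative than strictly necessary, since only a run at the start of a line can close a fence, but it keeps the rule easy to reason about.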

Verification:
  - Built docs/learn/, docs/reference/, and docs/index.html before and
    after this change. Diff is empty: rendering is byte-identical.
  - bin/run-snippets.py --check: 87/87 runnable snippets pass against the
    markdown-loaded catalog with the existing _expected_outputs.json.
  - The Pages and snippet-tests workflows now also trigger on changes to
    bin/_docs_components/** and bin/_load_catalog.py.

Followups (not in this PR):
  - The legacy docs/_legacy/<slug>/index.html files are stale on trunk
    (running build-docs.py against trunk produces small diffs against the
    committed copies). A separate sweep PR can refresh them.
  - The per-component README.md files in components/<Name>/
    remain hand-written; only the docs-site source moves to markdown.
    Merging the two sources of truth (so each component has one canonical
    README.md that feeds both Packagist and the docs site) is an obvious
    next step but requires a richer markdown→site renderer.
Two leftover sources of HTML truth went away in this change:

  1. The hand-authored docs/reference/html.html and docs/reference/zip.html
     pages were skipped by build-reference.py's SKIP set and edited as raw
     HTML. They are now generated from bin/_docs_components/{html,zip}.md
     like every other component. The "When to use which" comparison tables
     that lived only in the hand-authored pages are ported into the
     markdown source so nothing is lost. Visual diff (chromium screenshots
     at 1280×4000): same chrome, same sidebar, same code-block styling,
     same section flow. Generated pages now have a superset of the
     hand-authored content, since the markdown catalog already covered
     more refinements.

  2. The docs/_legacy/ tree (18 component pages plus index.html, written
     by bin/build-docs.py) was a fully orphaned directory: nothing in the
     active site linked to it, the active landing page already pointed
     readers to docs/reference/, and rebuilding from trunk's catalog
     produced small unexplained diffs against the committed copies. The
     directory and its generator (bin/build-docs.py) are deleted. The
     build-docs-bundle.sh helper and .github/workflows/docs.yml workflow
     now call bin/build-reference.py instead.

Net result: bin/_docs_components/<slug>.md is the single source of truth
for every component reference page. Editing a snippet means editing the
markdown file. CI runs the snippet against PHP 8.3 on every PR
(snippet-tests.yml) and the deploy workflow regenerates the HTML on every
push to trunk (docs.yml).

Verification:
  - python3 bin/build-reference.py — succeeds for all 18 components
    (previously failed for html/zip via the SKIP exclusion).
  - bin/run-snippets.py --check — 87/87 runnable snippets pass.
  - Headless chromium renders of html.html and zip.html (hand-authored vs
    generated) confirmed visual parity before deletion.
Three things the rendered reference page used to source from sidecar Python/
JSON now live alongside the snippet they describe in
bin/_docs_components/<slug>.md:

  1. Expected outputs. Each snippet's captured stdout previously lived in
     bin/_expected_outputs.json keyed by `<slug>::<filename>`. It is now an
     `<!-- expected-output -->` fenced block sitting directly under the
     snippet's php fence. The runner reads it, the renderer reads it, and
     `bin/run-snippets.py --update` writes back to the same fence — so a
     snippet's code and its captured result travel together. The JSON file
     is deleted.

  2. Credit callouts. The CREDITS dict from bin/_docs_components.py moves
     into per-component frontmatter as `credit_title` + multi-line
     `credit_body` (YAML pipe form). Components without a credit (most of
     them) just omit the keys. The dict is deleted.

  3. See-also relations. The COMPONENT_RELATIONS dict moves into
     per-component frontmatter as repeated `see_also: <slug> | <Title> |
     <reason>` lines. Each component owns its outgoing edges. The dict is
     deleted.

Implementation:

  - bin/_load_catalog.py grows a richer loader (`load_components_rich`)
    that returns dict-shaped components with credit/see_also/snippet-with-
    expected-output. The legacy tuple-shape `load_components` is preserved
    for any caller that still wants it.
  - The frontmatter parser handles three YAML-subset shapes: `key: value`,
    `key: |` + indented block, and repeated keys → list.
  - The H2 section splitter now tracks fenced-block state so a `## ` line
    inside an expected-output fence (e.g. the markdown component's
    round-trip output) is not interpreted as a section heading.
  - bin/build-reference.py reads credit/see-also/expected-output via the
    rich loader; no more imports of CREDITS, COMPONENT_RELATIONS, or the
    expected-outputs JSON.
  - bin/run-snippets.py reads expected outputs via the rich loader and
    writes them back to the markdown file's fence on `--update`. The
    comparator tolerates a single trailing-newline difference (CommonMark
    fences don't carry the newline before the closing fence).
  - bin/_extract_catalog.py is updated to populate all three new fields
    when re-extracting from a legacy catalog.
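
The fence-aware H2 splitting mentioned above can be sketched as follows. This is an illustration under assumptions (backtick fences only, headings of the form `## Title`); the function name `split_sections` and the tuple shape are hypothetical and may not match bin/_load_catalog.py.

```python
import re

FENCE = re.compile(r"^(`{3,})")


def split_sections(body):
    """Split a markdown body on '## ' headings, ignoring fenced content.

    Returns a list of (heading, text); the first heading is None (the lede).
    """
    sections = []
    heading, buf = None, []
    open_fence = None  # the backtick run that opened the current fence, if any

    for line in body.splitlines():
        match = FENCE.match(line)
        if match:
            run = match.group(1)
            if open_fence is None:
                open_fence = run  # opening fence (may carry an info string)
            elif len(run) >= len(open_fence) and line.strip() == run:
                open_fence = None  # closing fence: backticks only, long enough
        elif open_fence is None and line.startswith("## "):
            sections.append((heading, "\n".join(buf)))
            heading, buf = line[3:].strip(), []
            continue
        buf.append(line)

    sections.append((heading, "\n".join(buf)))
    return sections
```

Tracking the length of the opening run matters for the four-backtick case: a three-backtick line inside a four-backtick fence does not close it.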

Verification:
  - python3 bin/build-reference.py: every page renders byte-identically
    against the previous baseline (diff -r is empty for docs/reference/).
  - bin/run-snippets.py --check: 87/87 snippets match expected output.
  - bin/_docs_components.py shrinks from 354 → 228 lines.

Net effect: each <slug>.md is now fully self-describing — open the file
on github.com and you see the lede, prose, every snippet, every captured
result, the credit, and the related-components list, all in one place.
adamziel added a commit to adamziel/experiments that referenced this pull request May 2, 2026
Drops a pre-built copy of the WordPress/php-toolkit runnable docs site
into php-toolkit/ and extends the unified Pages workflow to include it
in the deploy artifact.

The site is built from the WordPress/php-toolkit#254 branch
(docs-markdown-source) — every reference page is generated from the
markdown sources in bin/_docs_components/. All in-page links are
relative, so the site works fine under the /experiments/php-toolkit/
prefix without any path rewrites.

Live preview: https://adamziel.github.io/experiments/php-toolkit/

The existing experiments (zfs-wasm, real-zfs, root index) are untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
adamziel added 2 commits May 3, 2026 17:53
…erals

A round of fixes uncovered while previewing the new docs at
adamziel.github.io/experiments/php-toolkit/. Each is small and
localized; together they close the visible gaps between the markdown-
sourced reference pages and the hand-authored html.html / zip.html
they replaced.

* Static fallback for snippets. <php-snippet> relies on a cross-origin
  module to register the custom element. Until the module loads (or
  if it never does — adblocker, slow network, no-JS), the only
  children are <script> blocks, which are display:none. The renderer
  now emits an inline <pre class="snippet-fallback"><code> alongside
  every snippet; CSS hides it once :defined fires. Readers always see
  the code.

* Reverse <\/script> escape on the runtime side. The HTML wrapper
  escapes literal "</script" to "<\/script" so the outer
  <script type="application/x-php"> isn't closed prematurely, but the
  browser preserves the backslash in textContent — so PHP runs with
  "<\/script" and WP_HTML_Tag_Processor never finds the close tag.
  patchSnippet in page.js now reverses the escape on el._code once
  the element registers, so Run produces the same output the author
  captured.

* Stop unescaping body HTML in build-reference.py. Three html.unescape()
  calls were decoding entities the author intentionally wrote, so
  literal-tag examples like <code>&lt;p&gt;one&lt;p&gt;two</code>
  rendered as real paragraph tags. Removed all three; markdown
  sources already store the desired output form.

* split_pitfalls() preserves non-paragraph content. The pitfall
  extractor walked body HTML via re.findall + ''.join, dropping
  every top-level element that wasn't a <p>...</p>. That destroyed
  the "When to use which" tables on html and zip pages and the
  "Run the proxy locally" prose+pre on corsproxy. Now uses re.sub
  with a replacer so tables, lists, <pre>, and other markup pass
  through verbatim.

* See-also entries can target any URL, not just sibling component
  slugs. The frontmatter format gains a tiny extension: any first-
  field value containing "/" or "." is treated as a verbatim href;
  bare slugs still get the .html suffix. Added Tutorial entries to
  html, zip, dataliberation, and httpclient — the bridge from
  reference back to the learn-path chapters that the hand-authored
  pages had.

* Pitfalls ported from the hand-authored pages. The hand-authored
  html.html and zip.html had richer Pitfalls sections (4 each); the
  initial extraction only carried 1 and 2 respectively. Three HTML
  pitfalls (tag closers visited too, uppercase tag names, Tag_Processor
  vs full Processor) and two Zip pitfalls (in-place updates, encrypted
  archives) are now Footgun: paragraphs in the markdown source.

* sanitize-html.php simplified. The original input produced messy
  output (stray spaces from remove_attribute) and was busy enough
  to obscure the lesson. Replaced with the cleaner hand-authored
  variant.

* HTML / XML literals use nowdoc delimiters. Multi-line
  markup literals in PHP snippets now use <<<'HTML' or <<<'XML',
  so the markup keeps its visual structure instead of disappearing
  inside `'<...>' . '<...>'` chains. 17 string literals across 11
  snippets in blockparser, html, xml, and zip migrated.
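
The see-also rule from the list above (a first field containing "/" or "." is a verbatim href, a bare slug gets the .html suffix) could be expressed like this. The function name and the returned dict keys are assumptions for illustration only.

```python
def parse_see_also(entry):
    """Turn 'target | Title | reason' into an href/title/reason dict."""
    parts = [part.strip() for part in entry.split("|", 2)]
    parts += [""] * (3 - len(parts))  # tolerate missing title or reason
    target, title, reason = parts

    if "/" in target or "." in target:
        href = target  # looks like a path or URL: use it verbatim
    else:
        href = target + ".html"  # bare sibling slug
    return {"href": href, "title": title, "reason": reason}
```

For example, `parse_see_also("blockparser | BlockParser | Parse blocks")` yields an href of `blockparser.html`, while a target such as `../learn/01-rewriting-html.html` passes through unchanged.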

Asset version bumped (20260429-rewrite → 20260503-script-unescape).

Verification:
  - bin/run-snippets.py --check: 87/87 runnable snippets match their
    captured stdout. Workflow `.github/workflows/snippet-tests.yml`
    enforces this on every PR.
  - python3 bin/build-reference.py: every page renders cleanly.
  - Headless chromium screenshots of html.html and zip.html match
    the hand-authored variants.
…fall

Final visual-fidelity pass against the hand-authored html.html / zip.html
on trunk found two remaining gaps:

  1. Pitfall callouts opened with a lowercase plain-text sentence
     ("mutations are buffered. Nothing changes…") instead of a bold
     short summary ("**Mutations are buffered.** Nothing changes…").
     The hand-authored editorial treatment used a leading bold sentence
     to act as the callout title.

     Footgun: / Gotcha: paragraphs in the markdown source moved their
     <strong> from wrapping the prefix to wrapping the lead sentence:

       Before:  <p><strong>Footgun:</strong> mutations are buffered. …</p>
       After:   <p>Footgun: <strong>Mutations are buffered.</strong> …</p>

     The pitfall extractor strips "Footgun: " (and "Gotcha: ") and emits
     the rest of the paragraph verbatim, so the bold lead survives into
     the rendered <aside>. Touched: blockparser, bytestream, html, xml,
     zip — 11 callouts total.

  2. Zip was missing the path-traversal pitfall ("Never extract entry
     paths verbatim. Always run paths through ZipDecoder::sanitize_path()")
     that the hand-authored page carried as a security note. Ported.

Visual diff: rendered pitfall callouts now match the hand-authored
treatment. All 87 snippets still pass.
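
The pitfall extraction described in these commits (lift Footgun: / Gotcha: paragraphs out, strip the prefix, leave all other markup in place) can be sketched with re.sub and a replacer callback. The regular expression and the return shape are assumptions; this is not the code in bin/build-reference.py.

```python
import re

# Matches a whole <p>…</p> whose text starts with "Footgun:" or "Gotcha:".
PITFALL = re.compile(r"<p>\s*(?:Footgun|Gotcha):\s*(.*?)</p>\s*", re.DOTALL)


def split_pitfalls(body_html):
    """Return (remaining_html, pitfalls) without touching non-pitfall markup."""
    pitfalls = []

    def lift(match):
        # Keep the paragraph content minus the prefix, e.g. the bold lead.
        pitfalls.append(match.group(1).strip())
        return ""  # remove the paragraph from the section body

    remaining = PITFALL.sub(lift, body_html)
    return remaining, pitfalls
```

Because only the matched paragraphs are replaced, tables, lists, and <pre> blocks around them stay exactly as written, which is the property the earlier findall-and-join approach lost.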
@adamziel adamziel changed the title docs: source the catalog from per-component markdown files docs: per-component markdown is the source of truth for the runnable docs site May 3, 2026
@adamziel adamziel changed the title docs: per-component markdown is the source of truth for the runnable docs site Markdown documentation with runnable examples May 3, 2026
Markdown is the source of truth for the docs site (bin/_docs_components/
<slug>.md), so the rendered docs/reference/*.html and docs/assets/
php-toolkit.zip don't need to be tracked. They regenerate cleanly
from those sources, and CI already does the regeneration on every push
to trunk via .github/workflows/docs.yml.

Removed (19 files, 7,550 lines):
  - docs/reference/<slug>.html for all 18 components
  - docs/assets/php-toolkit.zip (the Playground bundle, ~9 MB)

Added paths to .gitignore so a forgotten local build doesn't sneak back
in. Updated bin/serve-docs.py to detect missing artifacts and print the
exact build commands to run, and bin/_docs_components/README.md to flag
that the HTML output is not content.

The hand-authored docs/learn/*.html, docs/index.html, and docs/reference/
index.html stay tracked — they're not derived from the markdown catalog.
@adamziel adamziel merged commit 394356b into trunk May 3, 2026
28 of 29 checks passed
@adamziel adamziel deleted the docs-markdown-source branch May 3, 2026 16:50
adamziel added a commit that referenced this pull request May 3, 2026
## Summary

The runnable docs site at https://wordpress.github.io/php-toolkit/
shipped in #244 and the markdown sources behind it landed in #254. This
PR makes it easy to find from every place a reader is likely to land.

## What changed

**`github.com/WordPress/php-toolkit`** — Repo description and homepage
URL are set via API to `https://wordpress.github.io/php-toolkit/`, so
the sidebar on the repo page surfaces the docs link.

**Root `README.md`** — A "📚 Live, runnable docs" callout sits at the
top, a per-component table replaces the bullet list (each row links to
both the README and the matching reference page on the live site), and a
"Building the docs site" subsection in the dev workflow points at
`bin/build-docs-bundle.sh` + `bin/serve-docs.py`.

**`components/<Name>/README.md`** — All 18 component READMEs gain an
idempotent banner immediately under the H1, deep-linking to that
component's reference page. The banner is wrapped in HTML-comment
markers so future URL/wording updates can be re-applied with one script:

```html
<!-- docs-site-banner -->
> 📚 **Runnable examples:** [https://wordpress.github.io/php-toolkit/reference/zip.html](...)
> Open the page to edit each snippet in your browser and run it in WordPress Playground.
<!-- /docs-site-banner -->
```
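
A minimal sketch of what "idempotent" could mean for such a banner script: if the markers are already present the block between them is replaced, otherwise the block is inserted under the first H1. The function name `apply_banner` and the exact whitespace handling are assumptions, not the script used in this change.

```python
import re

START = "<!-- docs-site-banner -->"
END = "<!-- /docs-site-banner -->"
BANNER_RE = re.compile(re.escape(START) + r".*?" + re.escape(END) + r"\n?", re.DOTALL)


def apply_banner(readme, banner_body):
    """Insert or refresh the marked banner; safe to run repeatedly."""
    block = f"{START}\n{banner_body}\n{END}\n"

    if BANNER_RE.search(readme):
        # Markers exist: swap the old block for the new one in place.
        return BANNER_RE.sub(lambda _: block, readme, count=1)

    lines = readme.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith("# "):
            lines.insert(i + 1, "\n" + block)  # directly under the H1
            return "".join(lines)

    return block + "\n" + readme  # no H1 found: put the banner first
```

Running it twice with the same banner text leaves the README unchanged after the first pass, and running it with new text updates the existing block rather than adding a second one.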

**`components/<Name>/composer.json`** — Each per-component package (17
of them; `ToolkitCodingStandards` has no composer.json) gains:

- `homepage` → the matching reference page
- `support: { issues, source, docs }` → links visible on Packagist
- shared `keywords: ["wordpress", "php-toolkit"]`

The result: someone who lands on
https://packagist.org/packages/wp-php-toolkit/zip sees a "Homepage" link
to the runnable docs and a "Documentation" link in the support sidebar.

**Root `composer.json`** — The `wp-php-toolkit/php-toolkit` meta-package
gets the same homepage + support block, plus a description that names
the docs URL ("Browse runnable examples at
https://wordpress.github.io/php-toolkit/").

**`examples/create-wp-site/README.md`** — Same docs-site banner under
the H1 so readers in the examples folder don't miss the docs site.

**`AGENTS.md`** — A `Docs site:` pointer added next to the
upstream/branch lines, with a one-liner that
`bin/_docs_components/<slug>.md` is where the runnable examples live.

## Test plan

- [ ] CI passes (no code changes; 87/87 snippets still match).
- [ ] After merge, https://github.com/WordPress/php-toolkit shows the
homepage URL on the right sidebar.
- [ ] After Packagist re-sync (auto on next release, or manual via the
package page), each `wp-php-toolkit/*` page surfaces the docs URL under
"Homepage" and "Documentation".
- [ ] Component READMEs render the banner cleanly on github.com
(HTML-comment markers stay invisible).

🤖 Generated with [Claude Code](https://claude.com/claude-code)