Markdown documentation with runnable examples #254
Before this change, every snippet, lede, and prose paragraph for the docs
site lived as Python string literals inside a 2,871-line bin/_docs_components.py.
Editing a snippet meant editing Python; the file was painful to skim and
github.com had no way to render it usefully.
This PR makes per-component markdown the source of truth:
bin/_docs_components/
  <slug>.md     one file per component (lede + sections + snippets)
  _order.txt    the order components appear on the site
  README.md     format documentation for editors
Each .md file is the same shape readers see in the docs/_legacy/ pages:
YAML-style frontmatter (slug, title, install), a lede paragraph, ## section
headings, and at most one fenced ```php snippet per section. Snippet
metadata (filename, runnable) sits in an HTML comment immediately above the
fence so it travels with the snippet rather than living in a separate
catalog. Body content stays as raw HTML — markdown allows that, and it
keeps this PR a pure refactor: the rendered output is byte-identical.
bin/_load_catalog.py parses these files into the same COMPONENTS data
structure that build-docs.py, build-reference.py, and run-snippets.py
already consume, so no caller changed. bin/_docs_components.py shrank from
2,871 lines to 354 — it now just calls load_components() and keeps the
small structural metadata that does not belong in any single component
(STARTER_PATHS, COMPONENT_RELATIONS, CREDITS, COMPONENT_GUIDES).
bin/_extract_catalog.py is the one-shot tool that produced the markdown
files from the legacy Python catalog. It is kept in the tree as a
regression aid: re-extract from any branch, diff the .md files, see what
changed. The fence length grows (3 → 4 backticks) when a snippet itself
contains a triple-backtick run (the markdown component has one such
example with a heredoc'd markdown sample).
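A minimal sketch of the fence-length rule described above, assuming the extractor picks a fence one backtick longer than the longest backtick run in the snippet. The names fence_for and wrap_snippet are hypothetical and are not taken from bin/_extract_catalog.py.

```python
import re


def fence_for(snippet: str, minimum: int = 3) -> str:
    """Return a backtick fence longer than any backtick run inside the snippet."""
    longest = max((len(run) for run in re.findall(r"`+", snippet)), default=0)
    return "`" * max(minimum, longest + 1)


def wrap_snippet(snippet: str, info: str = "php") -> str:
    """Wrap snippet text in a fenced block whose fence cannot be closed early."""
    fence = fence_for(snippet)
    body = snippet.rstrip("\n")
    return f"{fence}{info}\n{body}\n{fence}"
```

A snippet without backticks keeps the usual three-backtick fence; one that embeds a three-backtick run gets four, which matches the 3 → 4 growth noted for the markdown component.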
Verification:
- Built docs/learn/, docs/reference/, and docs/index.html before and
after this change. Diff is empty: rendering is byte-identical.
- bin/run-snippets.py --check: 87/87 runnable snippets pass against the
markdown-loaded catalog with the existing _expected_outputs.json.
- The Pages and snippet-tests workflows now also trigger on changes to
bin/_docs_components/** and bin/_load_catalog.py.
Followups (not in this PR):
- The legacy docs/_legacy/<slug>/index.html files are stale on trunk
(running build-docs.py against trunk produces small diffs against the
committed copies). A separate sweep PR can refresh them.
- The per-component README.md files in components/<Name>/
remain hand-written; only the docs-site source moves to markdown.
Merging the two sources of truth (so each component has one canonical
README.md that feeds both Packagist and the docs site) is an obvious
next step but requires a richer markdown→site renderer.
Two leftover sources of HTML truth went away in this change:
1. The hand-authored docs/reference/html.html and docs/reference/zip.html
pages were skipped by build-reference.py's SKIP set and edited as raw
HTML. They are now generated from bin/_docs_components/{html,zip}.md
like every other component. The "When to use which" comparison tables
that lived only in the hand-authored pages are ported into the
markdown source so nothing is lost. Visual diff (chromium screenshots
at 1280×4000): same chrome, same sidebar, same code-block styling,
same section flow. Generated pages now have a superset of the
hand-authored content, since the markdown catalog already covered
more refinements.
2. The docs/_legacy/ tree (18 component pages plus index.html, written
by bin/build-docs.py) was a fully orphaned directory: nothing in the
active site linked to it, the active landing page already pointed
readers at docs/reference/, and rebuilding from trunk's catalog
produced small unexplained diffs against the committed copies. The
directory and its generator (bin/build-docs.py) are deleted. The
build-docs-bundle.sh helper and .github/workflows/docs.yml workflow
now call bin/build-reference.py instead.
Net result: bin/_docs_components/<slug>.md is the single source of truth
for every component reference page. Editing a snippet means editing the
markdown file. CI runs every runnable snippet against PHP 8.3 on every PR
(snippet-tests.yml) and the deploy workflow regenerates the HTML on every
push to trunk (docs.yml).
Verification:
- python3 bin/build-reference.py — succeeds for all 18 components
(previously failed for html/zip via the SKIP exclusion).
- bin/run-snippets.py --check — 87/87 runnable snippets pass.
- Headless chromium renders of html.html and zip.html (hand-authored vs
generated) confirmed visual parity before deletion.
Three things the rendered reference page used to source from sidecar Python/
JSON now live alongside the snippet they describe in
bin/_docs_components/<slug>.md:
1. Expected outputs. Each snippet's captured stdout previously lived in
bin/_expected_outputs.json keyed by `<slug>::<filename>`. It is now an
`<!-- expected-output -->` fenced block sitting directly under the
snippet's php fence. The runner reads it, the renderer reads it, and
`bin/run-snippets.py --update` writes back to the same fence — so a
snippet's code and its captured result travel together. The JSON file
is deleted.
2. Credit callouts. The CREDITS dict from bin/_docs_components.py moves
into per-component frontmatter as `credit_title` + multi-line
`credit_body` (YAML pipe form). Components without a credit (most of
them) just omit the keys. The dict is deleted.
3. See-also relations. The COMPONENT_RELATIONS dict moves into
per-component frontmatter as repeated `see_also: <slug> | <Title> |
<reason>` lines. Each component owns its outgoing edges. The dict is
deleted.
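As an illustration of the frontmatter shapes described above (plain `key: value`, `key: |` with an indented block, and repeated keys such as `see_also`), here is a hedged sketch of a parser for that YAML subset. It assumes the text between the frontmatter delimiters has already been cut out; parse_frontmatter and parse_see_also are hypothetical names, not the functions in bin/_load_catalog.py.

```python
import textwrap


def parse_frontmatter(text: str) -> dict:
    """Parse a small YAML subset: scalars, pipe blocks, repeated keys -> list."""
    data: dict = {}
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if not line.strip() or line.startswith((" ", "\t")):
            continue  # skip blank lines and stray indented lines
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value == "|":
            # Collect the indented block that follows a `key: |` line.
            block = []
            while i < len(lines) and (
                lines[i].startswith((" ", "\t")) or not lines[i].strip()
            ):
                block.append(lines[i])
                i += 1
            value = textwrap.dedent("\n".join(block)).strip()
        if key in data:
            # A repeated key turns the stored value into a list.
            if isinstance(data[key], list):
                data[key].append(value)
            else:
                data[key] = [data[key], value]
        else:
            data[key] = value
    return data


def parse_see_also(entry: str) -> tuple:
    """Split a `<slug> | <Title> | <reason>` line into its three fields."""
    return tuple(part.strip() for part in entry.split("|", 2))
```

Note that in this sketch a key that appears only once stays a plain string, so a caller that always wants a list for `see_also` would need to wrap single values itself.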
Implementation:
- bin/_load_catalog.py grows a richer loader (`load_components_rich`)
that returns dict-shaped components with credit/see_also/snippet-with-
expected-output. The legacy tuple-shape `load_components` is preserved
for any caller that still wants it.
- The frontmatter parser handles three YAML-subset shapes: `key: value`,
`key: |` + indented block, and repeated keys → list.
- The H2 section splitter now tracks fenced-block state so a `## ` line
inside an expected-output fence (e.g. the markdown component's
round-trip output) is not interpreted as a section heading.
- bin/build-reference.py reads credit/see-also/expected-output via the
rich loader; no more imports of CREDITS, COMPONENT_RELATIONS, or the
expected-outputs JSON.
- bin/run-snippets.py reads expected outputs via the rich loader and
writes them back to the markdown file's fence on `--update`. The
comparator tolerates a single trailing-newline difference (CommonMark
fences don't carry the newline before the closing fence).
- bin/_extract_catalog.py is updated to populate all three new fields
when re-extracting from a legacy catalog.
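The fence-aware H2 splitting mentioned in the list above can be sketched roughly as follows. This is an assumption-laden illustration, not the loader's code: split_sections is a hypothetical name, only backtick fences are tracked, and text before the first heading is returned with a heading of None.

```python
import re
from typing import List, Optional, Tuple

FENCE = re.compile(r"^(`{3,})")


def split_sections(body: str) -> List[Tuple[Optional[str], str]]:
    """Split on `## ` headings, ignoring heading-like lines inside fences."""
    sections: List[Tuple[Optional[str], str]] = []
    heading: Optional[str] = None
    buf: List[str] = []
    open_fence: Optional[int] = None  # length of the currently open fence
    for line in body.splitlines():
        match = FENCE.match(line)
        if match:
            ticks = len(match.group(1))
            if open_fence is None:
                open_fence = ticks
            elif ticks >= open_fence and line.strip("`").strip() == "":
                open_fence = None  # a bare fence at least as long closes it
        elif open_fence is None and line.startswith("## "):
            sections.append((heading, "\n".join(buf).strip()))
            heading, buf = line[3:].strip(), []
            continue
        buf.append(line)
    sections.append((heading, "\n".join(buf).strip()))
    return sections
```

Tracking the opening fence length matters for the four-backtick case: a three-backtick line inside a four-backtick fence does not close it, so a `## ` line further down is still treated as content.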
Verification:
- python3 bin/build-reference.py: every page renders byte-identically
against the previous baseline (diff -r is empty for docs/reference/).
- bin/run-snippets.py --check: 87/87 snippets match expected output.
- bin/_docs_components.py shrinks from 354 → 228 lines.
Net effect: each <slug>.md is now fully self-describing — open the file
on github.com and you see the lede, prose, every snippet, every captured
result, the credit, and the related-components list, all in one place.
adamziel added a commit to adamziel/experiments that referenced this pull request May 2, 2026
Drops a pre-built copy of the WordPress/php-toolkit runnable docs site into
php-toolkit/ and extends the unified Pages workflow to include it in the
deploy artifact.
The site is built from the WordPress/php-toolkit#254 branch
(docs-markdown-source); every reference page is generated from the markdown
sources in bin/_docs_components/. All in-page links are relative, so the
site works fine under the /experiments/php-toolkit/ prefix without any path
rewrites.
Live preview: https://adamziel.github.io/experiments/php-toolkit/
The existing experiments (zfs-wasm, real-zfs, root index) are untouched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…erals
A round of fixes uncovered while previewing the new docs at
adamziel.github.io/experiments/php-toolkit/. Each is small and
localized; together they close the visible gaps between the markdown-
sourced reference pages and the hand-authored html.html / zip.html
they replaced.
* Static fallback for snippets. <php-snippet> relies on a cross-origin
module to register the custom element. Until the module loads (or
if it never does — adblocker, slow network, no-JS), the only
children are <script> blocks, which are display:none. The renderer
now emits an inline <pre class="snippet-fallback"><code> alongside
every snippet; CSS hides it once :defined fires. Readers always see
the code.
* Reverse <\/script> escape on the runtime side. The HTML wrapper
escapes literal "</script" to "<\/script" so the outer
<script type="application/x-php"> isn't closed prematurely, but the
browser preserves the backslash in textContent — so PHP runs with
"<\/script" and WP_HTML_Tag_Processor never finds the close tag.
patchSnippet in page.js now reverses the escape on el._code once
the element registers, so Run produces the same output the author
captured.
* Stop unescaping body HTML in build-reference.py. Three html.unescape()
calls were decoding entities the author intentionally wrote, so
literal-tag examples like <code>&lt;p&gt;one&lt;p&gt;two</code>
rendered as real paragraph tags. Removed all three; markdown
sources already store the desired output form.
* split_pitfalls() preserves non-paragraph content. The pitfall
extractor walked body HTML via re.findall + ''.join, dropping
every top-level element that wasn't a <p>...</p>. That destroyed
the "When to use which" tables on html and zip pages and the
"Run the proxy locally" prose+pre on corsproxy. Now uses re.sub
with a replacer so tables, lists, <pre>, and other markup pass
through verbatim.
* See-also entries can target any URL, not just sibling component
slugs. The frontmatter format gains a tiny extension: any first-
field value containing "/" or "." is treated as a verbatim href;
bare slugs still get the .html suffix. Added Tutorial entries to
html, zip, dataliberation, and httpclient — the bridge from
reference back to the learn-path chapters that the hand-authored
pages had.
* Pitfalls ported from the hand-authored pages. The hand-authored
html.html and zip.html had richer Pitfalls sections (4 each); the
initial extraction only carried 1 and 2 respectively. Three HTML
pitfalls (tag closers visited too, uppercase tag names, Tag_Processor
vs full Processor) and two Zip pitfalls (in-place updates, encrypted
archives) are now Footgun: paragraphs in the markdown source.
* sanitize-html.php simplified. The original input produced messy
output (stray spaces from remove_attribute) and was busy enough
to obscure the lesson. Replaced with the cleaner hand-authored
variant.
* HTML / XML literals use nowdoc delimiters. Multi-line
markup literals in PHP snippets now use <<<'HTML' or <<<'XML',
so the markup keeps its visual structure instead of disappearing
inside `'<...>' . '<...>'` chains. 17 string literals across 11
snippets in blockparser, html, xml, and zip migrated.
Asset version bumped (20260429-rewrite → 20260503-script-unescape).
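Two of the rules in the list above are small enough to sketch. The following is a Python model only: the real reversal of the script escape happens in page.js, and the function names here are hypothetical. It assumes the escape targets the literal lowercase "</script" exactly as the commit message states.

```python
def escape_script_close(code: str) -> str:
    """Build side: keep snippet code from closing the wrapping <script> early."""
    return code.replace("</script", "<\\/script")


def unescape_script_close(code: str) -> str:
    """Runtime side: restore the literal the author wrote before running PHP."""
    return code.replace("<\\/script", "</script")


def see_also_href(target: str) -> str:
    """Treat values containing '/' or '.' as verbatim hrefs; bare slugs get .html."""
    if "/" in target or "." in target:
        return target
    return f"{target}.html"
```

One caveat of this simple round trip: a snippet that already contains the escaped form "<\/script" on purpose would come back as "</script", so the reversal is only safe if authors never write that sequence themselves.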
Verification:
- bin/run-snippets.py --check: 87/87 runnable snippets match their
captured stdout. Workflow `.github/workflows/snippet-tests.yml`
enforces this on every PR.
- python3 bin/build-reference.py: every page renders cleanly.
- Headless chromium screenshots of html.html and zip.html match
the hand-authored variants.
…fall
Final visual-fidelity pass against the hand-authored html.html / zip.html
on trunk found two remaining gaps:
1. Pitfall callouts opened with a lowercase plain-text sentence
("mutations are buffered. Nothing changes…") instead of a bold
short summary ("**Mutations are buffered.** Nothing changes…").
The hand-authored editorial treatment used a leading bold sentence
to act as the callout title.
Footgun: / Gotcha: paragraphs in the markdown source moved their
<strong> from wrapping the prefix to wrapping the lead sentence:
Before: <p><strong>Footgun:</strong> mutations are buffered. …</p>
After: <p>Footgun: <strong>Mutations are buffered.</strong> …</p>
The pitfall extractor strips "Footgun: " (and "Gotcha: ") and emits
the rest of the paragraph verbatim, so the bold lead survives into
the rendered <aside>. Touched: blockparser, bytestream, html, xml,
zip — 11 callouts total.
2. Zip was missing the path-traversal pitfall ("Never extract entry
paths verbatim. Always run paths through ZipDecoder::sanitize_path()")
that the hand-authored page carried as a security note. Ported.
Visual diff: rendered pitfall callouts now match the hand-authored
treatment. All 87 snippets still pass.
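For reference, the pitfall extraction described in this and the previous commit can be sketched with re.sub and a replacer, so that only Footgun: / Gotcha: paragraphs are lifted and everything else passes through. This is an illustrative sketch under the assumption that pitfall paragraphs are plain <p> elements without attributes; it is not the split_pitfalls() in bin/build-reference.py.

```python
import re
from typing import List, Tuple

# Matches only paragraphs that open with the "Footgun:" or "Gotcha:" prefix.
PITFALL_P = re.compile(r"<p>\s*(?:Footgun|Gotcha):\s*(.*?)</p>", re.DOTALL)


def split_pitfalls(body_html: str) -> Tuple[str, List[str]]:
    """Return (body without pitfall paragraphs, list of pitfall bodies)."""
    pitfalls: List[str] = []

    def lift(match: "re.Match[str]") -> str:
        # Keep the paragraph content verbatim, minus the prefix.
        pitfalls.append(match.group(1).strip())
        return ""  # remove the paragraph from the section body

    remaining = PITFALL_P.sub(lift, body_html)
    return remaining.strip(), pitfalls
```

Because the bold lead sentence sits after the prefix, it survives extraction untouched, while tables, lists, and <pre> blocks stay in the remaining body.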
Markdown is the source of truth for the docs site
(bin/_docs_components/<slug>.md), so the rendered docs/reference/*.html and
docs/assets/php-toolkit.zip don't need to be tracked. They regenerate
cleanly from those sources, and CI already does the regeneration on every
push to trunk via .github/workflows/docs.yml.
Removed (19 files, 7,550 lines):
- docs/reference/<slug>.html for all 18 components
- docs/assets/php-toolkit.zip (the Playground bundle, ~9 MB)
Added paths to .gitignore so a forgotten local build doesn't sneak back in.
Updated bin/serve-docs.py to detect missing artifacts and print the exact
build commands to run, and bin/_docs_components/README.md to flag that the
HTML output is not content.
The hand-authored docs/learn/*.html, docs/index.html, and
docs/reference/index.html stay tracked; they're not derived from the
markdown catalog.
adamziel added a commit that referenced this pull request May 3, 2026
## Summary

The runnable docs site at https://wordpress.github.io/php-toolkit/ shipped in #244 and the markdown sources behind it landed in #254. This PR makes it easy to find from every place a reader is likely to land.

## What changed

**`github.com/WordPress/php-toolkit`**: Repo description and homepage URL are set via API to `https://wordpress.github.io/php-toolkit/`, so the sidebar on the repo page surfaces the docs link.

**Root `README.md`**: A "📚 Live, runnable docs" callout sits at the top, a per-component table replaces the bullet list (each row links to both the README and the matching reference page on the live site), and a "Building the docs site" subsection in the dev workflow points at `bin/build-docs-bundle.sh` + `bin/serve-docs.py`.

**`components/<Name>/README.md`**: All 18 component READMEs gain an idempotent banner immediately under the H1, deep-linking to that component's reference page. The banner is wrapped in HTML-comment markers so future URL/wording updates can be re-applied with one script:

```html
<!-- docs-site-banner -->
> 📚 **Runnable examples:** [https://wordpress.github.io/php-toolkit/reference/zip.html](...)
> Open the page to edit each snippet in your browser and run it in WordPress Playground.
<!-- /docs-site-banner -->
```

**`components/<Name>/composer.json`**: Each per-component package (17 of them; `ToolkitCodingStandards` has no composer.json) gains:

- `homepage` → the matching reference page
- `support: { issues, source, docs }` → links visible on Packagist
- shared `keywords: ["wordpress", "php-toolkit"]`

The result: someone who lands on https://packagist.org/packages/wp-php-toolkit/zip sees a "Homepage" link to the runnable docs and a "Documentation" link in the support sidebar.

**Root `composer.json`**: The `wp-php-toolkit/php-toolkit` meta-package gets the same homepage + support block, plus a description that names the docs URL ("Browse runnable examples at https://wordpress.github.io/php-toolkit/").

**`examples/create-wp-site/README.md`**: Same docs-site banner under the H1 so readers in the examples folder don't miss the docs site.

**`AGENTS.md`**: A `Docs site:` pointer added next to the upstream/branch lines, with a one-liner that `bin/_docs_components/<slug>.md` is where the runnable examples live.

## Test plan

- [ ] CI passes (no code changes; 87/87 snippets still match).
- [ ] After merge, https://github.com/WordPress/php-toolkit shows the homepage URL on the right sidebar.
- [ ] After Packagist re-sync (auto on next release, or manual via the package page), each `wp-php-toolkit/*` page surfaces the docs URL under "Homepage" and "Documentation".
- [ ] Component READMEs render the banner cleanly on github.com (HTML-comment markers stay invisible).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
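The idempotent banner described in the commit message above could be applied along these lines. This is a sketch under assumptions: apply_banner is a hypothetical helper, not the script the commit refers to, and it assumes the README's H1 is a line starting with "# ".

```python
import re

BANNER_RE = re.compile(
    r"<!-- docs-site-banner -->.*?<!-- /docs-site-banner -->\n?", re.DOTALL
)


def apply_banner(readme: str, banner_body: str) -> str:
    """Insert the banner under the H1, or replace an existing banner in place."""
    block = (
        "<!-- docs-site-banner -->\n"
        f"{banner_body.strip()}\n"
        "<!-- /docs-site-banner -->\n"
    )
    if BANNER_RE.search(readme):
        # A callable replacement avoids backslash handling in the banner text.
        return BANNER_RE.sub(lambda _m: block, readme, count=1)
    lines = readme.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith("# "):
            lines.insert(i + 1, "\n" + block)
            return "".join(lines)
    return block + "\n" + readme  # no H1 found: put the banner first
```

Running it twice with the same banner text leaves the file unchanged, and running it with new wording swaps only the text between the markers.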
Summary
Adds runnable, browseable documentation for every PHP Toolkit component. Each component gets a reference page on the docs site that walks through the API as a sequence of worked examples — short PHP snippets the reader can edit and run in the browser via WordPress Playground, with the captured output already on the page so the result is visible before Playground finishes booting.
Live preview: https://adamziel.github.io/experiments/php-toolkit/
What ships
- docs/learn/ (Quickstart → 4 chapters → Recap) that introduces the cursor model, streaming archives, content importing, and network primitives in tutorial form.
How a component is authored
Per-component markdown under bin/_docs_components/<slug>.md is the single source of truth for that component's page. One file owns the lede, the prose, every snippet, every captured stdout, the credit callout, and the see-also list. Editing that file changes:
Format:
The format and conventions are documented in bin/_docs_components/README.md. Component order is controlled by _order.txt.
Build pipeline
- bin/_load_catalog.py parses each .md into a structured component dict: slug, title, install, lede, credit, see-also, and a list of sections (each with heading, body, optional snippet, optional expected output).
- bin/build-reference.py renders docs/reference/<slug>.html. Pitfalls (paragraphs starting with Footgun: / Gotcha:) are lifted out of any section into a unified "Pitfalls" callout block; other body markup (tables, <pre> blocks, lists) is preserved verbatim.
- bin/run-snippets.py executes runnable snippets under PHP 8.3 against the locally-installed toolkit, normalizes noise (tempfile names, hashes, timestamps), and compares stdout to the captured <!-- expected-output --> block alongside it. --update writes new captures back into the markdown file in place.
- bin/_docs_components.py is a thin module that exposes COMPONENTS = load_components() plus the small global metadata that doesn't belong in any single component (STARTER_PATHS, COMPONENT_GUIDES for the landing page).
- bin/build-docs-bundle.sh zips the toolkit source for Playground and rebuilds the reference pages.
Local preview: python3 bin/serve-docs.py → http://localhost:8787.
CI: every snippet is verified on every PR
.github/workflows/snippet-tests.yml runs bin/run-snippets.py --check on every pull request. The runner:
- … bin/_docs_components/<slug>.md.
- … <!-- expected-output --> block alongside it.
Snippets whose stdout is unstable (real network traffic, timestamps; currently 8 HttpClient examples) live in run-snippets.py's NO_EXPECTED allowlist. They're still required to exit 0, but their output isn't pinned.
Current state: 87/87 snippets pass.
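A rough sketch of the comparison step the runner performs, assuming regex-based noise normalization and the single trailing-newline tolerance mentioned in the commit history. The patterns and the names normalize and outputs_match are illustrative assumptions; the actual rules in bin/run-snippets.py are not shown in this PR text.

```python
import re

# Hypothetical noise patterns: the runner's real normalizers may differ.
NOISE = [
    (re.compile(r"/tmp/[A-Za-z0-9_.-]+"), "<tmpfile>"),
    (re.compile(r"\b[0-9a-f]{32,64}\b"), "<hash>"),
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"), "<timestamp>"),
]


def normalize(stdout: str) -> str:
    """Replace run-specific values with stable placeholders."""
    for pattern, placeholder in NOISE:
        stdout = pattern.sub(placeholder, stdout)
    return stdout


def outputs_match(actual: str, expected: str) -> bool:
    """Compare normalized output, tolerating one trailing-newline difference."""
    a, e = normalize(actual), normalize(expected)
    return a == e or a == e + "\n" or a + "\n" == e
```

The newline tolerance exists because a fenced block in markdown does not carry the newline before its closing fence, so a capture that ended in "\n" would otherwise never match after a round trip through the .md file.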
Rendering details
- <pre class="snippet-fallback"> sits inside every <php-snippet> so readers see the code even when the cross-origin Playground module fails to register (CSP, adblock, slow network, no-JS clients). CSS hides the fallback once <php-snippet>:defined fires.
- … &lt;p&gt; stay as entities and the browser shows the literal example inside <code> instead of treating it as real HTML.
- … </script> in snippet code as <\/script> so the wrapping <script type="application/x-php"> isn't closed prematurely; page.js reverses that escape on the runtime side after the custom element registers, so PHP runs with the original </script> literal in string contents.
- … <strong>Mutations are buffered.</strong> …).
- … <<<'HTML' / <<<'XML' heredoc nowdoc delimiters so the markup keeps its visual structure.
Test plan
- Verify docs snippets workflow passes (87/87).
- … bin/_docs_components/<slug>.md files cleanly when browsed on github.com (frontmatter, prose, fenced blocks, expected-output blocks).