Skip to content

docs: fix inaccurate API claims surfaced by an audit pass#257

Merged
adamziel merged 1 commit intotrunkfrom
docs-accuracy-audit
May 3, 2026
Merged

docs: fix inaccurate API claims surfaced by an audit pass#257
adamziel merged 1 commit intotrunkfrom
docs-accuracy-audit

Conversation

@adamziel
Copy link
Copy Markdown
Collaborator

@adamziel adamziel commented May 3, 2026

Summary

I audited every prose claim in bin/_docs_components/<slug>.md against the actual code under components/<Name>/ by running each relevant API in PHP. Most claims hold, but five misstatements in html.md and one in markdown.md would mislead a reader who took them at face value.

What was wrong, what's now correct

html.md

Claim (before) Verified by Fix
"Tag closers are visited too. next_tag() stops on both opening and closing tags." Walked <p>hi</p> with next_tag() — only P was visited, never /P. WP_HTML_Tag_Processor::matches() explicitly skips closing tags unless stop_on_tag_closers is set (class-wp-html-tag-processor.php). Rewrote the pitfall: next_tag() only stops on opening tags; closers and text are skipped, so ! $tags->is_tag_closer() inside a next_tag() loop is harmless but never fires. Pointed readers at next_token() if they need to visit closers.
"If you call get_breadcrumbs() on WP_HTML_Tag_Processor, you'll get a thin shape that doesn't reflect HTML5 tree construction" (new WP_HTML_Tag_Processor('<x>'))->get_breadcrumbs() raises Error: Call to undefined method. The method doesn't exist on Tag_Processor. Rewrote: "doesn't expose get_breadcrumbs() at all — calling that raises a Call to undefined method error. Breadcrumbs and HTML5 tree construction live only on WP_HTML_Processor."
sanitize-html.php snippet contained if ('SCRIPT' === $tags->get_tag() && ! $tags->is_tag_closer()) Same as row 1 — the is_tag_closer() guard is dead code under next_tag(). Output is identical with or without it. Removed the guard, added a one-line comment explaining why no guard is needed.
csp-nonce.php snippet contained the same dead guard Same. Removed.
decode-entities.php example: attribute_starts_with('java&#x09;script:alert(1)', 'javascript:')bool(false), with comment "safe URL prefix check that respects encoded colons (a classic XSS vector)" &#x09; is a TAB, not an encoded colon. The method does not handle whitespace-in-the-middle attacks (browsers do strip ASCII whitespace from URL attributes, but this method does not). The misleading example told readers the function catches java<TAB>script: — it doesn't. Replaced with &#x6A;avascript:alert(1) (entity-encoded j), which the method does correctly recognize as starting with javascript:. Comment updated to describe the actual protection: "decodes character references while comparing."
"When to use which" table row for attribute_starts_with() — same misleading example Same. Updated to: "decodes HTML character references while comparing — so j&#x61;vascript: (where &#x61; is the letter a) is correctly recognized as starting with javascript:."

markdown.md

Claim (before) Verified by Fix
"The result is a BlocksWithMetadata object" (no namespace given) get_class((new MarkdownConsumer('# h'))->consume())WordPress\DataLiberation\DataFormatConsumer\BlocksWithMetadata. A reader who tries use WordPress\Markdown\BlocksWithMetadata gets a missing-class error. Added the canonical namespace inline: "BlocksWithMetadata (defined in WordPress\DataLiberation\DataFormatConsumer — the shared shape every DataFormatConsumer in the toolkit emits)".

What I checked and found accurate

For every component, I verified that:

  • Every use WordPress\… import in a snippet resolves to a real class or interface.
  • Every ->method() call hits a method that actually exists on the class returned by the preceding factory/constructor.
  • Constants referenced in prose (COMPRESSION_NONE = 0, COMPRESSION_DEFLATE = 8, etc.) match getConstant() output.
  • Behavioural claims (get_attribute() returns null for missing attrs, WP_Block_Parser::parse() returns blocks with blockName/attrs/innerBlocks/innerHTML/innerContent keys, loose HTML appears as blockName === null, etc.) match what running the code actually produces.

Components other than html and markdown came back clean: zip, blockparser, xml, filesystem, bytestream, httpclient, dataliberation, encoding, git, merge, polyfill, blueprints, cli, corsproxy, httpserver, coding-standards.

Test plan

  • Verify docs snippets workflow passes: 87/87 snippets still match (the two snippets I changed had their expected-output blocks re-captured automatically).
  • Visual spot-check on https://adamziel.github.io/experiments/php-toolkit/reference/html.html — the rewritten pitfalls and updated decode-entities.php snippet display correctly.
  • Reader sanity check: the is_tag_closer() references in html.md are now consistent — every guard either sits inside a next_token() loop (where it correctly fires) or is removed.

🤖 Generated with Claude Code

I cross-checked every prose claim in bin/_docs_components/<slug>.md
against the actual source under components/<Name>/ by running each
relevant API in PHP. Most claims hold, but five real misstatements in
html.md and one in markdown.md would mislead a reader who took them at
face value. Each is fixed below.

bin/_docs_components/html.md
  * The pitfall "Tag closers are visited too. next_tag() stops on both
    opening and closing tags" was wrong. WP_HTML_Tag_Processor::matches()
    explicitly skips closing tags unless `stop_on_tag_closers` is set
    (class-wp-html-tag-processor.php ~line 4279). So a guard like
    `! $tags->is_tag_closer()` inside a next_tag() loop never fires.
    Rewrote the pitfall to describe the actual behavior and point at
    next_token() for code that wants to see closers.

  * The pitfall "If you call get_breadcrumbs() on it, you'll get a thin
    shape that doesn't reflect HTML5 tree construction" was wrong.
    WP_HTML_Tag_Processor doesn't expose get_breadcrumbs() at all —
    calling it raises `Call to undefined method`. Rewrote the pitfall
    to say so explicitly.

  * sanitize-html.php had `! $tags->is_tag_closer()` inside a next_tag()
    loop (dead code per the corrected pitfall). Removed the guard and
    added a one-line comment explaining why none is needed. Snippet
    output unchanged.

  * csp-nonce.php had the same dead guard. Removed.

  * decode-entities.php demonstrated `attribute_starts_with()` with
    `'java&#x09;script:alert(1)'` (an encoded TAB, not an encoded
    letter) and var_dumped `bool(false)`. The comment said "respects
    encoded colons" but `&#x09;` is a tab, and this method does NOT
    handle ASCII-whitespace bypasses anyway — browsers strip ASCII
    whitespace from URL attributes, but `attribute_starts_with()`
    doesn't. Reader's takeaway from the misleading example was
    "this function catches javascript:-with-tab", which is false.
    Replaced with `'&#x6A;avascript:alert(1)'` (entity-encoded `j`),
    which the method DOES correctly recognize as starting with
    `javascript:` — the actual claim the doc is making.

  * "When to use which" table row for attribute_starts_with() updated
    to match: the function decodes character references, so
    `j&#x61;vascript:` (where `&#x61;` = `a`) is correctly recognized.

bin/_docs_components/markdown.md
  * Claimed "the result is a BlocksWithMetadata object" without noting
    that the class lives in WordPress\DataLiberation\DataFormatConsumer,
    not WordPress\Markdown. A reader who tried `use
    WordPress\Markdown\BlocksWithMetadata` got a missing-class error.
    Added the canonical namespace inline.

Other components audited (zip, blockparser, xml, filesystem, bytestream,
httpclient, dataliberation, encoding, git, merge, polyfill, blueprints,
cli, corsproxy, httpserver, coding-standards) — class names, method
signatures, return shapes, and constants all match the source.

Verified by re-running every snippet with bin/run-snippets.py --check:
87/87 still pass.
@adamziel adamziel merged commit 2a262ad into trunk May 3, 2026
28 of 29 checks passed
@adamziel adamziel deleted the docs-accuracy-audit branch May 3, 2026 19:38
adamziel added a commit that referenced this pull request May 3, 2026
## Summary

Second-round accuracy pass on `bin/_docs_components/<slug>.md`. After
the first pass (#257) fixed six load-bearing inaccuracies in `html.md`
and `markdown.md`, this round drilled deeper by exercising more of the
underlying APIs to make sure the prose still matches reality. One small
markup bug in `xml.md` falls out; everything else holds.

## What got fixed

`bin/_docs_components/xml.md` — two pitfall callouts had unbalanced
`<strong>` tags, an artifact of an earlier "bold the lead sentence"
reformat that didn't notice the existing `<strong>#1:</strong>` prefix.
The rendered HTML had one open and two closes per pitfall, so the bold
span effectively ran from the `#N:` colon through the rest of the
paragraph (browsers forgive the orphan close, but it doesn't render the
way every other pitfall in the catalog does).

```diff
- <p>Footgun: <strong>#1:</strong> namespace-aware methods use the namespace
-   name declared in <code>xmlns</code>, not the prefix written in the tag.</strong>
+ <p>Footgun: <strong>Namespace-aware methods use the namespace URI, not the
+   prefix written in the tag.</strong>
```

Dropped the `#N:` numbering (it added nothing — each Footgun renders as
its own `<aside>`) and rewrote the bold span to wrap a single lead
sentence, matching the style every other pitfall on the site uses.

## What got checked and found accurate

For each component I:

- **Resolved every `use WordPress\…` import** via `class_exists` /
`interface_exists`. Every one resolves.
- **Resolved every `->method()` call** in every snippet via reflection.
Every one exists on the class returned by the preceding
factory/constructor.
- **Ran behavioural spot-checks** on the load-bearing claims:

| Component | Claim | How I checked |
| --- | --- | --- |
| ByteStream | `pull → peek (non-advancing) → consume (advancing)` |
Walked `abcdefghij` through `pull(5)`, `peek(3)` twice, `consume(3)`,
`peek(2)` — semantics match. |
| Polyfill | Filter priority ordering (lower runs first), variadic arg
passing, action capture | Three filters at priorities 5/10/20 → runs 5 →
10 → 20; `do_action('x', 'a', 'b')` → 2-arg callback gets both. |
| Polyfill | `esc_html`, `esc_attr`, `esc_url`, `__()` outputs | All
match prose. |
| Encoding | `wp_is_valid_utf8` accepts `'hello'`, rejects `"\xC3\x28"`;
`wp_scrub_utf8` produces a U+FFFD replacement | All match. |
| Filesystem | `LocalFilesystem::create($root)` chroot — `../escape.txt`
cannot escape | Verified: `../` resolves to root-relative; no file
appears outside `$root`. |
| Filesystem | `SQLiteFilesystem::create(':memory:')` round-trips file
contents | `put_contents('a.txt','data')` then `get_contents('a.txt')`
returns `'data'`. |
| ByteStream | `DeflateReadStream` / `InflateReadStream` round-trip |
550 bytes → 32 compressed → 550 bytes back, byte-for-byte. |
| BlockParser | Block array has exactly five keys: `blockName, attrs,
innerBlocks, innerHTML, innerContent` | Confirmed via `array_keys()`. |
| BlockParser | `blockName === null` for loose HTML between blocks |
Confirmed across three loose-HTML positions (before, between, after). |
| Markdown | `MarkdownConsumer::consume()` returns `BlocksWithMetadata`
| `get_class()` confirms the canonical namespace. |
| Markdown | Round-trip via `MarkdownConsumer` + `MarkdownProducer`
works for headings/lists | Confirmed; output preserves structure, with
only trailing-newline drift. |
| HttpClient | `Request($url, ['method' => 'POST', 'headers' => […]])`
constructor accepts options array | Confirmed. |
| HttpClient | `Response::from_http_headers($raw, $request)` →
`Response` with `status_code` | Confirmed. |
| DataLiberation | `ImportEntity('post', […])` constructor signature;
`WXRWriter::append_entity` exists | Confirmed via reflection. |
| Git | `GitRepository::commit(options)` exists | Confirmed in the
public method list. |
| XML | `XMLProcessor::create_from_string()` parses, walks `next_tag()`;
`get_tag_namespace()` returns the URI not the prefix | Confirmed. |
| Zip | `ZipDecoder::COMPRESSION_NONE = 0`, `COMPRESSION_DEFLATE = 8` |
Confirmed via `ReflectionClass::getConstant()`. |
| CLI | `CLI::parse_command_args_and_options()` exists, returns
`[positionals, options]` | Confirmed. |
| Blueprints | `HumanFriendlySchemaValidator` constructor takes
`$schema` (required), `$options` (optional) | Confirmed. |
| HttpServer | `ResponseWriteStream` is an interface (not a class) —
`use` statements still valid | Confirmed via `interface_exists`. |

## Test plan
- [ ] `Verify docs snippets` workflow passes (87/87).
- [ ] Pitfall callouts in `xml.md` render with a single bold lead,
matching the rest of the catalog.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant