Expose public API entry points from native classes #276

Merged: adamziel merged 5 commits into trunk from codex-native-public-api-surface on May 17, 2026

@adamziel adamziel commented May 17, 2026

What it does

Adds the native extension class layer that the PHP integration PR can bind to directly.

This PR now implements the public entry points in Rust instead of leaving PHP wrappers to fill behavior back in:

  • WP_HTML_Native_Tag_Processor::next_tag() accepts the public string|array|null query shape, including tag_name, class_name, match_offset, and tag_closers.
  • WP_HTML_Native_Processor::next_tag() accepts public tag, class, match-offset, and breadcrumb queries.
  • WP_HTML_Native_Processor::normalize() and serialize() run through native serialization instead of calling back into the public PHP class, avoiding recursion once the public PHP class resolves to the native implementation.
  • WordPress\XML\NativeXMLProcessor::next_tag() accepts the public local-name, namespace/local-name, match-offset, and namespaced breadcrumb query shapes.
  • WordPress\DataLiberation\URL\NativeURLInTextProcessor::get_parsed_url() delegates to WPURL::parse() so callers receive the component URL object surface instead of a placeholder string.
  • URL-in-text filtering runs in Rust: base-protocol handling, HTTP(S) filtering, credential rejection, public-suffix checks for bare domains, invalid-port rejection, and replacement conflict handling.
  • Adds a Native APIs GitHub Actions workflow that builds the PHP extension, loads it, and runs extensions/native-apis/tests/verify-native-apis.php.
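
As a rough illustration of three of the URL filters listed above (HTTP(S) filtering, credential rejection, and invalid-port rejection), here is a minimal PHP sketch. It is not the Rust implementation: `sketch_is_acceptable_url()` is a hypothetical name, and PHP's built-in `parse_url()` stands in for the WHATWG-style parsing the PR delegates to `WPURL::parse()`. Base-protocol handling, public-suffix checks, and replacement conflict handling are deliberately left out.

```php
<?php
/**
 * Hypothetical helper, not part of the extension. Approximates a subset of
 * the filters described in the PR using parse_url() as a stand-in parser.
 */
function sketch_is_acceptable_url( string $candidate ): bool {
	$parts = parse_url( $candidate );

	// Reject anything parse_url() cannot split into a scheme and a host.
	if ( false === $parts || ! isset( $parts['scheme'], $parts['host'] ) ) {
		return false;
	}

	// HTTP(S) filtering: only http and https are accepted.
	if ( ! in_array( strtolower( $parts['scheme'] ), array( 'http', 'https' ), true ) ) {
		return false;
	}

	// Credential rejection: no user or password component is allowed.
	if ( isset( $parts['user'] ) || isset( $parts['pass'] ) ) {
		return false;
	}

	// Invalid-port rejection: an explicit port must fit the TCP port range.
	if ( isset( $parts['port'] ) && ( $parts['port'] < 1 || $parts['port'] > 65535 ) ) {
		return false;
	}

	return true;
}
```

A caller would run each candidate through the check before treating it as a URL; the real extension may order or combine these checks differently.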

Rationale

The stacked PHP PR should only decide whether to expose native or PHP implementations. If public behavior lives in PHP wrapper classes, reviewers have to audit two implementations and the wrappers grow into a second parser layer.

This PR moves the public API burden into the extension so the next PR can use empty native subclasses plus fallback loaders.

Implementation

The native classes expose a supports_public_api() marker. PHP loaders in the next PR use that marker to avoid stale extension builds that do not support this public surface.

The verifier now covers representative public calls:

$processor->next_tag( array( 'tag_name' => 'p', 'class_name' => 'target' ) );
$html->next_tag( array( 'breadcrumbs' => array( 'ARTICLE', 'IMG' ) ) );
$xml->next_tag( array( 'breadcrumbs' => array( array( 'https://wordpress.org', 'item' ) ) ) );
$url->get_parsed_url()->hostname;
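
To make the `string|array|null` query shape concrete, the following sketch normalizes a query into one canonical array. The function name is hypothetical, and the defaults (`match_offset` of 1, `tag_closers` of `'skip'`) are assumptions modeled on the documented behavior of WordPress core's `WP_HTML_Tag_Processor::next_tag()`; the native code may represent queries differently.

```php
<?php
/**
 * Hypothetical helper, not part of the extension. Folds the three accepted
 * query shapes into a single array with assumed defaults.
 *
 * @param string|array|null $query Public next_tag() query.
 */
function sketch_normalize_tag_query( $query ): array {
	$normalized = array(
		'tag_name'     => null,   // null means "match any tag".
		'class_name'   => null,
		'match_offset' => 1,      // Assumed default: first match.
		'tag_closers'  => 'skip', // Assumed default: do not stop on closers.
	);

	// A bare string is shorthand for a tag-name query.
	if ( is_string( $query ) ) {
		$normalized['tag_name'] = $query;
		return $normalized;
	}

	// null (or any unsupported type) matches the next tag of any name.
	if ( ! is_array( $query ) ) {
		return $normalized;
	}

	if ( isset( $query['tag_name'] ) && is_string( $query['tag_name'] ) ) {
		$normalized['tag_name'] = $query['tag_name'];
	}
	if ( isset( $query['class_name'] ) && is_string( $query['class_name'] ) ) {
		$normalized['class_name'] = $query['class_name'];
	}
	if ( isset( $query['match_offset'] ) && is_int( $query['match_offset'] ) && $query['match_offset'] > 0 ) {
		$normalized['match_offset'] = $query['match_offset'];
	}
	if ( isset( $query['tag_closers'] ) && 'visit' === $query['tag_closers'] ) {
		$normalized['tag_closers'] = 'visit';
	}

	return $normalized;
}
```

Breadcrumb queries are omitted here because they apply to the full processor classes rather than the tag processor.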

Native HTML serialization normalizes opening-tag attributes from the original source bytes, preserving first-duplicate-wins behavior while quoting unquoted values and retaining boolean attributes.
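
The three attribute rules in that sentence can be sketched in PHP as follows. This is an illustration under stated assumptions, not the native serializer: the function name is hypothetical, the input is assumed to be already tokenized into name/value pairs in source order with decoded values, and duplicate detection is assumed to be case-insensitive.

```php
<?php
/**
 * Hypothetical helper, not part of the extension. Serializes attributes that
 * were already tokenized into [ name, value ] pairs in source order.
 * A value of true marks a boolean attribute such as `disabled`.
 */
function sketch_normalize_attributes( array $pairs ): string {
	$seen = array();
	$out  = array();

	foreach ( $pairs as $pair ) {
		list( $name, $value ) = $pair;
		$name = strtolower( $name );

		// First-duplicate-wins: later copies of a name are dropped.
		if ( isset( $seen[ $name ] ) ) {
			continue;
		}
		$seen[ $name ] = true;

		if ( true === $value ) {
			// Boolean attributes are retained without a value.
			$out[] = $name;
		} else {
			// Every other value is emitted double-quoted and escaped.
			$out[] = $name . '="' . htmlspecialchars( $value, ENT_COMPAT, 'UTF-8' ) . '"';
		}
	}

	return implode( ' ', $out );
}
```

For example, the pairs `class=a`, `CLASS=b`, `disabled`, `id=x` would serialize with the first `class` kept, the second dropped, and `disabled` left valueless.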

Unsupported factory options still return null for now, keeping native opt-in conservative while the public surface is filled in.

Testing instructions

Local checks run:

cd extensions/native-apis
cargo fmt && cargo test
php -l extensions/native-apis/tests/verify-native-apis.php
bash -n extensions/native-apis/build-extension.sh
/home/claude/php-toolkit/vendor/bin/phpcs -d memory_limit=1G /tmp/php-toolkit-public-api/extensions/native-apis/tests/verify-native-apis.php
git diff --check

Local limitation: this machine does not have php-config, so cargo check --features php-extension cannot run locally. The new Native APIs / Build and verify PHP extension workflow is the required extension build/load check for this PR.

@adamziel adamziel merged commit 31f1fcb into trunk May 17, 2026
29 checks passed
@adamziel adamziel deleted the codex-native-public-api-surface branch May 17, 2026 09:58
adamziel added a commit that referenced this pull request May 17, 2026
## What it does

Changes the public PHP classes to use the native extension
progressively, with no changes required from API consumers.

When the extension is installed and the native class advertises
`supports_public_api()`, the public PHP class resolves to the native
implementation. Otherwise it falls back to the moved PHP implementation.

Implemented wiring:

| Component | PHP integration in this PR |
| --- | --- |
| HTML | `WP_HTML_Tag_Processor` and `WP_HTML_Processor` load either empty native adapter subclasses or the moved PHP classes. |
| XML | `XMLProcessor` loads either `NativeXMLProcessor` directly or `PHPXMLProcessor`. The old cursor bridge is reduced to an empty native subclass. |
| URL-in-text | `URLInTextProcessor` loads either the native class through an empty adapter or `PHPURLInTextProcessor`. |
| Data Liberation HTML | Keeps the public subclass usable with the same native/PHP processor selection model instead of forcing the PHP path. |

The native wrapper files are intentionally tiny:

- `components/DataLiberation/URL/class-nativeurlintextprocessorwrapper.php`: 5 lines
- `components/HTML/class-wp-html-native-processor-wrapper.php`: 9 lines
- `components/HTML/class-wp-html-native-tag-processor-wrapper.php`: 9 lines
- `components/XML/class-xmlnativecursorprocessor.php`: 5 lines

## Rationale

The public class names should remain stable for consumers. Installing
the native extension should be a progressive upgrade, not a new API they
have to opt into.

This PR keeps PHP responsible for loading and fallback only. Parser
behavior belongs to the Rust classes already merged from #276 or to the
moved PHP fallback classes.

## Implementation

The moved PHP implementations live under component `PHP/` paths. Public
loader files choose the native class only when all of these are true:

```php
! defined( 'WP_NATIVE_APIS_DISABLE_DEFAULTS' ) || ! WP_NATIVE_APIS_DISABLE_DEFAULTS
class_exists( $native_class, false )
method_exists( $native_class, 'supports_public_api' )
$native_class::supports_public_api()
```

There are no per-component `WP_NATIVE_APIS_ENABLE_*` switches. Tests and
benchmarks can use the single global opt-out constant when they need to
force fallback behavior.
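
One way to read the four conditions is as a single selection helper. The sketch below is an assumption about how a loader could combine them, not the loader code from this PR: `sketch_pick_implementation()` and the three `Sketch_*` classes are hypothetical stand-ins, and how the chosen class is bound to the public class name is not shown.

```php
<?php
/**
 * Hypothetical helper, not part of the PR. Returns the native class name only
 * when every condition listed above holds, otherwise the PHP fallback.
 */
function sketch_pick_implementation( string $native_class, string $fallback_class ): string {
	// Equivalent to: ! defined( ... ) || ! WP_NATIVE_APIS_DISABLE_DEFAULTS.
	$disabled = defined( 'WP_NATIVE_APIS_DISABLE_DEFAULTS' ) && WP_NATIVE_APIS_DISABLE_DEFAULTS;

	if (
		! $disabled
		// false: never trigger autoloading, the extension must already provide the class.
		&& class_exists( $native_class, false )
		// Stale extension builds lack the marker method entirely.
		&& method_exists( $native_class, 'supports_public_api' )
		&& $native_class::supports_public_api()
	) {
		return $native_class;
	}

	return $fallback_class;
}

// Stand-in classes for demonstration only.
final class Sketch_Native_Current {
	public static function supports_public_api(): bool {
		return true;
	}
}
final class Sketch_Native_Stale {}
final class Sketch_PHP_Fallback {}
```

With these stand-ins, `Sketch_Native_Current` is selected, while `Sketch_Native_Stale` and any class name that is not loaded both resolve to `Sketch_PHP_Fallback`.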

The benchmark harness now tolerates unsupported PHP fallback rows
instead of aborting the run, while `--require-native` still fails if any
native implementation row is unavailable.
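
The tolerance rule could look roughly like the sketch below. The row shape (`workload`, `implementation`, `available`) and the function name are hypothetical, and skipping an unavailable native row when `--require-native` is off is an assumption; the actual logic in `bin/benchmark-native-apis.php` may differ.

```php
<?php
/**
 * Hypothetical helper, not taken from the benchmark harness. Keeps the rows
 * that can run and skips unavailable ones, except that an unavailable native
 * row is fatal when $require_native is set.
 */
function sketch_filter_runnable_rows( array $rows, bool $require_native ): array {
	$runnable = array();

	foreach ( $rows as $row ) {
		if ( $row['available'] ) {
			$runnable[] = $row;
			continue;
		}

		if ( $require_native && 'native' === $row['implementation'] ) {
			throw new RuntimeException( 'Native row unavailable: ' . $row['workload'] );
		}

		// Unsupported PHP fallback rows are skipped instead of aborting the run.
	}

	return $runnable;
}
```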

## Benchmarks

Benchmark run:
https://github.com/WordPress/php-toolkit/actions/runs/25988013187

Command:

```bash
php -d extension=extensions/native-apis/target/release/libwp_native_apis.so \
  bin/benchmark-native-apis.php \
  --iterations=100 \
  --mode=both \
  --disable-native-defaults \
  --require-native
```

Representative PHP-fallback vs native rows:

| Workload | PHP wall (s) | Native wall (s) | Speedup |
| --- | ---: | ---: | ---: |
| `html-tag-processor` | 0.240940 | 0.035885 | 6.71x |
| `html-processor` | 1.113778 | 0.135704 | 8.21x |
| `xml-processor` | 0.332820 | 0.063290 | 5.26x |
| `url-in-text-processor` | 3.853847 | 0.030515 | 126.29x |

The same run also completed all native fused/chunked benchmark rows for
HTML, XML, and URL-in-text with `--require-native` enabled.
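
The Speedup column is consistent with dividing the PHP wall time by the native wall time and rounding to two decimals. The short check below recomputes it from the reported numbers; `sketch_speedup()` is a hypothetical helper, not part of the benchmark script.

```php
<?php
/**
 * Hypothetical helper: speedup is PHP wall time divided by native wall time,
 * formatted with two decimals and an "x" suffix.
 */
function sketch_speedup( float $php_wall, float $native_wall ): string {
	return number_format( $php_wall / $native_wall, 2, '.', '' ) . 'x';
}

// Wall times in seconds copied from the benchmark table: [ PHP fallback, native ].
$rows = array(
	'html-tag-processor'    => array( 0.240940, 0.035885 ),
	'html-processor'        => array( 1.113778, 0.135704 ),
	'xml-processor'         => array( 0.332820, 0.063290 ),
	'url-in-text-processor' => array( 3.853847, 0.030515 ),
);

foreach ( $rows as $workload => list( $php_wall, $native_wall ) ) {
	printf( "%-22s %s\n", $workload, sketch_speedup( $php_wall, $native_wall ) );
}
```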

## Testing instructions

Local checks run for the benchmark harness fixes:

```bash
php -l bin/benchmark-native-apis.php
```

Previously run locally before the #276 rebase:

```bash
php -l components/HTML/class-wp-html-native-tag-processor-wrapper.php
php -l components/HTML/class-wp-html-native-processor-wrapper.php
php -l components/DataLiberation/URL/class-nativeurlintextprocessorwrapper.php
php -l components/XML/class-xmlnativecursorprocessor.php
php -l components/XML/class-xmlprocessor.php
php -l components/HTML/class-wp-html-tag-processor.php
php -l components/HTML/class-wp-html-processor.php
php -l components/DataLiberation/URL/class-urlintextprocessor.php
vendor/bin/phpunit components/HTML/Tests/NativeHTMLConformanceTest.php
vendor/bin/phpunit components/XML/Tests/NativeXMLConformanceTest.php
vendor/bin/phpunit components/DataLiberation/Tests/URLInTextProcessorTest.php components/DataLiberation/Tests/URLInTextProcessorWHATWGComplianceTest.php components/DataLiberation/Tests/WPURLTest.php
vendor/bin/phpcs --standard=phpcs.xml components/DataLiberation/URL/class-nativeurlintextprocessorwrapper.php components/DataLiberation/URL/class-urlintextprocessor.php components/HTML/class-wp-html-native-processor-wrapper.php components/HTML/class-wp-html-native-tag-processor-wrapper.php components/HTML/class-wp-html-processor.php components/HTML/class-wp-html-tag-processor.php components/XML/class-xmlnativecursorprocessor.php components/XML/class-xmlprocessor.php components/HTML/Tests/NativeHTMLConformanceTest.php components/XML/Tests/NativeXMLConformanceTest.php
git diff --check
```

GitHub Actions is the authoritative extension build/load check for this
PR because this local machine does not have `php-config`.