Use wp_is_valid_utf8() and wp_scrub_utf8() from the new utf8.php decoder #200
…er wont autoload them
| } else { | ||
| $is_valid_utf8 = ! _wp_has_noncharacters_fallback( $blueprint_string ); | ||
| } | ||
| $is_valid_utf8 = ! wp_has_noncharacters( $blueprint_string ); |
I left a note in the other PR making this change, but there may be a misunderstanding here between valid UTF-8 and noncharacters: noncharacters are valid UTF-8, and invalid UTF-8 cannot form noncharacters.
There is, however, wp_is_valid_utf8().
Ah good point, I mixed it all up. Thank you for catching this.
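To make the distinction in this thread concrete, here is a minimal sketch. The `example_*` helpers are hypothetical and written only for illustration; they are not the toolkit's `wp_is_valid_utf8()` or `wp_has_noncharacters()`. The sketch assumes PCRE's `u` flag is an acceptable stand-in for UTF-8 validation.

```php
<?php
// Hypothetical helpers for illustration; not the toolkit's implementation.

function example_is_valid_utf8( string $text ): bool {
	// With the `u` flag, PCRE validates the subject as UTF-8 and fails on malformed bytes.
	return 1 === preg_match( '//u', $text );
}

function example_has_noncharacters( string $text ): bool {
	// Noncharacters: U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.
	$class = '\x{FDD0}-\x{FDEF}';
	for ( $plane = 0; $plane <= 0x10; $plane++ ) {
		$class .= sprintf( '\x{%X}\x{%X}', ( $plane << 16 ) | 0xFFFE, ( $plane << 16 ) | 0xFFFF );
	}
	// preg_match() returns false on invalid UTF-8, so invalid input never reports a noncharacter.
	return 1 === preg_match( '/[' . $class . ']/u', $text );
}

$noncharacter = "a\u{FFFE}b"; // Well-formed UTF-8 that contains a noncharacter.
$invalid      = "a\xC0\x80b"; // Overlong encoding: not valid UTF-8 at all.

var_dump( example_is_valid_utf8( $noncharacter ) );     // bool(true)
var_dump( example_has_noncharacters( $noncharacter ) ); // bool(true)
var_dump( example_is_valid_utf8( $invalid ) );          // bool(false)
var_dump( example_has_noncharacters( $invalid ) );      // bool(false)
```

The two checks answer different questions, which is why negating a noncharacter check does not tell you whether the input is valid UTF-8.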
Speeds up XMLProcessor by consuming runs of ASCII bytes with `strspn`, which avoids calling the UTF-8 decoder for most tags. In the PHPUnit test suite for WXR files, this speeds up parsing the 10MB WXR file from ~1.7s on average to ~0.6s on average. This PR also moves from `utf8_codepoint_at` to `_wp_scan_utf8` for UTF-8 decoding without any speed penalty; see #200 for prior context. cc @dmsnell
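The `strspn` fast path described above can be sketched roughly as follows. This is not the XMLProcessor code: `example_scan_ascii_name()` and `EXAMPLE_ASCII_NAME_BYTES` are hypothetical names, and the byte set is a simplification of XML name rules (ASCII subset only, with no distinction between name-start and other name characters).

```php
<?php
// Hypothetical helper; a simplified ASCII-only view of XML name characters.
const EXAMPLE_ASCII_NAME_BYTES = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_:.-';

function example_scan_ascii_name( string $xml, int $at ): array {
	// strspn() counts matching bytes in one native call, with no per-character decoding.
	$length = strspn( $xml, EXAMPLE_ASCII_NAME_BYTES, $at );
	$end    = $at + $length;

	// A byte >= 0x80 belongs to a multi-byte sequence; only then is a UTF-8 decoder needed.
	$needs_decoder = $end < strlen( $xml ) && ord( $xml[ $end ] ) >= 0x80;

	return array( $length, $needs_decoder );
}

list( $length, $needs_decoder ) = example_scan_ascii_name( '<wp:post_title>', 1 );
// $length is 13 ("wp:post_title"), $needs_decoder is false.

list( $length, $needs_decoder ) = example_scan_ascii_name( "<caf\u{E9}>", 1 );
// $length is 3 ("caf"), $needs_decoder is true.
```

Tag names made only of ASCII bytes never reach the slow path, which is the case the description says covers most tags.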
Adds support for rewriting URLs inside CSS syntax, e.g. here:

```html
<div style="background-image:url(/wp-content/uploads/2025/09/image-2-766x1024.jpeg)">
```

Before this PR, the `style` attributes in, e.g., the cover block were skipped by the URL rewriter and continued pointing to the old site.

Fixes #223

## Implementation details

This PR backports `CSSProcessor`, `CSSURLProcessor`, and a few related PRs around Unicode handling from the WordPress/php-toolkit repo:

* WordPress/php-toolkit#197
* WordPress/php-toolkit#195
* WordPress/php-toolkit#199
* WordPress/php-toolkit#200
* WordPress/php-toolkit#201
* WordPress/php-toolkit#202

Note the CSSProcessor and CSSURLProcessor are tested against 300 test cases containing various tricky inputs, quoted and unquoted URLs, strings, comments, unicode escape sequences, and more.

## Testing instructions

This PR comes with a new test case specifically for various tricky CSS inputs. You're also welcome to try and import a WXR file that contains an inline background-image reference and confirm the URL is correctly rewritten.
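For readers unfamiliar with the problem, here is a deliberately naive sketch of what rewriting a `url()` inside a `style` value involves. It uses a regular expression, not the `CSSProcessor`/`CSSURLProcessor` tokenizers, whose API is not shown here. `example_rewrite_css_urls()` and the `new-site.example` host are hypothetical, and the pattern ignores comments, escape sequences, and strings that merely contain `url(`, which are exactly the tricky inputs the real processors are tested against.

```php
<?php
// Naive illustration only: a regex stand-in, not the CSSURLProcessor API.

function example_rewrite_css_urls( string $css, string $from_base, string $to_base ): string {
	$result = preg_replace_callback(
		// Matches url(...) with an optional matching pair of quotes around the URL.
		'~url\(\s*(["\']?)([^"\')\s]+)\1\s*\)~i',
		static function ( array $m ) use ( $from_base, $to_base ): string {
			$url = $m[2];
			// Only rewrite URLs that start with the old base.
			if ( 0 === strpos( $url, $from_base ) ) {
				$url = $to_base . substr( $url, strlen( $from_base ) );
			}
			return 'url(' . $m[1] . $url . $m[1] . ')';
		},
		$css
	);

	// preg_replace_callback() returns null on a PCRE error; fall back to the input.
	return $result ?? $css;
}

$style = 'background-image:url(/wp-content/uploads/2025/09/image-2-766x1024.jpeg)';
echo example_rewrite_css_urls( $style, '/wp-content/', 'https://new-site.example/wp-content/' );
```

A tokenizer-based approach avoids the false matches a pattern like this produces on comments and escaped characters.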
Replaces two instances of the old UTF-8 decoding utilities with the new utf-8.php toolkit by @dmsnell:
This PR only touches two tactical usages of the old tools:
* `wp_is_valid_utf8`
* `wp_scrub_utf8` instead of `_wp_scrub_utf8_fallback`

More refactoring is coming once there's a faster alternative to `_wp_scan_utf8`, see https://core.trac.wordpress.org/ticket/63863#comment:51

Related to #196.
Follows up on #199 and #197.
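As a rough sketch of how the two functions fit together at a call site: `wp_is_valid_utf8()` and `wp_scrub_utf8()` are the functions named in this PR, but their exact signatures are assumed here (a string in; a bool or a string out). The `example_*` wrappers are hypothetical, and the mbstring fallbacks exist only so the snippet runs outside WordPress; they may substitute a different replacement character than the toolkit does.

```php
<?php
// Sketch under assumptions: signatures of the wp_* functions are assumed, not confirmed here.

function example_is_valid( string $bytes ): bool {
	return function_exists( 'wp_is_valid_utf8' )
		? wp_is_valid_utf8( $bytes )
		: mb_check_encoding( $bytes, 'UTF-8' ); // Stand-in; requires the mbstring extension.
}

function example_scrub( string $bytes ): string {
	return function_exists( 'wp_scrub_utf8' )
		? wp_scrub_utf8( $bytes )
		: mb_scrub( $bytes, 'UTF-8' ); // Stand-in; requires the mbstring extension.
}

$blueprint_string = "{\"title\":\"caf\xE9\"}"; // A lone Latin-1 byte, invalid as UTF-8.

if ( ! example_is_valid( $blueprint_string ) ) {
	// Replace the invalid byte sequences so downstream code sees well-formed UTF-8.
	$blueprint_string = example_scrub( $blueprint_string );
}

var_dump( example_is_valid( $blueprint_string ) );
```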
## Testing instructions
If the CI passes, we're good. Unicode-related scenarios are covered by tests.