@dmsnell dmsnell commented Sep 8, 2025

Trac ticket: Core-63863
See: #9825, #9830, #9498, #9826, #9827, (#9798), #9828, #9829

Relies on the new UTF-8 pipeline to eliminate the use of preg_match() in the HTML API.

@dmsnell dmsnell force-pushed the utf8/html-api-updates branch 3 times, most recently from 25e7519 to b7a3e5e Compare October 16, 2025 23:21
@dmsnell dmsnell force-pushed the utf8/html-api-updates branch 14 times, most recently from f8623e9 to eea5d1f Compare October 21, 2025 00:34
The only PCRE in the HTML API was used to validate a given attribute
name when setting an attribute.

This change relies on the new UTF-8 `wp_has_noncharacters()` function,
removing the reliance on the PCRE extension and unifying behaviors
across PHP runtime environments.
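
As a rough illustration of the kind of check the commit message describes, here is a minimal Python sketch of noncharacter detection done without a regular expression. It is not the Core implementation, which is PHP; the names `is_noncharacter` and `has_noncharacters` are hypothetical. It assumes the Unicode definition of the 66 noncharacters: U+FDD0 through U+FDEF, plus the last two code points of each of the 17 planes.

```python
def is_noncharacter(code_point: int) -> bool:
    """Return True if the code point is one of the 66 Unicode noncharacters."""
    # U+FDD0..U+FDEF is a contiguous block of 32 noncharacters.
    if 0xFDD0 <= code_point <= 0xFDEF:
        return True
    # The last two code points of every plane end in FFFE or FFFF.
    return (code_point & 0xFFFE) == 0xFFFE


def has_noncharacters(text: str) -> bool:
    """Hypothetical analogue of a noncharacter scan over decoded text."""
    return any(is_noncharacter(ord(char)) for char in text)
```

Because the check is plain integer arithmetic on code points, it behaves the same wherever it runs, which is the property the change is after when PCRE Unicode support is missing.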
@dmsnell dmsnell force-pushed the utf8/html-api-updates branch from eea5d1f to 7f3fea4 Compare October 21, 2025 03:08
@dmsnell dmsnell marked this pull request as ready for review October 21, 2025 03:08
@github-actions
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

pento pushed a commit that referenced this pull request Oct 21, 2025
The HTML API has relied upon a single PCRE to determine whether to allow setting certain attribute names. This was because those names aren’t allowed to contain Unicode noncharacters, but detecting noncharacters without a UTF-8 parser is nontrivial.

In this change the direct PCRE has been replaced with a number of `strcspn()` calls and a call to the newer `wp_has_noncharacters()` function. Under the hood, this function will still defer to a PCRE if Unicode support is available, but otherwise will fall back to the UTF-8 pipeline in Core.

This change removes the platform variability, making the HTML API more reliable when Unicode support for PCRE is lacking.

Developed in #9798
Discussed in https://core.trac.wordpress.org/ticket/63863

See #63863.


git-svn-id: https://develop.svn.wordpress.org/trunk@61003 602fd350-edb4-49c9-b593-d223f7449a82
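
The approach in the commit message above, a span scan for rejected characters followed by a noncharacter check, can be sketched as follows. This is a hedged Python illustration, not the PHP that landed in Core. The `strcspn` helper here is a Python analogue of PHP's function of the same name, `is_valid_attribute_name` is a hypothetical name, and the rejected set is taken from the HTML standard's attribute-name rule (controls, space, quotes, `>`, `/`, `=`), which may differ from the exact set Core checks.

```python
# Characters the HTML standard forbids in attribute names, besides
# noncharacters. Assumption: Core's actual list may differ.
REJECTED = (
    "".join(chr(c) for c in range(0x00, 0x20))    # C0 controls
    + "".join(chr(c) for c in range(0x7F, 0xA0))  # DELETE and C1 controls
    + " \"'>/="
)


def strcspn(subject: str, reject: str) -> int:
    """Length of the leading run of `subject` with no character from `reject`."""
    for index, char in enumerate(subject):
        if char in reject:
            return index
    return len(subject)


def has_noncharacters(text: str) -> bool:
    """Compact stand-in for a noncharacter check (U+FDD0..U+FDEF, U+xxFFFE/F)."""
    return any(
        0xFDD0 <= ord(ch) <= 0xFDEF or (ord(ch) & 0xFFFE) == 0xFFFE
        for ch in text
    )


def is_valid_attribute_name(name: str) -> bool:
    """Hypothetical validator: non-empty, no rejected chars, no noncharacters."""
    if name == "":
        return False
    # The scan must reach the end of the name without hitting a rejected char.
    if strcspn(name, REJECTED) != len(name):
        return False
    return not has_noncharacters(name)
```

For example, `is_valid_attribute_name("data-id")` is accepted, while `"on click"` is rejected at the space and `"data-\ufffe"` is rejected by the noncharacter check. Neither step needs a regular expression engine.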
dmsnell commented Oct 21, 2025

Merged in 399411b
[61003]

@dmsnell dmsnell closed this Oct 21, 2025
@dmsnell dmsnell deleted the utf8/html-api-updates branch October 21, 2025 03:49
markjaquith pushed a commit to markjaquith/WordPress that referenced this pull request Oct 21, 2025
Developed in WordPress/wordpress-develop#9798
Discussed in https://core.trac.wordpress.org/ticket/63863

See #63863.

Built from https://develop.svn.wordpress.org/trunk@61003


git-svn-id: http://core.svn.wordpress.org/trunk@60339 1a063a9b-81f0-0310-95a4-ce76da25c4cd
github-actions bot pushed a commit to platformsh/wordpress-performance that referenced this pull request Oct 21, 2025
Developed in WordPress/wordpress-develop#9798
Discussed in https://core.trac.wordpress.org/ticket/63863

See #63863.

Built from https://develop.svn.wordpress.org/trunk@61003


git-svn-id: https://core.svn.wordpress.org/trunk@60339 1a063a9b-81f0-0310-95a4-ce76da25c4cd