Charset: Rely on new UTF-8 pipeline for mb_strlen() fallback. #9828
The existing polyfill for `mb_strlen()` has several deficiencies: it relies on Unicode PCRE support, assumes input strings are valid UTF-8, splits input strings into an array of characters to count them (1,000 at a time, iterating until complete), and entirely gives up when Unicode support is missing.

This patch provides an updated polyfill which reliably counts code points in a UTF-8 string, even in the presence of sequences of invalid bytes. It scans through the input with zero allocations. Additionally, the underlying fallback extends the behavior of `mb_strlen()` to provide character counts for substrings within a larger input without extracting the substring (it can count characters within a byte offset and length of a larger string).

This change improves the reliability of UTF-8 string length calculations and removes behavioral variability based on the runtime system.

Developed in #9828
Discussed in https://core.trac.wordpress.org/ticket/63863

See #63863.

git-svn-id: https://develop.svn.wordpress.org/trunk@60949 602fd350-edb4-49c9-b593-d223f7449a82
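To make the approach in the commit message concrete, here is a minimal sketch of a byte-scanning counter that does not depend on PCRE and never splits the string into an array. It is not the code added by the patch: the function names `sketch_utf8_sequence_length()` and `sketch_utf8_codepoint_count()` are hypothetical, and counting every invalid byte as one character is an assumption made for simplicity (the actual fallback may group invalid bytes differently).

```php
<?php
/**
 * Hypothetical helper: returns the byte length (1-4) of the well-formed
 * UTF-8 sequence starting at $at, or 0 if the bytes there are invalid
 * (stray continuation byte, overlong form, surrogate, or truncated).
 */
function sketch_utf8_sequence_length( string $text, int $at, int $end ): int {
	$b0 = ord( $text[ $at ] );

	if ( $b0 < 0x80 ) {
		return 1; // ASCII.
	}

	// Pick the sequence length and the allowed range of the second byte.
	if ( $b0 >= 0xC2 && $b0 <= 0xDF ) {
		$need = 2;
		$min  = 0x80;
		$max  = 0xBF;
	} elseif ( $b0 >= 0xE0 && $b0 <= 0xEF ) {
		$need = 3;
		$min  = 0xE0 === $b0 ? 0xA0 : 0x80; // Reject overlong forms.
		$max  = 0xED === $b0 ? 0x9F : 0xBF; // Reject surrogates.
	} elseif ( $b0 >= 0xF0 && $b0 <= 0xF4 ) {
		$need = 4;
		$min  = 0xF0 === $b0 ? 0x90 : 0x80; // Reject overlong forms.
		$max  = 0xF4 === $b0 ? 0x8F : 0xBF; // Reject values above U+10FFFF.
	} else {
		return 0; // 0x80-0xC1 and 0xF5-0xFF never start a sequence.
	}

	if ( $at + $need > $end ) {
		return 0; // Truncated before the end of the counted span.
	}

	$b1 = ord( $text[ $at + 1 ] );
	if ( $b1 < $min || $b1 > $max ) {
		return 0;
	}

	for ( $i = 2; $i < $need; $i++ ) {
		$b = ord( $text[ $at + $i ] );
		if ( $b < 0x80 || $b > 0xBF ) {
			return 0;
		}
	}

	return $need;
}

/**
 * Hypothetical counter: counts code points inside a byte range of $text
 * without extracting the substring. Assumption: each invalid byte counts
 * as one character.
 */
function sketch_utf8_codepoint_count( string $text, int $byte_offset = 0, ?int $byte_length = null ): int {
	$total = strlen( $text );
	$at    = max( 0, min( $byte_offset, $total ) );
	$end   = null === $byte_length ? $total : min( $total, $at + max( 0, $byte_length ) );
	$count = 0;

	while ( $at < $end ) {
		$length = sketch_utf8_sequence_length( $text, $at, $end );
		$at    += $length > 0 ? $length : 1; // Step over one invalid byte.
		++$count;
	}

	return $count;
}

var_dump( sketch_utf8_codepoint_count( "h\u{E9}llo" ) );           // int(5)
var_dump( sketch_utf8_codepoint_count( "a\xFFb" ) );               // int(3)
var_dump( sketch_utf8_codepoint_count( "ab\u{20AC}cd", 2, 4 ) );   // int(2)
```

The third call counts the characters in bytes 2 through 5 of the input (the three-byte euro sign plus `c`) in place, which is the offset-and-length behavior the commit message describes for the underlying fallback.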
Built from https://develop.svn.wordpress.org/trunk@60949

git-svn-id: https://core.svn.wordpress.org/trunk@60285 1a063a9b-81f0-0310-95a4-ce76da25c4cd
Trac ticket: Core-63863
See: #9825, #9830, #9498, #9826, #9827, #9798, (#9828), #9829

Update the polyfill of `mb_strlen()` to rely on the new UTF-8 pipeline.