Fix: Make seems_utf8 function RFC 3629 compliant #7463
Debarghya-Banerjee wants to merge 4 commits into WordPress:trunk
Hi @desrosj, can you please take a look at this PR? Thanks.
Trac Ticket: Core-38044
Overview
Key Features:
UTF-8 Encoding Compliance:
The function adheres strictly to the UTF-8 encoding rules defined in RFC 3629, which allows for a maximum of 4 bytes per character.
Handling of Single and Multi-byte Sequences:
Validation of Leading Bytes:
The function checks leading bytes to determine the number of continuation bytes required:
It explicitly rejects leading bytes in the 0xF8 and 0xFC ranges (bit prefixes 111110xx and 1111110x), as these would introduce 5- and 6-byte sequences, which RFC 3629 no longer permits.
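As a rough illustration of the leading-byte check described above, here is a minimal Python sketch (not the PHP code from this PR). The function name `continuation_bytes_needed` is hypothetical, and rejecting 0xC0, 0xC1 and 0xF5 to 0xF7 at this stage is an assumption based on RFC 3629's byte ranges rather than something stated in the PR.

```python
from typing import Optional


def continuation_bytes_needed(lead: int) -> Optional[int]:
    """Return the number of continuation bytes that must follow `lead`,
    or None if `lead` cannot start a sequence under RFC 3629.

    Hypothetical helper for illustration only.
    """
    if lead <= 0x7F:
        return 0  # single-byte (ASCII) character
    if 0xC2 <= lead <= 0xDF:
        return 1  # 2-byte sequence (0xC0 and 0xC1 could only be overlong)
    if 0xE0 <= lead <= 0xEF:
        return 2  # 3-byte sequence
    if 0xF0 <= lead <= 0xF4:
        return 3  # 4-byte sequence, up to U+10FFFF
    # 0x80-0xBF are continuation bytes; 0xF5 and above, including the
    # 0xF8 and 0xFC ranges, are never valid leading bytes.
    return None
```

For example, `continuation_bytes_needed(0xE2)` reports two continuation bytes, while `continuation_bytes_needed(0xF8)` and `continuation_bytes_needed(0xFC)` both return `None`.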
Control Over Overlong Sequences:
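For readers unfamiliar with the term, an overlong sequence encodes a code point in more bytes than necessary, which RFC 3629 forbids. The Python sketch below shows one common way to detect it from the first two bytes; it is an illustration under that assumption, not the check used in this PR, and `is_overlong` is a hypothetical name.

```python
def is_overlong(seq: bytes) -> bool:
    """Return True if `seq` (a single multi-byte sequence) is an overlong
    encoding, judged from its leading byte and first continuation byte.

    Hypothetical helper for illustration only.
    """
    lead = seq[0]
    if lead in (0xC0, 0xC1):
        return True  # 2-byte form of a code point below U+0080
    if lead == 0xE0 and len(seq) > 1 and seq[1] < 0xA0:
        return True  # 3-byte form of a code point below U+0800
    if lead == 0xF0 and len(seq) > 1 and seq[1] < 0x90:
        return True  # 4-byte form of a code point below U+10000
    return False
```

For instance, `b"\xC0\x80"` is an overlong encoding of U+0000 and is flagged, whereas `b"\xC3\xA9"` (U+00E9) is not.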
Surrogate Pair Handling:
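RFC 3629 prohibits encoding the UTF-16 surrogate code points U+D800 through U+DFFF in UTF-8. In byte terms, these are the 3-byte sequences with leading byte 0xED whose second byte falls in 0xA0 to 0xBF. The Python sketch below illustrates that test; `encodes_surrogate` is a hypothetical name, and the PR's own implementation may differ.

```python
def encodes_surrogate(seq: bytes) -> bool:
    """Return True if `seq` starts like a UTF-8 encoding of a surrogate
    code point (U+D800..U+DFFF): leading byte 0xED, second byte 0xA0..0xBF.

    Hypothetical helper for illustration only.
    """
    return len(seq) >= 2 and seq[0] == 0xED and 0xA0 <= seq[1] <= 0xBF
```

Here `b"\xED\xA0\x80"` (U+D800) is flagged, while `b"\xED\x9F\xBF"` (U+D7FF, the last code point before the surrogate block) is accepted.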
Zero Byte Validation:
Comprehensive Error Handling:
Conclusion