Formatting: Strip CJK punctuation from slugs in sanitize_title_with_dashes()
#9701
Trac ticket: https://core.trac.wordpress.org/ticket/22402
Currently, sanitize_title_with_dashes() only strips ASCII non-alphanumeric characters from slugs and preserves multi-byte punctuation marks. As a result, non-Western (and some Western) punctuation appears in URL slugs as percent-encoded characters.

This patch adds common CJK punctuation marks to the existing character blacklist:

Before: "Hello World。" -> slug is hello-world%e3%80%82
After: "Hello World。" -> slug is hello-world

The fix follows the existing pattern of explicitly listing problematic characters and only affects the 'save' context.
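
For reviewers who want to see the idea in isolation, here is a minimal standalone sketch of stripping percent-encoded CJK punctuation from an already-encoded slug. It is not the code from this patch: the function name strip_encoded_cjk_punctuation() is hypothetical, and the character list is an illustrative subset chosen for this example, not the list the patch actually adds.

```php
<?php
/**
 * Hypothetical helper for illustration only; not part of WordPress core.
 * Removes a few percent-encoded CJK punctuation marks from a slug that
 * has already been lowercased and percent-encoded.
 */
function strip_encoded_cjk_punctuation( string $slug ): string {
	// Illustrative subset (an assumption, not the patch's actual list).
	$encoded_marks = array(
		'%e3%80%81', // U+3001 ideographic comma.
		'%e3%80%82', // U+3002 ideographic full stop.
		'%ef%bc%8c', // U+FF0C fullwidth comma.
		'%ef%bc%81', // U+FF01 fullwidth exclamation mark.
		'%ef%bc%9f', // U+FF1F fullwidth question mark.
	);

	// Replace every listed sequence with an empty string.
	return str_replace( $encoded_marks, '', $slug );
}

echo strip_encoded_cjk_punctuation( 'hello-world%e3%80%82' ); // prints "hello-world"
```

Matching on the encoded byte sequences, rather than on the raw characters, mirrors the "explicitly listed characters" approach described above and leaves any sequence not in the list untouched.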