Description
For most languages, $smcFunc['strtoupper'], $smcFunc['strtolower'], $smcFunc['ucwords'], and $smcFunc['ucfirst'] work fine. But there are some cases where they do the wrong thing.
The languages most seriously affected are Turkish, Lithuanian, and Greek:
- Turkish has special rules for case mapping between ı, i, I, and İ.
- Lithuanian has special rules for the dots above i and j when the characters also have accent marks above them.
- Greek requires special handling for lower case sigma, using σ when it occurs in the middle of a word, but ς at the end of a word.
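To make the Turkish case concrete, here's a minimal sketch of what locale-aware handling has to do differently. The function names are hypothetical (they are not existing SMF functions), and the sketch ignores edge cases such as I followed by a combining dot above.

```php
<?php
// Hypothetical helpers for illustration only; not part of SMF.

// Upper-case a UTF-8 string using Turkish rules.
function example_tr_strtoupper(string $string): string
{
    // Dotted i must become dotted capital İ (U+0130), so map it before
    // the generic conversion. Dotless ı (U+0131) already maps to I.
    $string = str_replace('i', "\u{0130}", $string);

    return mb_strtoupper($string, 'UTF-8');
}

// Lower-case a UTF-8 string using Turkish rules.
function example_tr_strtolower(string $string): string
{
    // İ (U+0130) becomes i, and plain I becomes dotless ı (U+0131).
    $string = str_replace(array("\u{0130}", 'I'), array('i', "\u{0131}"), $string);

    return mb_strtolower($string, 'UTF-8');
}

// The generic mapping turns i into I, which is wrong for Turkish text.
$generic = mb_strtoupper('diyarbakır', 'UTF-8');   // DIYARBAKIR
$turkish = example_tr_strtoupper('diyarbakır');    // DİYARBAKIR
```

The pre-replacement step is only a stand-in for real locale tailoring, but it shows why a conversion that doesn't know the language can't get this right.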
Additionally, $smcFunc['ucwords'] and $smcFunc['ucfirst'] do the wrong thing when handling characters that have separate upper case vs. title case versions. They also get easily confused when they encounter things like punctuation marks. (These problems aren't as big a deal as the ones above, but if we can fix them we might as well.)
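As an illustration of the upper case vs. title case difference, take the digraph ǆ (U+01C6), whose upper case form is Ǆ (U+01C4) but whose title case form is ǅ (U+01C5). The sketch below compares a naive ucfirst that upper-cases the first character with one that title-cases it. Both function names are hypothetical, and this is not necessarily how the eventual fix should be written.

```php
<?php
// Hypothetical helpers for illustration only; not part of SMF.

// Naive approach: upper-case the first character.
function example_naive_ucfirst(string $string): string
{
    $first = mb_substr($string, 0, 1, 'UTF-8');
    $rest = mb_substr($string, 1, null, 'UTF-8');

    return mb_strtoupper($first, 'UTF-8') . $rest;
}

// Title-case the first character instead, leaving the rest untouched.
function example_title_ucfirst(string $string): string
{
    $first = mb_substr($string, 0, 1, 'UTF-8');
    $rest = mb_substr($string, 1, null, 'UTF-8');

    return mb_convert_case($first, MB_CASE_TITLE, 'UTF-8') . $rest;
}

$word = "\u{01C6}ep";                      // starts with the digraph ǆ
$naive = example_naive_ucfirst($word);     // starts with Ǆ (U+01C4)
$title = example_title_ucfirst($word);     // starts with ǅ (U+01C5)
```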
The underlying cause of these problems is that we rely on the mbstring extension to do this job for us, but the mbstring extension is not locale-aware and therefore never applies the special rules that are needed in these situations.
The ideal solution would be to use the intl extension's functions for these tasks, but many hosts don't have that extension installed, so we'll need to use our polyfills in Subs-Charset.php when it isn't available. I'll also need to expand those polyfills a bit, since they also currently mishandle a few of these situations.
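Roughly, the intl-with-fallback approach could look like the sketch below. The function name is hypothetical, the ICU transform IDs (tr-Upper, lt-Upper, el-Upper, Any-Upper) are assumptions about what the installed ICU version provides, and the plain mb_strtoupper() fallback is only a placeholder for whatever the expanded polyfills end up doing.

```php
<?php
// Hypothetical helper for illustration only; not an existing SMF function.
function example_locale_strtoupper(string $string, string $lang): string
{
    // Languages assumed to have tailored case transforms in ICU.
    $tailored = array('tr', 'lt', 'el');

    // Prefer the intl extension when the host has it installed.
    if (class_exists('Transliterator')) {
        $id = (in_array($lang, $tailored, true) ? $lang : 'Any') . '-Upper';
        $transliterator = Transliterator::create($id);

        if ($transliterator !== null) {
            $result = $transliterator->transliterate($string);

            if ($result !== false) {
                return $result;
            }
        }
    }

    // Placeholder fallback: a real polyfill would apply the
    // language-specific rules itself before or after this call.
    return mb_strtoupper($string, 'UTF-8');
}

$upper = example_locale_strtoupper('abc', 'en'); // ABC either way
```

With this shape, callers don't need to know whether intl is present. The results only differ for the tailored languages, which is exactly where the polyfills need to be expanded.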
Probably related: https://www.simplemachines.org/community/index.php?topic=581362.0, although some of the details in that report remain unclear due to communication barriers.