Improve fallback UTF-8 locale detection
If the libc lacks sufficiently modern routines, we fall back to a heuristic to determine whether a locale is UTF-8. One component of that heuristic examines the byte sequence of the locale's currency symbol. If the sequence isn't valid UTF-8, the locale isn't UTF-8 either. If it is valid UTF-8, the locale might be a UTF-8 one, and this commit changes the detection to check whether the sequence, when interpreted as UTF-8, decodes to a known Unicode currency symbol. If so, we conclude the locale is UTF-8, as the odds of some other locale having a byte sequence that does this are vanishingly small.

If the sequence doesn't decode to a currency symbol, that tells us nothing, as plenty of locales use a string of letters as their currency symbol. Nor does a '$' tell us anything, as it is encoded identically in UTF-8 and non-UTF-8 locales, so it doesn't help us.

This pretty much guarantees that a UTF-8 locale for the European Union or the UK that otherwise looks like plain English (Latin script) will be correctly determined to be UTF-8, as the symbols for their currencies pass this test.
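
The three-way outcome described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this commit: the names `currency_verdict`, `decode_utf8`, `is_non_ascii_currency`, and `classify_currency_symbol` are hypothetical, and the set of "known" currency symbols is assumed here to be U+00A2..U+00A5 plus the Unicode Currency Symbols block (U+20A0..U+20CF), which may differ from what the real implementation accepts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type: what the currency symbol tells us. */
typedef enum {
    CURRENCY_NOT_UTF8,      /* malformed as UTF-8: the locale can't be UTF-8 */
    CURRENCY_INCONCLUSIVE,  /* valid UTF-8, but no evidence ('$', "EUR", ...) */
    CURRENCY_IS_UTF8        /* decodes to a Unicode currency symbol */
} currency_verdict;

/* Decode one UTF-8 character from s (len bytes available) into *cp.
 * Returns the number of bytes consumed, or 0 if the input is malformed. */
static size_t decode_utf8(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need, i;
    uint32_t min;

    if (len == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; min = 0x80;    *cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; min = 0x800;   *cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; min = 0x10000; *cp = s[0] & 0x07; }
    else return 0;                      /* stray continuation or invalid byte */

    if (len < need) return 0;           /* truncated sequence */
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values above U+10FFFF. */
    if (*cp < min || *cp > 0x10FFFF || (*cp >= 0xD800 && *cp <= 0xDFFF))
        return 0;
    return need;
}

/* Assumed set of non-ASCII currency symbols; '$' is deliberately excluded
 * because it is the same byte in UTF-8 and non-UTF-8 locales. */
static int is_non_ascii_currency(uint32_t cp)
{
    return (cp >= 0x00A2 && cp <= 0x00A5)    /* cent, pound, currency, yen */
        || (cp >= 0x20A0 && cp <= 0x20CF);   /* Currency Symbols block */
}

currency_verdict classify_currency_symbol(const char *sym)
{
    const unsigned char *s = (const unsigned char *) sym;
    size_t len = strlen(sym);
    int found = 0;

    while (len > 0) {
        uint32_t cp;
        size_t n = decode_utf8(s, len, &cp);

        if (n == 0) return CURRENCY_NOT_UTF8;
        if (is_non_ascii_currency(cp)) found = 1;
        s += n;
        len -= n;
    }
    return found ? CURRENCY_IS_UTF8 : CURRENCY_INCONCLUSIVE;
}
```

In a fallback path, the input would typically be the `currency_symbol` field of the `struct lconv` returned by `localeconv()`. An inconclusive verdict means the other components of the heuristic still have to decide.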
Showing with 61 additions and 23 deletions.