Unicode characters in the JavaScript kata output are being mangled.
Each byte of the UTF-8 encoding seems to be printing as a separate Unicode character. So the Chinese greeting 你好 displays as ä½ å¥½.
This is very bad for anyone needing more than 7-bit ASCII.
Now for some examples of what I think may be happening. Logging the single character £ from a JavaScript kata
displays as the two characters Â£ ("\u00c2\u00a3"). The Unicode code point for £ ("\u00a3") is normally encoded in UTF-8 as the two bytes 0xc2 0xa3. But Codewars apparently re-encodes each byte, 0xc2 and 0xa3, as a character of its own to get Â£.
Likewise, logging the code point 0xffff ("\uffff")
is displayed as three characters ï¿¿ ("\u00ef\u00bf\u00bf"). The Unicode code point 0xffff is normally encoded in UTF-8 as the three bytes 0xef 0xbf 0xbf. But as above, Codewars then seems to re-encode 0xef, 0xbf, 0xbf as separate characters to get ï¿¿.
I could give as many examples as there are multi-byte UTF-8 encodings, but this suffices to show the pattern for a single character. Longer strings just repeat the problem, so that console.log("£££££"); displays as Â£Â£Â£Â£Â£ for example.
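The pattern above can be reproduced outside Codewars. What follows is only a minimal sketch of what I suspect is happening, assuming Node.js: the string is encoded as UTF-8, and then each byte is decoded as a Latin-1 character of its own. The helper names mangle and escapeAll are made up for illustration and are not anything from the Codewars code base.

```javascript
// Hypothetical helper: reproduces the suspected double encoding in Node.js.
// This is a guess at the mechanism, not Codewars' actual code.
function mangle(text) {
  // Encode the string as UTF-8 bytes...
  const utf8Bytes = Buffer.from(text, "utf8");
  // ...then decode each byte as its own Latin-1 character (U+0000 to U+00FF).
  return utf8Bytes.toString("latin1");
}

// Show a string as \uXXXX escapes so the result is unambiguous on any terminal.
function escapeAll(text) {
  return Array.from(text, (ch) =>
    "\\u" + ch.codePointAt(0).toString(16).padStart(4, "0")
  ).join("");
}

console.log(escapeAll(mangle("\u00a3")));       // \u00c2\u00a3
console.log(escapeAll(mangle("\uffff")));       // \u00ef\u00bf\u00bf
console.log(escapeAll(mangle("\u4f60\u597d"))); // \u00e4\u00bd\u00a0\u00e5\u00a5\u00bd
```

The escaped results match the mangled output reported above for £, for 0xffff, and for the Chinese greeting, which is why I think a byte-by-byte re-encoding is the likely cause.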
As I said, this seems to be pretty serious for anyone needing Unicode.
Note: I discovered this while completing Simple Change Machine, which uses the pound symbol.