Unicode characters are mangled in JavaScript kata output #307

paul-calvelage · 2016-05-18T05:03:53Z

Unicode characters in the JavaScript kata output are being mangled.

Each byte of the UTF-8 encoding seems to be printed as a separate Unicode character. So the Chinese greeting 你好 displays as ä½ å¥½.

This is very bad for anyone needing more than 7-bit ASCII.

Now for some examples of what I think may be happening. This code in a JavaScript kata:

console.log("£");

displays as the two characters Â£ ("\u00c2\u00a3"). The code point for £ is U+00A3 ("\u00a3"), which UTF-8 encodes as the two bytes 0xc2 0xa3. But Codewars apparently treats each byte as a separate code point, 0xc2 and 0xa3, and so displays Â£.

This:

console.log("\uffff")

is displayed as three characters ï¿¿ ("\u00ef\u00bf\u00bf"). The code point U+FFFF is encoded in UTF-8 as the three bytes 0xef 0xbf 0xbf. As above, Codewars then seems to treat 0xef, 0xbf, and 0xbf as separate code points, giving ï¿¿.

I could give as many examples as there are multiple-byte UTF-8 encodings, but this suffices to show the pattern for a single character. Longer strings just repeat the problem, so that console.log("£££££"); displays as Â£Â£Â£Â£Â£ for example.
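The pattern can be reproduced outside Codewars. The following is only a sketch of what I suspect is happening, not the actual Codewars code: it assumes Node.js, and `mangle` and `unmangle` are hypothetical helper names. `mangle` encodes a string as UTF-8 and then decodes every byte as its own Latin-1 character; `unmangle` reverses that step.

```javascript
// Hypothetical helper: reproduce the suspected mangling by encoding the
// string as UTF-8, then decoding each byte as a separate Latin-1 character.
function mangle(text) {
  return Buffer.from(text, "utf8").toString("latin1");
}

// Hypothetical inverse: map each character back to one byte, then decode
// the byte sequence as UTF-8.
function unmangle(text) {
  return Buffer.from(text, "latin1").toString("utf8");
}

console.log(mangle("£") === "\u00c2\u00a3");            // true
console.log(mangle("\uffff") === "\u00ef\u00bf\u00bf"); // true
console.log(unmangle(mangle("你好")) === "你好");        // true
```

If the output seen in a kata matches `mangle(expected)`, that would support the byte-by-byte explanation; it does not show where in the pipeline the extra decoding step occurs.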

As I said, this seems to be pretty serious for anyone needing Unicode.

Note: I discovered this while completing Simple Change Machine, which uses the pound symbol.

paul-calvelage · 2016-06-03T08:51:03Z

The Chinese Numeral Encoder kata has lots of Unicode characters, so it is affected by this bug. In this screenshot it is noticeable in the test output.

paul-calvelage · 2016-06-03T08:59:24Z

My previous screenshot was in Chrome on Windows 10. In Firefox I get a slightly different (but still incorrect) output:

kazk · 2017-08-25T07:00:19Z

#902

paul-calvelage mentioned this issue May 27, 2016

HTML special chars gets applied one time too often in dashboard kata excerpt #15

Closed

kazk added the area/test-output label May 28, 2017

kazk closed this as completed Aug 25, 2017
