
Problems with unicode on windows

Philipp A edited this page Feb 3, 2023 · 2 revisions

R on Windows had a problem with Unicode that was fixed in R 4.2.

The problem was that R produced strange “escapes” for Unicode characters that cannot be represented in the current locale. This happened both when we used parse(code) to execute code (“in”) and when we used capture.output(print(obj)) to print an object and obtain its text/plain representation (“out”).
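The two code paths can be sketched roughly as follows. This is a minimal illustration, not the kernel's actual implementation: the variable names and the use of a fresh environment are assumptions made for the example.

```r
# "in": turn a string of source code into expressions and evaluate them.
# On affected R versions, characters outside the locale were escaped here.
code <- 'x <- "abc"; nchar(x)'
exprs <- parse(text = code)

env <- new.env()  # hypothetical evaluation environment for this sketch
result <- NULL
for (e in exprs) {
  result <- eval(e, envir = env)
}

# "out": capture what print() would write, as a character vector of lines.
# On affected R versions, the same escaping could appear in these lines.
out <- capture.output(print(result))
out
```

Both steps go through R's own text handling, which is why the escaping could show up on either side.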

The first problem (“in”) could lead to subtle bugs when you typed such a character in the front-end (e.g., a notebook cell) and then compared it with strings that came from elsewhere (e.g., a CSV file), because internally R would compare "法" == "<U+6CD5>" (the "proper" escape for this character would be \u6CD5). If you submitted such code for execution, the R kernel would send you a warning that you had used Unicode characters in your code which were automatically converted to such an "escaped" form.
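The mismatch can be reproduced on any R version by writing both forms out by hand. In this sketch, from_file stands in for a value read from a CSV file and from_cell for what an affected R version made of the typed character; both names are hypothetical.

```r
# The character as it would arrive from a data source, written with the
# proper \u escape so that the source code itself stays pure ASCII.
from_file <- "\u6CD5"

# What an affected R version turned the typed character into:
# a literal eight-character string, not an escape sequence.
from_cell <- "<U+6CD5>"

from_file == from_cell   # FALSE: the comparison silently fails
nchar(from_cell)         # 8
```

Writing the character as \u6CD5 in the submitted code sidesteps the conversion, because the code then contains only ASCII characters.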

The second problem (“out”) only affected the text/plain representation of objects and was therefore (by default) only visible when you used print(obj) in a code cell. You could also run into it if you disabled the rich display of objects. It only affected what you saw in the front-end and was not used in any computations. As long as you didn’t copy and paste such output to construct new code, you were fine.

Note that the latter happened any time capture.output was used and was not specific to the R kernel: all Windows versions of R before 4.2 were affected.
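To check whether a given session falls into the affected range, a rough test based only on the two conditions stated above (Windows, R older than 4.2) might look like this. The variable names are made up for the example, and the check does not look at the locale, so it can only say that a session is possibly affected.

```r
# Platform and version checks using base R only.
is_windows <- .Platform$OS.type == "windows"
is_old_r   <- getRversion() < "4.2.0"

# TRUE only for Windows builds of R before 4.2.
possibly_affected <- is_windows && is_old_r
possibly_affected
```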

For more information, see the following bug reports:
