Foreign characters in R figures don't work #17

andrewheiss · 2014-07-01T02:09:02Z

Knitr is apparently really picky about the encoding of the files it builds. If you try to build a file with Unicode characters in a plot using this plugin, R will choke on the characters and return them either as .. or their Unicode code.

Here's a minimal working example (.Rmd):


---
title: "Test"
output: html_document
---

Testing

```{r, echo=FALSE}
plot(cars, main="pučina")
```

Building this with the plugin in ST3 produces the following warning:

Warning message:
In native_encode(text) :
  some characters may not work under the current locale

According to this, knitr can have the file encoding passed through the knit() command, but it has to match the encoding of either the file itself or the system default. Hardcoding knit(…, encoding='UTF8') into this plugin's build system isn't recommended, since Windows doesn't play well with UTF8 (apparently) and since it's supposed to match the encoding of the file. Or something.

RStudio gets it right, but that's in part because they've hired Yihui :)

Any ideas on how to run the correct knit() command from ST?

ghost · 2014-07-01T05:17:57Z

Try adding the env variable to the .sublime-build. @randy3k suggested it to me here, along with other possible solutions, for my own closely related encoding issues involving knitr and Sublime builds, and it has worked like a charm. My own .sublime-build variant now looks like this:

  "variants":
  [
  {
    "name": "Run",
    "working_dir": "$file_path",
    "env": { "LANG": "en_US.UTF-8" },
    "shell_cmd": "Rscript -e \"rmarkdown::render(input = '$file')\""
  }
  ]

With this, I am able to successfully rmarkdown::render() your example, although I do get a few warnings in the rendered document.

Trying a simple knit() after having added the same env variable to SublimeKnitr's default .sublime-build also seems to sort of work, printing the same warnings in the resulting document:

---
title: "Test"
output: html_document
---

Testing


```
## Warning: conversion failure on 'pučina' in 'mbcsToSbcs': dot substituted for <c4>
## Warning: conversion failure on 'pučina' in 'mbcsToSbcs': dot substituted for <8d>
```

![plot of chunk unnamed-chunk-1](figure/unnamed-chunk-1.png)
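The two warnings line up with the two bytes that encode "č" in UTF-8: one dot is substituted per byte that the graphics device cannot map into its single-byte encoding. The following Python check is only an illustration of the byte arithmetic, not of R's internals, and the helper name `utf8_hex_bytes` is made up for this example:

```python
# Illustration only: shows the UTF-8 bytes behind the "<c4>" and "<8d>"
# markers in the mbcsToSbcs warnings. This is not how R itself does it.
def utf8_hex_bytes(text):
    """Return the UTF-8 bytes of `text` as two-digit lowercase hex strings."""
    return [format(b, "02x") for b in text.encode("utf-8")]

print(utf8_hex_bytes("č"))              # ['c4', '8d']
print(len("pučina".encode("utf-8")))    # 7 bytes for 6 characters
```

So a six-character title reaches the device as seven bytes, and the two bytes belonging to "č" are the ones named in the warnings.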

ghost · 2014-07-01T05:19:02Z

I guess I should mention that I'm on a Mac; I'm not really sure if this is relevant for folk on Windows.

andrewheiss · 2014-07-01T16:15:09Z

Ooh, this looks promising. I've been toying around with it for the past hour, trying to get rid of the conversion failure warnings, but to no avail. It's apparently a common problem for R graphics and knitr (see the Encoding of multibyte characters section in the knitr manual). It looks like you can take care of the problem by manually specifying an encoding, but there's apparently no UTF-8 encoding to choose, so I don't know how best to generalize it. I'd love to know how RStudio does it.

ghost · 2014-07-02T02:24:59Z

Try adding "env": { "LANG": "en_US.UTF-8" } to the default .sublime-build and adding the following chunk before the chunk included in your test document:

```{r, echo = FALSE}
pdf.options(encoding = 'CP1250')
```

Does that work for you? It seems to have gotten rid of the conversion warnings for me. Cf. this question on Stack Overflow.

ghost · 2014-07-02T02:29:34Z

Using encoding = 'ISOLatin2', instead of encoding = 'CP1250', also seems to work for me.

andrewheiss · 2014-07-02T03:10:02Z

Fantastic - that works!

The only downside to this is that the user has to select an encoding that fits all the characters they're using in their document. If they use Chinese, Arabic, or Cyrillic characters, they'll need to change it accordingly.
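One way to narrow down the choice is to test which single-byte encodings can represent every character in a label. The sketch below does that with Python's codecs. The mapping from Python codec names to R encoding names is an assumption and should be checked against the encodings R reports on your own system; `CANDIDATES` and `encodings_for` are hypothetical names.

```python
# Sketch: find which single-byte encodings can represent every character
# in a plot label. The mapping from Python codec names to R encoding names
# is an assumption; confirm it against the list R reports on your system.
CANDIDATES = {
    "latin_1": "ISOLatin1",
    "iso8859_2": "ISOLatin2",
    "cp1250": "CP1250",
    "cp1251": "CP1251",
    "koi8_r": "KOI8-R",
}

def encodings_for(text):
    """Return the R-style names of candidate encodings that can encode `text`."""
    usable = []
    for codec, r_name in CANDIDATES.items():
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this code page
        usable.append(r_name)
    return usable

print(encodings_for("pučina"))   # ['ISOLatin2', 'CP1250']
print(encodings_for("море"))     # ['CP1251', 'KOI8-R']
```

A label that mixes both scripts, such as "pučina море", returns an empty list here, which is the case where no single setting from this candidate set would cover the whole document.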

andrewheiss · 2014-07-02T03:12:55Z

However, I just tested it in RStudio and it has the same problem (and the same solution: setting pdf.options() in a chunk). So RStudio doesn't have a magic way to make this work; it's subject to the same encoding wonkiness in PDF images.

andrewheiss · 2014-07-02T03:33:36Z

So, for future reference, adding a separate block with pdf.options() will work. Here's a minimal working example:

---
title: "Test"
output: html_document
---

Testing

```{r, echo=FALSE}
pdf.options(encoding='ISOLatin2')
```

```{r, echo=FALSE}
plot(cars, main="pučina")
```

ghost · 2014-07-02T04:22:02Z

Maybe this should be a separate issue, or maybe it even falls more within the jurisdiction of @LaTeXing, but it is directly related to the foregoing discussion, so I'll just add it here for the moment.

The solution above for Rmd documents does not seem to work for Rtex/Rnw/etc., where "č" and other non-English characters are rendered as ".." or as Unicode; admittedly, I have not yet managed to incorporate the env variable into the .sublime-build.

Input:

\documentclass{article}

\title{Test}
\date{}

%% begin.rcode, 'set-up', include = FALSE
% pdf.options(encoding = 'ISOLatin2')
%% end.rcode

\begin{document}

\maketitle

Testing

%% begin.rcode, 'test_1', echo = FALSE
% plot(cars, main = "pučina")
%% end.rcode

%% begin.rcode, 'test_2'
% print('¡Qué tranza o qué!')
%% end.rcode

\end{document}

Output:

andrewheiss · 2014-07-02T04:26:42Z

Yes, this.

andrewheiss · 2014-07-02T04:30:53Z

I've been working with another person (not on GitHub) with this exact issue (..s in .Rnw files). He asked a SO question and got an answer that said he should use Cairo, but it's a clunky solution and renders PDFs differently.

However, I don't know if this is a knitr issue. When he runs knitr from the Terminal, everything works great and all characters show up as expected. Building the .Rnw file from ST is where encoding messes up. Perhaps adding "env": { "LANG": "en_US.UTF-8" }, to the LaTeXTools or LaTeXing build systems will make it work right?

ghost · 2014-07-02T04:55:36Z

I think you may be right about the issue being due to ST rather than to knitr, although I don't know much at all. In my encoding-related question on SO, @randy3k in a comment suggests I run:

import subprocess; print(subprocess.check_output("R -q -e 'Sys.getlocale()'", shell=True).decode('utf8'))

in ST's console and compare the results with those from running, in the terminal:

R -q -e 'Sys.getlocale()'

It seems that, for me at least, there is some sort of disconnect (but, again, I don't know much on the subject): ST yields "C", while my terminal gives me "C/UTF-8/C/C/C/C".
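One plausible reading of this disconnect is that ST starts its build commands with an environment that has no LANG, so R falls back to the C locale. The sketch below mimics that situation: a child process sees LANG only if the parent passes it along, which is what an "env" entry in a .sublime-build is meant to do. The Python interpreter stands in for Rscript so the example runs without R; `CHILD` and `child_lang` are hypothetical names.

```python
# Sketch: a child process only sees LANG if the parent passes it along.
# sys.executable stands in for Rscript so the example runs without R.
import os
import subprocess
import sys

CHILD = "import os; print(os.environ.get('LANG', 'unset'))"

def child_lang(extra_env=None):
    """Run a child process and report the LANG value it sees."""
    # Start from the current environment minus any locale variables,
    # to imitate a GUI application launched without shell settings.
    env = {k: v for k, v in os.environ.items()
           if k != "LANG" and not k.startswith("LC_")}
    env.update(extra_env or {})
    out = subprocess.check_output([sys.executable, "-c", CHILD], env=env)
    return out.decode("utf-8").strip()

print(child_lang())                          # unset
print(child_lang({"LANG": "en_US.UTF-8"}))   # en_US.UTF-8
```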

Adding "env": { "LANG": "en_US.UTF-8" }, to my .sublime-build variant for .Rmd subsequently fixed that issue, for which reason I have indeed tried repeatedly to add it to @LaTeXing's .sublime-build. However, probably due to my own ineptitude, doing so has only resulted in a broken .sublime-build, i.e., one that does nothing but save the open file (no compile, no knit, etc.).
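Hand-editing a build file is easy to get subtly wrong, so one option is to add the entry by editing the parsed JSON instead of the raw text. The sketch below assumes the build file is strict JSON (no comments or trailing commas), which may not hold for every package's build file. The sample build is hypothetical, as is the helper name `add_lang_env`. It sets LANG both at the top level and in each variant because it makes no assumption about whether variants inherit the top-level "env".

```python
# Sketch: add LANG to a .sublime-build by editing the parsed JSON rather
# than the raw text. Assumes the file is strict JSON (no comments or
# trailing commas); the sample build below is hypothetical.
import json

def add_lang_env(build_text, lang="en_US.UTF-8"):
    """Return `build_text` with LANG set at the top level and in each variant."""
    build = json.loads(build_text)  # raises ValueError on malformed JSON
    build.setdefault("env", {})["LANG"] = lang
    for variant in build.get("variants", []):
        variant.setdefault("env", {})["LANG"] = lang
    return json.dumps(build, indent=2)

sample = '{"shell_cmd": "Rscript -e 1", "variants": [{"name": "Run"}]}'
print(add_lang_env(sample))
```

If `json.loads` raises an error on the existing file, that at least points to where the text stops being valid JSON.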

andrewheiss · 2014-07-02T20:08:58Z

My terminal gives me [1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8", while ST gives just [1] "C".

But after creating ~/.Renviron and adding LANG=en_US.UTF-8, ST gives [1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"

Try doing that and see if the .. problem persists.
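For anyone scripting this step, here is a minimal sketch that adds the line only when no LANG entry exists yet, so running it twice does not duplicate the setting. The helper name `ensure_lang` and its defaults are illustrative assumptions; it only looks for lines that start with LANG= and does not try to interpret anything else in the file.

```python
# Sketch: add LANG=<value> to an Renviron-style file only if no LANG line
# exists yet. The helper name and default path are illustrative assumptions.
import os

def ensure_lang(path=os.path.expanduser("~/.Renviron"), value="en_US.UTF-8"):
    """Append LANG=value to `path` unless a LANG entry is already present.

    Returns True if the file was changed, False otherwise.
    """
    lines = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            lines = fh.read().splitlines()
    if any(line.strip().startswith("LANG=") for line in lines):
        return False  # leave an existing setting untouched
    lines.append("LANG=" + value)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return True
```

Calling `ensure_lang()` with no arguments targets ~/.Renviron; passing a different path makes it easy to try on a scratch file first.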

ghost · 2014-07-02T20:40:15Z

Adding LANG=en_US.UTF-8 to ~/.Renviron seems to have mixed results for me¹: "č" is rendered nicely without any warnings in the output .pdf, while "¡" and "é" are simply omitted, i.e., the Unicode code is no longer printed.

Input:

\documentclass{article}

\title{Test}
\date{}

%% begin.rcode, 'set-up', include = FALSE
% pdf.options(encoding = 'ISOLatin2')
%% end.rcode

\begin{document}

\maketitle

Testing. !` \'e

%% begin.rcode, 'test_1', echo = FALSE
% plot(cars, main = "pučina")
%% end.rcode

%% begin.rcode, 'test_2'
% print('¡Qué tranza o qué!')
%% end.rcode

\end{document}

Output:

¹ Running the bit of Python in ST gives me [1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8" as well.

ghost · 2014-07-02T20:45:23Z

Compare with the rendered .html from .Rmd:

---
title: "Test"
output: html_document
---

Testing. ¡ é

```{r, 'set-up', include = FALSE}
pdf.options(encoding='ISOLatin2')
```

```{r, 'test_1', echo = FALSE}
plot(cars, main="pučina")
```

```{r, 'test_2'}
print('¡Qué tranza o qué!')
```

andrewheiss · 2014-07-02T20:50:53Z

Oh, we're so close :)

The missing characters in the actual body of the PDF are probably due to LaTeX. Add this to the preamble: \usepackage[utf8]{inputenc}

ghost · 2014-07-02T21:31:29Z

That did it! Thanks very much.

\documentclass{article}
\usepackage[utf8]{inputenc} 

\title{Test}
\date{}

%% begin.rcode, 'set-up', include = FALSE
% pdf.options(encoding = 'ISOLatin2')
%% end.rcode

\begin{document}

\maketitle

Testing. !`¡\'eé

%% begin.rcode, 'test_1', echo = FALSE
% plot(cars, main = "pučina")
%% end.rcode

%% begin.rcode, 'test_2'
% print('¡Qué tranza o qué!')
%% end.rcode

\end{document}

ghost · 2014-07-02T21:36:54Z

Summary

.Rmd, .Rnw/.Rtex:
- Adding:
  - "env": { "LANG": "en_US.UTF-8" }, to the .sublime-build;
  - and, in the .Rmd or .Rnw, a separate, preliminary chunk with pdf.options(encoding = '<encoding>')
  allows for error- and warning-free use of multibyte characters in graphics; run list.files(system.file('enc', package = 'grDevices')) in R for available encodings
.Rnw/.Rtex exclusively:
- As may be self-evident, including \usepackage[utf8]{inputenc} in the document header is necessary in order to successfully render multibyte characters in the text, be it knitted R output (which one doesn't necessarily see in the source document) or other text
Also:
- Depending on one's system set-up, adding LANG=en_US.UTF-8 to ~/.Renviron may be necessary in order for knitting done through Sublime Text to be encoding-error-free

andrewheiss · 2014-07-02T22:29:10Z

Thanks so much for your help!

randy3k · 2014-07-03T01:06:04Z

Very interesting discussion.

randy3k · 2014-07-03T01:24:37Z

Another possible way to suppress the warnings is to use another graphic device, e.g.,

<<include = FALSE>>=
options(device = "cairo_pdf")
@

andrewheiss · 2014-07-03T01:25:28Z

Yes, though I had someone else complain that the Cairo output wasn't as clear or nice looking as whatever R's default is.

randy3k · 2014-07-03T01:49:56Z

@mmarascio
It is strange that "env": { "LANG": "en_US.UTF-8" } in the sublime-build doesn't work for you but adding LANG=en_US.UTF-8 to ~/.Renviron does.
I believe they should be equivalent, at least in the Sublime environment. Maybe I am wrong.

ghost · 2014-07-03T02:13:36Z

@randy3k: I'm not sure I understand; both alternatives do seem to work for me (see this relevant comment). Only, in addition, for non-ASCII characters in R plots, I need the preliminary chunk that sets pdf.options and, for non-ASCII characters in knitr output in .Rnw - as should've been evident to me - I need \usepackage[utf8]{inputenc} in the document's preamble.

randy3k · 2014-07-03T02:28:23Z

I see. Thx for the clarification.

andrewheiss added a commit that referenced this issue Jul 2, 2014

Better encoding support + rmarkdown::render. Fixes #15 and #17

dd06ddc

andrewheiss closed this as completed Jul 2, 2014

andrewheiss reopened this Jul 2, 2014

andrewheiss closed this as completed in 4231ba8 Jul 2, 2014
