Foreign characters in R figures don't work #17

Closed
andrewheiss opened this issue Jul 1, 2014 · 25 comments

Comments

@andrewheiss
Owner

Knitr is apparently really picky about the encoding of the files it builds. If you try to build a file with Unicode characters in a plot using this plugin, R will choke on the characters and render them either as .. or as their Unicode escape codes.

Here's a minimal working example (.Rmd):


---
title: "Test"
output: html_document

---

Testing

```{r, echo=FALSE}
plot(cars, main="pučina")
```

Running this with this plugin in ST3 produces the following warning:

Warning message:
In native_encode(text) :
  some characters may not work under the current locale

According to this, knitr can have the file encoding passed through the knit() command, but it has to match the encoding of either the file itself or the system default. Hardcoding knit(…, encoding='UTF8') into this plugin's build system isn't recommended, since Windows doesn't play well with UTF8 (apparently) and since it's supposed to match the encoding of the file. Or something.

RStudio gets it right, but that's in part because they've hired Yihui :)

Any ideas on how to run the correct knit() command from ST?

ghost commented Jul 1, 2014

Try adding the env variable to the .sublime-build. @randy3k suggested it to me, along with other possible solutions, in response to my own closely related encoding issues with knitr and Sublime builds, and it has worked like a charm. My own .sublime-build variant now looks like this:

  "variants":
  [
  {
    "name": "Run",
    "working_dir": "$file_path",
    "env": { "LANG": "en_US.UTF-8" },
    "shell_cmd": "Rscript -e \"rmarkdown::render(input = '$file')\""
  }
  ]
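For what it's worth, the effect of the env key can be mimicked outside Sublime. The sketch below is only an illustration in Python, not Sublime's actual build code: it launches a child process with LANG added to the inherited environment and reads the value back, which is roughly what has to happen for Rscript to see a UTF-8 locale. The helper name run_with_lang is made up for this example, and the child is a Python one-liner standing in for Rscript.

```python
import os
import subprocess
import sys


def run_with_lang(lang="en_US.UTF-8"):
    """Start a child process with LANG overridden and return the value it sees."""
    # Copy the parent's environment and override only LANG
    # (assumed to be what the "env" key in a build file amounts to).
    child_env = dict(os.environ, LANG=lang)
    # The child is a Python one-liner standing in for Rscript.
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('LANG', ''))"],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_with_lang())
```

If the value printed by the child differs from what the parent process has, that is the same kind of disconnect a build system without the env entry would show.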

With this, I am able to successfully rmarkdown::render() your example, although I do get a few warnings in the rendered document:

[Screenshot, 2014-06-30 22:09:03]

Trying a simple knit() after having added the same env variable to SublimeKnitr's default .sublime-build also seems to sort of work, printing the same warnings in the resulting document:

---
title: "Test"
output: html_document
---

Testing


```
## Warning: conversion failure on 'pučina' in 'mbcsToSbcs': dot substituted for <c4>
## Warning: conversion failure on 'pučina' in 'mbcsToSbcs': dot substituted for <8d>
```

![plot of chunk unnamed-chunk-1](figure/unnamed-chunk-1.png) 
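The two warnings line up with the two bytes of "č" in UTF-8, which is consistent with one dot being substituted per byte (hence the .. in the plot title). A quick Python check, purely as an illustration; nothing here touches R, and utf8_hex_bytes is a made-up helper name:

```python
def utf8_hex_bytes(text):
    """Return the UTF-8 bytes of a string as a list of two-digit hex strings."""
    return [format(byte, "02x") for byte in text.encode("utf-8")]


if __name__ == "__main__":
    # "\u010d" is the character č from the plot title "pučina".
    print(utf8_hex_bytes("\u010d"))  # prints ['c4', '8d']
```

Those are the same <c4> and <8d> values named in the warnings.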

ghost commented Jul 1, 2014

I guess I should mention that I'm on a Mac; I'm not really sure if this is relevant for folk on Windows.

@andrewheiss
Owner Author

Ooh, this looks promising. I've been toying around with it for the past hour, trying to get rid of the conversion failure warnings, but to no avail. It's apparently a common problem for R graphics and knitr (see the Encoding of multibyte characters section in the knitr manual). It looks like you can take care of the problem by manually specifying an encoding, but there's no UTF-8 encoding (apparently), so I don't know how best to generalize it. I'd love to know how RStudio does it.

ghost commented Jul 2, 2014

Try adding "env": { "LANG": "en_US.UTF-8" } to the default .sublime-build and adding the following chunk before the chunk included in your test document:

```{r, echo = FALSE}
pdf.options(encoding = 'CP1250')
```

How does that work? It seems to have gotten rid of the conversion warnings for me. Cf. this question on Stack Overflow.

ghost commented Jul 2, 2014

Using encoding = 'ISOLatin2', instead of encoding = 'CP1250', also seems to work for me.

@andrewheiss
Owner Author

Fantastic - that works!

The only downside to this is that the user has to select an encoding that fits all the characters they're using in their document. If they use Chinese, Arabic, or Cyrillic characters, they'll need to change it accordingly.
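To make that downside concrete, here is a rough Python sketch that checks which single-byte encodings can represent every character of a given string. The Python codec names are assumed stand-ins for the R encoding names used in this thread (plus ISOLatin1 for comparison); the mapping is illustrative and not taken from R, and fitting_encodings is a hypothetical helper.

```python
# Keys are R-style encoding names; values are Python codecs assumed to
# correspond to them. This mapping is an assumption made for illustration.
CANDIDATES = {
    "ISOLatin1": "iso8859_1",
    "ISOLatin2": "iso8859_2",
    "CP1250": "cp1250",
}


def fitting_encodings(text, candidates=CANDIDATES):
    """Return the candidate names whose codec can encode every character of text."""
    fits = []
    for name, codec in candidates.items():
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            # At least one character has no slot in this single-byte encoding.
            continue
        fits.append(name)
    return fits


if __name__ == "__main__":
    print(fitting_encodings("pu\u010dina"))  # the plot title "pučina"
    print(fitting_encodings("\u043f\u0440\u0438\u0432\u0435\u0442"))  # Cyrillic text
```

For "pučina" this reports ISOLatin2 and CP1250 but not ISOLatin1, and for the Cyrillic string none of the three candidates fit, which is exactly the situation where the user would have to pick a different encoding.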

@andrewheiss
Owner Author

However, I just tested it in RStudio and it has the same problem (and same solution; setting pdf.options() in a chunk). So RStudio doesn't have a magic way to make this work—it's subject to the same encoding wonkiness in PDF images.

@andrewheiss
Owner Author

So, for future reference, adding a separate block with pdf.options() will work. Here's a minimal working example:

---
title: "Test"
output: html_document
---

Testing

```{r, echo=FALSE}
pdf.options(encoding='ISOLatin2')
```

```{r, echo=FALSE}
plot(cars, main="pučina")
```

ghost commented Jul 2, 2014

Maybe this should be a separate issue, or maybe it even falls more under the jurisdiction of @LaTeXing, but it is directly related to the foregoing discussion, so I'll just add it here for the moment.

The solution above for Rmd documents does not seem to work for Rtex/Rnw/etc., where "č" and other non-English characters are rendered as ".." or as Unicode; admittedly, I have not yet managed to incorporate the env variable into the .sublime-build.

Input:

\documentclass{article}

\title{Test}
\date{}

%% begin.rcode, 'set-up', include = FALSE
% pdf.options(encoding = 'ISOLatin2')
%% end.rcode

\begin{document}

\maketitle

Testing

%% begin.rcode, 'test_1', echo = FALSE
% plot(cars, main = "pučina")
%% end.rcode

%% begin.rcode, 'test_2'
% print('¡Qué tranza o qué!')
%% end.rcode

\end{document}

Output:

[Screenshot, 2014-07-01 21:20:40]

@andrewheiss
Owner Author

Yes, this.

andrewheiss reopened this Jul 2, 2014
@andrewheiss
Owner Author

I've been working with another person (not on GitHub) with this exact issue (..s in .Rnw files). He asked a SO question and got an answer that said he should use Cairo, but it's a clunky solution and renders PDFs differently.

However, I don't know if this is a knitr issue. When he runs knitr from the Terminal, everything works great and all characters show up as expected. Building the .Rnw file from ST is where encoding messes up. Perhaps adding "env": { "LANG": "en_US.UTF-8" }, to the LaTeXTools or LaTeXing build systems will make it work right?

ghost commented Jul 2, 2014

I think you may be right about the issue being due to ST rather than to knitr, although I don't know much at all. In my encoding-related question on SO, @randy3k in a comment suggests I run:

import subprocess; print(subprocess.check_output("R -q -e 'Sys.getlocale()'", shell=True).decode('utf8'))

in ST's console and compare the result with the one from running, in the terminal:

R -q -e 'Sys.getlocale()'

It seems that, for me at least, there is some sort of disconnect (but, again, I don't know much on the subject): ST yields "C", while my terminal gives me "C/UTF-8/C/C/C/C".
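A slightly more structured version of that comparison, as a rough sketch: the function below only inspects a string shaped like the Sys.getlocale() output quoted in this thread and reports whether any part of it mentions UTF-8. It does not call R itself, and mentions_utf8 is a hypothetical helper name.

```python
import re


def mentions_utf8(getlocale_output):
    """Return True if the quoted locale string in R's printed output mentions UTF-8."""
    # R prints something like: [1] "C/UTF-8/C/C/C/C"
    match = re.search(r'"([^"]*)"', getlocale_output)
    locale_string = match.group(1) if match else getlocale_output.strip()
    # Check each slash-separated category on its own.
    parts = locale_string.split("/")
    return any("utf-8" in part.lower() or "utf8" in part.lower() for part in parts)


if __name__ == "__main__":
    print(mentions_utf8('[1] "C"'))                # what ST reports here
    print(mentions_utf8('[1] "C/UTF-8/C/C/C/C"'))  # what the terminal reports here
```

Feeding it the decoded output of the subprocess.check_output call above, once from ST's console and once from a terminal Python session, would show whether the two environments disagree.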

Adding "env": { "LANG": "en_US.UTF-8" }, to my .sublime-build variant for .Rmd subsequently fixed that issue, which is why I have tried repeatedly to add it to @LaTeXing's .sublime-build as well. However, probably due to my own ineptitude, doing so has only resulted in a broken .sublime-build, i.e., one that does nothing but save the open file (no compile, no knit, etc.).

@andrewheiss
Owner Author

My terminal gives me [1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8", while ST gives just [1] "C".

But after creating ~/.Renviron and adding LANG=en_US.UTF-8, ST gives [1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"

Try doing that and see if the .. problem persists.
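If you'd rather script that step, here is a minimal sketch, assuming the file is a plain list of NAME=value lines as described above. ensure_lang_line is a hypothetical helper, and it takes the path as an argument so it can be tried on a scratch file first.

```python
from pathlib import Path


def ensure_lang_line(path, lang="en_US.UTF-8"):
    """Append LANG=<lang> to the file at path unless a LANG= line already exists.

    Returns True if the file was changed, False if it was left alone.
    """
    target = Path(path).expanduser()
    existing = target.read_text(encoding="utf-8") if target.exists() else ""
    for line in existing.splitlines():
        if line.strip().startswith("LANG="):
            return False  # leave an existing setting untouched
    # Make sure the new entry starts on its own line.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with target.open("a", encoding="utf-8") as handle:
        handle.write(f"{prefix}LANG={lang}\n")
    return True
```

Calling ensure_lang_line("~/.Renviron") would apply it to the real file; R needs to be restarted (or the build re-run) before the new value is picked up.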

ghost commented Jul 2, 2014

Adding LANG=en_US.UTF-8 to ~/.Renviron seems to have mixed results for me¹: "č" is rendered nicely without any warnings in the output .pdf, while "¡" and "é" are simply omitted, i.e., the Unicode code is no longer printed.

Input:

\documentclass{article}

\title{Test}
\date{}

%% begin.rcode, 'set-up', include = FALSE
% pdf.options(encoding = 'ISOLatin2')
%% end.rcode

\begin{document}

\maketitle

Testing. !` \'e

%% begin.rcode, 'test_1', echo = FALSE
% plot(cars, main = "pučina")
%% end.rcode

%% begin.rcode, 'test_2'
% print('¡Qué tranza o qué!')
%% end.rcode

\end{document}

Output:

[Screenshot, 2014-07-02 13:39:09]

¹ Running the bit of Python in ST gives me [1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8" as well.

ghost commented Jul 2, 2014

Compare with the rendered .html from .Rmd:

---
title: "Test"
output: html_document
---

Testing. ¡ é

```{r, 'set-up', include = FALSE}
pdf.options(encoding='ISOLatin2')
```

```{r, 'test_1', echo = FALSE}
plot(cars, main="pučina")
```

```{r, 'test_2'}
print('¡Qué tranza o qué!')
```

[Screenshot, 2014-07-02 13:42:06]

@andrewheiss
Owner Author

Oh, we're so close :)

The missing characters in the actual body of the PDF are probably due to LaTeX. Add this to the preamble: \usepackage[utf8]{inputenc}

ghost commented Jul 2, 2014

That did it! Thanks very much.

\documentclass{article}
\usepackage[utf8]{inputenc} 

\title{Test}
\date{}

%% begin.rcode, 'set-up', include = FALSE
% pdf.options(encoding = 'ISOLatin2')
%% end.rcode

\begin{document}

\maketitle

Testing. !`¡\'eé

%% begin.rcode, 'test_1', echo = FALSE
% plot(cars, main = "pučina")
%% end.rcode

%% begin.rcode, 'test_2'
% print('¡Qué tranza o qué!')
%% end.rcode

\end{document}

[Screenshot, 2014-07-02 13:58:59]

ghost commented Jul 2, 2014

Summary

  • .Rmd, .Rnw/.Rtex:
    • Adding:

      • "env": { "LANG": "en_US.UTF-8" }, to the .sublime-build;
      • and, in the .Rmd or .Rnw, a separate, preliminary chunk with pdf.options(encoding = '<encoding>')

      allows error- and warning-free use of multibyte characters in graphics; run list.files(system.file('enc', package = 'grDevices')) in R to list the available encodings

  • .Rnw/.Rtex exclusively:
    • As may be self-evident, including \usepackage[utf8]{inputenc} in the document header is necessary in order to successfully render multibyte characters in the text, be it knitted R output (which one doesn't necessarily see in the source document) or other text
  • Also:
    • Depending on one's system set-up, adding LANG=en_US.UTF-8 to ~/.Renviron may be necessary in order for knitting done through Sublime Text to be encoding-error-free

@andrewheiss
Owner Author

Thanks so much for your help!

randy3k commented Jul 3, 2014

very interesting discussion.

randy3k commented Jul 3, 2014

Another possible way to suppress the warnings is to use another graphic device, e.g.,

<<include = FALSE>>=
options(device = "cairo_pdf")
@

@andrewheiss
Owner Author

Yes, though I had someone else complain that the Cairo output wasn't as clear or nice looking as whatever R's default is.

randy3k commented Jul 3, 2014

@mmarascio: It is strange that "env": { "LANG": "en_US.UTF-8" } in the sublime-build doesn't work for you but adding LANG=en_US.UTF-8 to ~/.Renviron does. I believe they should be equivalent, at least in the Sublime environment. Maybe I am wrong.

ghost commented Jul 3, 2014

@randy3k: I'm not sure I understand; both alternatives do seem to work for me (see this relevant comment). Only, in addition, for non-ASCII characters in R plots, I need the preliminary chunk that sets pdf.options and, for non-ASCII characters in knitr output in .Rnw - as should've been evident to me - I need \usepackage[utf8]{inputenc} in the document's preamble.

randy3k commented Jul 3, 2014

I see. Thx for the clarification.
