RStudio viewer encoding issue #115

Closed
artemklevtsov opened this issue Jan 30, 2018 · 11 comments

@artemklevtsov

Hi.

Thanks for this package. I ran into an encoding issue when using it with the RStudio viewer.

diffobj::diffPrint("Hello", "He1lo", format = "html")

[Screenshot: RStudio viewer output]

diffobj::diffPrint("Привет", "Превед", format = "html")

[Screenshot: RStudio viewer output]


R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS: /usr/lib/libblas.so.3.8.0
LAPACK: /usr/lib/liblapack.so.3.8.0

locale:
[1] LC_CTYPE=ru_RU.UTF-8 LC_NUMERIC=C LC_TIME=ru_RU.UTF-8
[4] LC_COLLATE=C LC_MONETARY=ru_RU.UTF-8 LC_MESSAGES=ru_RU.UTF-8
[7] LC_PAPER=ru_RU.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=ru_RU.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3 parallel_3.4.3 rstudioapi_0.7 yaml_2.1.16 crayon_1.3.4 diffobj_0.1.9

@brodieG
Owner

brodieG commented Jan 30, 2018

Thanks for reporting. I may not be able to look at this in the immediate future, but I will look at it.

In the meantime, some questions:

What happens if you just paste your string directly into the prompt? I get the following in a C locale:

> "Привет"
[1] "\320\237\321\200\320\270\320\262\320\265\321\202"
> xx <- "Привет"
> xx
[1] "\320\237\321\200\320\270\320\262\320\265\321\202"
> Encoding(xx)
[1] "unknown"
> Encoding(xx) <- 'UTF-8'
> xx
[1] "<U+041F><U+0440><U+0438><U+0432><U+0435><U+0442>"

It looks like U+041F is the Russian capital letter П, so the string does seem to be encoded in UTF-8, yet RStudio does not display it properly.
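One way to check that reading is to decode the escaped bytes from the transcript and look at the resulting code points. This is only a sketch using base R; the variable names are illustrative and nothing here depends on diffobj or RStudio.

```r
# The octal escapes printed by the console, written out as a byte string.
bytes <- "\320\237\321\200\320\270\320\262\320\265\321\202"

# Declare the bytes as UTF-8 (this changes the marking only, not the bytes).
Encoding(bytes) <- "UTF-8"

# Decode to Unicode code points and format them the way R prints <U+XXXX>.
code_points <- utf8ToInt(bytes)
labels <- sprintf("U+%04X", code_points)
print(labels)
```

If the bytes are valid UTF-8 for the word in question, the labels should match the `<U+041F>...<U+0442>` sequence shown in the transcript.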

Maybe this is a related issue.

Can you try running this through a normal terminal and see what you get there (both in non-html mode, as well as html mode)?

@brodieG brodieG added the bug label Jan 30, 2018
@brodieG brodieG added this to the 0.2 milestone Jan 30, 2018
@artemklevtsov
Author

artemklevtsov commented Jan 30, 2018

Terminal output is correct.

> diffobj::diffPrint("Hello", "He1lo", format = "raw")
< "Hello"      > "He1lo"    
@@ 1 @@        @@ 1 @@      
< [1] "Hello"  > [1] "He1lo"
> diffobj::diffPrint("Привет", "Превед", format = "raw")
< "Привет"      > "Превед"    
@@ 1 @@         @@ 1 @@       
< [1] "Привет"  > [1] "Превед"

It also works with the ANSI formats.

I think the SO question relates to Windows-only issues.

Note that the RStudio viewer works correctly with R Markdown reports (in Russian).

@artemklevtsov
Author

htmltools::html_print also works.

htmltools::html_print("Привет")

[Screenshot: rendered output]
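Since the terminal formats and `htmltools::html_print` both behave, one thing worth ruling out is how the HTML file itself is written and labelled. The sketch below writes a page as UTF-8 bytes with an explicit charset declaration; `write_utf8_html` is a hypothetical helper, not part of diffobj or htmltools, and it is not a claim about how either package produces its output.

```r
# Hypothetical helper: write an HTML page as UTF-8 bytes with a charset tag.
write_utf8_html <- function(body, path = tempfile(fileext = ".html")) {
  html <- c(
    "<!DOCTYPE html>",
    "<html><head><meta charset=\"utf-8\"></head>",
    paste0("<body>", enc2utf8(body), "</body></html>")
  )
  # Binary connection plus useBytes = TRUE avoids re-encoding to the locale.
  con <- file(path, open = "wb")
  on.exit(close(con))
  writeLines(html, con, useBytes = TRUE)
  path
}

# The test word, built from bytes so the example does not depend on the locale.
word <- "\320\237\321\200\320\270\320\262\320\265\321\202"
Encoding(word) <- "UTF-8"

path <- write_utf8_html(word)

# Open in the RStudio viewer only when running inside RStudio.
if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
  rstudioapi::viewer(path)
}
```

If this page renders correctly in the viewer while the diff output does not, the difference is more likely in how the diff's HTML file is encoded or declared than in the viewer itself.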

@brodieG
Owner

brodieG commented Jan 31, 2018

Thanks, that's useful.

@brodieG
Owner

brodieG commented Jan 31, 2018

Actually, one thing you haven't shown: what does the RStudio console do if you just hit enter after copy-pasting the string there, as in my example where I got the "\320..." business?

@artemklevtsov
Author

Let's try with Docker, for example the rocker/r-ver:3.4.3 image.

@brodieG
Owner

brodieG commented Feb 1, 2018

I don't understand. Are you able to reproduce the error in rocker/r-ver:3.4.3? All I was looking for was for you to paste the string in quotes into your RStudio console, hit enter, and report back whether the string as interpreted by the RStudio console looks normal to you or not. For example, this is what happens to me:

> "Привет"
[1] "\320\237\321\200\320\270\320\262\320\265\321\202"
> xx <- "Привет"
> xx
[1] "\320\237\321\200\320\270\320\262\320\265\321\202"
> Encoding(xx)
[1] "unknown"
> Encoding(xx) <- 'UTF-8'
> xx
[1] "<U+041F><U+0440><U+0438><U+0432><U+0435><U+0442>"

Notice this doesn't involve diffobj at all.
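To compare sessions like the two in this thread, it can help to report the declared encoding, the raw bytes and the locale side by side. The following is a minimal sketch in base R; `describe_string` is a hypothetical helper name, not something provided by diffobj or RStudio.

```r
# Hypothetical helper: summarise how a string is stored and how the session
# is configured, without relying on how the console chooses to display it.
describe_string <- function(x) {
  list(
    declared     = Encoding(x),                       # "unknown", "UTF-8", ...
    bytes        = paste(as.character(charToRaw(x)), collapse = " "),
    valid_utf8   = validUTF8(x),                      # are the bytes valid UTF-8?
    locale_ctype = Sys.getlocale("LC_CTYPE"),         # character-type locale
    utf8_locale  = l10n_info()[["UTF-8"]]             # TRUE in a UTF-8 locale
  )
}

# Same bytes as in the transcript, first unmarked, then marked as UTF-8.
xx <- "\320\237\321\200\320\270\320\262\320\265\321\202"
Encoding(xx) <- "unknown"
before <- describe_string(xx)

Encoding(xx) <- "UTF-8"
after <- describe_string(xx)

str(before)
str(after)
```

The bytes are identical in both cases; only the declared encoding changes, and whether the console shows characters, `\320...` escapes, or `<U+041F>...` depends on that marking together with the session locale.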

@artemklevtsov
Author

I think your input is not UTF-8 encoded.

> cat("\320\237\321\200\320\270\320\262\320\265\321\202")
Привет
> xx <- "Привет"
> Encoding(xx)
[1] "UTF-8"
> xx
[1] "Привет"
> cat("\320\237\321\200\320\270\320\262\320\265\321\202")
Привет

@brodieG
Owner

brodieG commented Feb 2, 2018

It is properly encoded: "<U+041F>" is the Russian capital П, "<U+0440>" is the lowercase р, and so on. For whatever reason my RStudio doesn't want to render them as the characters they are. Probably a locale issue on my side. Thanks for the additional info. I think this will be sufficient to figure out what's going on when I dig into it.

@brodieG brodieG modified the milestones: 0.2, 0.1.11 Jul 27, 2018
@brodieG
Owner

brodieG commented Jul 27, 2018

I believe this is now fixed in the development branch.

Could you give it a whirl and see if it fixes the problem on your setup?

devtools::install_github('brodieg/diffobj@e824b481c94aac20d309dd31c0cb4ca7b17452ba')

@artemklevtsov
Author

Now it looks good.

brodieG added a commit that referenced this issue Jul 29, 2018