pander + R 3.4.0 → either a failure or an encoding issue #296

Closed
GegznaV opened this issue Apr 30, 2017 · 27 comments

@GegznaV
Contributor

GegznaV commented Apr 30, 2017

While using pander and R version 3.4.0 I faced either an error or an encoding issue:

  1. This code fails when international strings are used:
Sys.setlocale(locale = "Lithuanian")
df <- iris[1:2,]
rownames(df) <- c("Pagal „a“ formulę", "Pagal „b“ formulę")
pander::pander(df)

Error message:

Error in table.expand(x, t.width, justify, sep.col) : basic_string::_S_create
  2. These lines are decoded incorrectly:
Sys.setlocale(locale = "Lithuanian")
df <- iris[1:2, 4:5]
rownames(df) <- c("ą ž", "š ė")
pander::pander(df)

Result:

---------------------------------
 &nbsp;    Petal.Width   Species 
--------- ------------- ---------
**ą ž**      0.2       setosa  

**Å Ä—**      0.2       setosa  
---------------------------------

The problem disappears when I switch back to R 3.3.3.
Is there a way to overcome this bug without switching back to previous versions of R?

 devtools::session_info()
Session info -------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.0 (2017-04-21)
 system   x86_64, mingw32             
 ui       RStudio (1.0.143)           
 language (EN)                        
 collate  Lithuanian_Lithuania.1257   
 tz       Europe/Helsinki             
 date     2017-05-01                  

Packages -----------------------------------------------------------------------------------------------
 package  * version date       source        
 devtools   1.12.0  2016-12-05 CRAN (R 3.4.0)
 digest     0.6.12  2017-01-27 CRAN (R 3.4.0)
 memoise    1.1.0   2017-04-21 CRAN (R 3.4.0)
 pander   * 0.6.0   2015-11-23 CRAN (R 3.4.0)
 Rcpp       0.12.10 2017-03-19 CRAN (R 3.4.0)
 withr      1.0.2   2016-06-20 CRAN (R 3.4.0)
@GegznaV GegznaV changed the title pander + R 3.4.0 → either failure or encoding issues pander + R 3.4.0 → either a failure or an encoding issues Apr 30, 2017
@GegznaV GegznaV changed the title pander + R 3.4.0 → either a failure or an encoding issues pander + R 3.4.0 → either a failure or an encoding issue Apr 30, 2017
@daroczig
Member

daroczig commented May 4, 2017

@RomanTsegelskyi any ideas regarding the Rcpp error message when using R 3.4.0?

@lselzer

lselzer commented Jun 14, 2017

I can confirm this bug in a Spanish locale, but I don't get an error; the output is just wrongly encoded. The bug disappears in R 3.3.3.

@philsf

philsf commented Jul 3, 2017

For me it is the opposite: colnames and rownames work fine, but the data is incorrectly encoded in the output.

This only happens on Windows, and it appears to happen in both R 3.4.0 and 3.4.1. It does not happen if I switch back to 3.3.3.

> df <- cbind("á" = "á", "é" = "é", "ç" = "ç")
> rownames(df) <- "ã"
> pander::pander(df)

-------------------
&nbsp;   á   é   ç 
------- --- --- ---
 **ã**  á  é  ç 
-------------------

I have access to a linux box, and it does not happen in linux (which uses UTF-8).

I tried setting Encoding() and it still comes out wrong (albeit differently).

> Encoding(df)
[1] "latin1" "latin1" "latin1"
> Encoding(df) <- "UTF-8"
> Encoding(df)
[1] "UTF-8" "UTF-8" "UTF-8"
> pander::pander(df)

----------------------
&nbsp;   á    é    ç  
------- ---- ---- ----
 **ã**  <e1> <e9> <e7>
----------------------

Trying enc2native() has no effect (the workaround from issue #280).
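
A note on why the relabelling attempt above prints `<e1>`: `Encoding(x) <- "UTF-8"` only changes the declared encoding and leaves the bytes untouched, so latin1 bytes end up as invalid UTF-8. Converting with `enc2utf8()` rewrites the bytes instead. A minimal base R sketch, using a hand-built latin1 string as a stand-in for the matrix cells (the variable names are illustrative only):

```r
# Build "á" as a single latin1 byte (0xE1) and declare it as latin1.
x <- rawToChar(as.raw(0xe1))
Encoding(x) <- "latin1"

# Relabelling: the byte stays 0xE1, which is not valid UTF-8 on its own.
relabelled <- x
Encoding(relabelled) <- "UTF-8"
charToRaw(relabelled)   # still the single byte e1
validUTF8(relabelled)   # FALSE

# Converting: the byte sequence is rewritten as UTF-8 (c3 a1).
converted <- enc2utf8(x)
charToRaw(converted)
Encoding(converted)     # "UTF-8"
validUTF8(converted)    # TRUE
```

Whether conversion alone avoids the pander problem on Windows is not established in this thread; the sketch only explains the difference between the two outputs.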

Below the session info.

> devtools::session_info()
Session info -----------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.0 (2017-04-21)
 system   x86_64, mingw32             
 ui       RStudio (1.0.143)           
 language (EN)                        
 collate  Portuguese_Brazil.1252      
 tz       America/Sao_Paulo           
 date     2017-07-03                  

Packages ---------------------------------------------------------------------------------
 package    * version date       source        
 backports    1.1.0   2017-05-22 CRAN (R 3.4.0)
 base       * 3.4.0   2017-04-21 local         
 compiler     3.4.0   2017-04-21 local         
 datasets   * 3.4.0   2017-04-21 local         
 devtools     1.13.2  2017-06-02 CRAN (R 3.4.1)
 digest       0.6.12  2017-01-27 CRAN (R 3.4.0)
 evaluate     0.10.1  2017-06-24 CRAN (R 3.4.0)
 graphics   * 3.4.0   2017-04-21 local         
 grDevices  * 3.4.0   2017-04-21 local         
 htmltools    0.3.6   2017-04-28 CRAN (R 3.4.0)
 knitr        1.16    2017-05-18 CRAN (R 3.4.0)
 magrittr     1.5     2014-11-22 CRAN (R 3.4.0)
 memoise      1.1.0   2017-04-21 CRAN (R 3.4.1)
 methods    * 3.4.0   2017-04-21 local         
 pander       0.6.0   2015-11-23 CRAN (R 3.4.0)
 Rcpp         0.12.11 2017-05-22 CRAN (R 3.4.0)
 rmarkdown    1.6     2017-06-15 CRAN (R 3.4.0)
 rprojroot    1.2     2017-01-16 CRAN (R 3.4.0)
 rstudioapi   0.6     2016-06-27 CRAN (R 3.4.1)
 stats      * 3.4.0   2017-04-21 local         
 stringi      1.1.5   2017-04-07 CRAN (R 3.4.0)
 stringr      1.2.0   2017-02-18 CRAN (R 3.4.0)
 tools        3.4.0   2017-04-21 local         
 utils      * 3.4.0   2017-04-21 local         
 withr        1.0.2   2016-06-20 CRAN (R 3.4.1)
 yaml         2.1.14  2016-11-12 CRAN (R 3.4.0)

@GegznaV
Contributor Author

GegznaV commented Oct 31, 2017

I created a data.frame df in the Lithuanian locale. The object df:

df
#                   vidurkis PI_apatine_riba PI_virsutine_riba  n
# Pagal „z“ formulę     54.9            52.4              57.3 24
# Pagal „t“ formulę     54.9            52.3              57.5 24

Then I ran pander(df):

library(pander)
debugonce(pandoc.table.return)

Sys.setlocale(locale = "Lithuanian")
df <- readRDS("df.Rds")
pander(df)

Before the code breaks at lines 582-583, the object t is created. I saved that object as "t.Rds".

# lines 582-583 in `pandoc.table.return` where the error occurs:
res <- paste0(res, paste(apply(t, 1, function(x) paste0(table.expand(x, 
       t.width, justify, sep.col), sep.row)), collapse = "\n"))

Other code needed to run these lines:

# Function, defined inside `pander::pandoc.table.return`
table.expand <- function(cells, cols.width, justify, sep.cols) {
    .Call("pander_tableExpand_cpp", PACKAGE = "pander", 
          cells, cols.width, justify, sep.cols, style)
}

# Parameters before calling `table.expand`
t.width <- c(23, 10, 17, 19, 4)
justify <- c("centre", "centre", "centre", "centre", "centre")
sep.col <- c("",  " ", "" )
style   <- "multiline"

After leaving the debugging mode, I loaded the first row of "t.Rds" and created the analogous line as a character vector t0.

t <- readRDS("t.Rds")[1, ]
the_names <- names(t)

# The contents of `t0` are the same as the contents of `t`
t0 <- c("**Pagal „z“ formulę**", "54.9", "52.4", "57.3", "24")
names(t0) <- the_names

print(t0)
# t.rownames                vidurkis  PI_apatine_riba   PI_virsutine_riba  n 
# "**Pagal „z“ formulę**"   "54.9"    "52.4"             "57.3"            "24" 

print(t)
# t.rownames                vidurkis  PI_apatine_riba   PI_virsutine_riba  n 
# "**Pagal „z“ formulę**"   "54.9"    "52.4"            "57.3"             "24" 

sapply(t0, Encoding)
# t.rownames     vidurkis   PI_apatine_riba  PI_virsutine_riba  n 
# "unknown"      "unknown"  "unknown"        "unknown"          "unknown" 

sapply(t, Encoding)
# t.rownames      vidurkis   PI_apatine_riba  PI_virsutine_riba  n 
# "UTF-8"         "unknown"  "unknown"        "unknown"         "unknown" 

table.expand(t0, t.width, justify, sep.col)
# [1] " **Pagal „z“ formulę**     54.9          52.4               57.3          24 "

table.expand(t,  t.width, justify, sep.col)
## Error in table.expand(t, t.width, justify, sep.col) : 
##     basic_string::_S_create

table.expand works fine with t0 and breaks with t. The only difference between these two objects is the encoding of the element t.rownames. Therefore it seems that the "UTF-8" encoding mark causes the error.
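
The comparison above can be made more explicit by looking at the raw bytes next to the declared encoding, since two strings can print identically while differing in both. A small sketch; `describe_strings()` is a hypothetical helper, not part of pander, and the sample vector only imitates the row from the report:

```r
# Hypothetical helper: one row per string with its declared encoding,
# its length in bytes, and the bytes themselves in hex.
describe_strings <- function(x) {
  x <- as.character(x)
  data.frame(
    encoding = Encoding(x),   # "unknown", "latin1", "UTF-8" or "bytes"
    n_bytes  = nchar(x, type = "bytes"),
    hex      = vapply(x, function(s) paste(charToRaw(s), collapse = " "),
                      character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# \u escapes keep the example independent of the source file's encoding.
cells <- c("**Pagal \u201ez\u201c formul\u0119**", "54.9")
describe_strings(cells)
```

Pure ASCII strings such as "54.9" are always reported as "unknown", which is why only the row name differs between t and t0.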

Any ideas why the encoding changes and causes this problem, and how it could be fixed?

p.s. This comment could be the "more information" for #280

Objects t and df.zip

@daroczig
Member

Thanks a lot for the detailed info, really helpful!

@RomanTsegelskyi, would you have a chance to look into this?

@RomanTsegelskyi
Contributor

Sorry I missed all the notifications before, I will try to look into this

@GegznaV
Contributor Author

GegznaV commented Feb 3, 2018

Is there any news on this issue?

@lselzer

lselzer commented Feb 12, 2018

I've been exploring this issue and have found that the culprit is tableExpand_cpp

@daroczig
Member

Thanks, @lselzer! @RomanTsegelskyi, any chance you might be able to look into this?

@lselzer

lselzer commented Feb 12, 2018

Using enc2native inside table.expand fixes this issue, though I don't know how robust this solution is. I know very little about character encoding, and I don't know how this will work with other languages like Chinese.
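
As a rough illustration of the idea (not the actual patch, whose contents are not shown in this thread), converting every cell to the native encoding before it reaches compiled code could look like the sketch below. `normalise_cells()` is a hypothetical helper name. One caveat: `enc2native()` can only keep characters that exist in the session's native encoding, so in a non-UTF-8 locale other characters are typically substituted with `<U+xxxx>` escapes, which is relevant to the concern about languages such as Chinese.

```r
# Hypothetical helper: convert all cells to the native encoding while
# keeping the element names that the table code relies on.
normalise_cells <- function(cells) {
  nms <- names(cells)
  out <- enc2native(as.character(cells))
  names(out) <- nms
  out
}

# A row imitating the one from the report; \u escapes mark the first
# element as UTF-8 regardless of the source file's encoding.
row <- c(t.rownames = "**Pagal \u201ez\u201c formul\u0119**",
         vidurkis   = "54.9",
         n          = "24")

clean <- normalise_cells(row)
Encoding(clean)   # result depends on the session's locale
```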

I can make a PR if you are willing to accept it.

lselzer added a commit to lselzer/pander that referenced this issue Feb 13, 2018
@GegznaV
Contributor Author

GegznaV commented Mar 12, 2018

@lselzer , on your computer, does it solve the original issue of this thread #296? I installed pander from your repository, but when I use the Lithuanian locale and the provided example, there is no effect on my PC (code results in the same error).

@lselzer

lselzer commented Mar 12, 2018

Yes, it solves the issue. I tried your code and tried to reproduce your error, but I couldn't.

devtools::session_info()
Session info ------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.3 (2017-11-30)
 system   x86_64, mingw32             
 ui       RStudio (1.1.383)           
 language (EN)                        
 collate  Lithuanian_Lithuania.1257   
 tz       America/Buenos_Aires        
 date     2018-03-12                  

Packages ----------------------------------------------------------------------------------------------------
 package   * version    date       source                              
 base      * 3.4.3      2017-11-30 local                               
 compiler    3.4.3      2017-11-30 local                               
 datasets  * 3.4.3      2017-11-30 local                               
 devtools    1.13.4     2017-11-09 CRAN (R 3.4.2)                      
 digest      0.6.15     2018-02-12 Github (eddelbuettel/digest@d9f40a9)
 graphics  * 3.4.3      2017-11-30 local                               
 grDevices * 3.4.3      2017-11-30 local                               
 memoise     1.1.0      2017-04-21 CRAN (R 3.4.0)                      
 methods   * 3.4.3      2017-11-30 local                               
 pander      0.6.1      2018-02-14 local                               
 Rcpp        0.12.15.1  2018-02-14 Github (RcppCore/Rcpp@15b3a87)      
 stats     * 3.4.3      2017-11-30 local                               
 tools       3.4.3      2017-11-30 local                               
 utils     * 3.4.3      2017-11-30 local                               
 withr       2.1.1.9000 2017-12-22 Github (jimhester/withr@df18523)    
 yaml        2.1.14     2016-11-12 CRAN (R 3.4.0)           
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Lithuanian_Lithuania.1257  LC_CTYPE=Lithuanian_Lithuania.1257   
[3] LC_MONETARY=Lithuanian_Lithuania.1257 LC_NUMERIC=C                         
[5] LC_TIME=Lithuanian_Lithuania.1257    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.4.3   tools_3.4.3      withr_2.1.1.9000 rstudioapi_0.7   yaml_2.1.14      memoise_1.1.0   
 [7] Rcpp_0.12.15.1   pander_0.6.1     digest_0.6.15    devtools_1.13.4 

@hr70

hr70 commented Apr 24, 2018

Similar problems on a German Locale since switching from R3.3 to R3.4 (Windows). I just tried with R3.5, but that didn’t change anything. Seems as if things get encoded wrongly in the rownames if German characters (e.g. “Ä”) are present there, in the colnames if present there, and interestingly only in the rownames if present in rownames and colnames.
I downloaded https://github.com/lselzer/pander/archive/06c2f6579740564063af7081373113daa62b1023.zip and tried to install it, but unfortunately couldn’t get it to work, so don’t know if this would change things.
It would be great if a solution to this problem could be found.
Here’s an example:

library(pander)
x <- data.frame(hö = c("ä", "o", "ü"))
row.names(x) <- c("A", "Ä", "C")
x

A ä
Ä o
C ü
pander(x)


  hö


A ä

Ä o

C ü

@awfrankwils

Hi there,

I am also experiencing encoding issues on Windows with R 3.5.1 and Pander 0.6.2.

I have been trying to insert unicode for no-break spaces to indent factor levels in my tables. Here are three examples of what I am trying to do. Example 1 uses a normal space that is ignored by Pander(); examples 2 and 3 use the unicode "\u00A0" which appears as  instead of a space.

#using a space (ignored by pander)
example<-rbind("Meals in a Typical Day", " 1", " 2", " 3", " 4 or more")
example<-cbind(example, counts=c("","5","10","25","20"))
example

[image: example1]
pander(example)
[image: panderexample1]

#using unicode for no-break space 
example2<-rbind("Meals in a Typical Day", "\u00A01", "\u00A02", "\u00A03", "\u00A04 or more")
example2<-cbind(example2, counts=c("","5","10","25","20"))
example2

[image: example2]
pander(example2)
[image: panderexample2]

#using unicode for no-break space 
example3<-rbind("Meals in a Typical Day", "\u00A0\u00A01", "\u00A0\u00A02", "\u00A0\u00A03", "\u00A0\u00A04 or more")
example3<-cbind(example3, counts=c("","5","10","25","20"))
example3

[image: example3]
pander(example3)
[image: panderexample3]
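
One plausible mechanism for a stray character showing up next to a no-break space (an assumption, not confirmed in this thread) is that the two UTF-8 bytes of "\u00A0" get interpreted as two single-byte characters in a Windows code page. The base R sketch below reproduces that misreading deliberately; the variable names are illustrative only:

```r
nbsp <- "\u00A0"

# In UTF-8 a no-break space takes two bytes: c2 a0.
utf8_bytes <- charToRaw(enc2utf8(nbsp))
utf8_bytes

# Deliberately misread those two bytes as latin1, one character per byte.
misread <- rawToChar(utf8_bytes)
Encoding(misread) <- "latin1"

# Converting the misread string back to UTF-8 gives two characters:
# U+00C2 (capital A with circumflex) followed by U+00A0.
shown <- enc2utf8(misread)
charToRaw(shown)   # c3 82 c2 a0
```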

sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] bindrcpp_0.2.2 lubridate_1.7.4 forcats_0.3.0 stringr_1.3.1 dplyr_0.7.6 purrr_0.2.5
[7] readr_1.1.1 tidyr_0.8.1 tibble_1.4.2 ggplot2_3.0.0 tidyverse_1.2.1 VIM_4.7.0
[13] data.table_1.11.4 colorspace_1.3-2 pander_0.6.2 xtable_1.8-2 knitr_1.20 descr_1.1.4

loaded via a namespace (and not attached):
[1] Rcpp_0.12.18 xml2_1.2.0 bindr_0.1.1 magrittr_1.5 MASS_7.3-50 hms_0.4.2
[7] rvest_0.3.2 tidyselect_0.2.4 lattice_0.20-35 R6_2.2.2 rlang_0.2.1 broom_0.5.0
[13] laeken_0.4.6 rio_0.5.10 e1071_1.6-8 withr_2.1.2 modelr_0.1.2 class_7.3-14
[19] lmtest_0.9-36 assertthat_0.2.0 abind_1.4-5 digest_0.6.15 curl_3.2 haven_1.1.2
[25] sp_1.3-1 compiler_3.5.1 DEoptimR_1.0-8 cellranger_1.1.0 pillar_1.3.0 scales_0.5.0
[31] backports_1.1.2 boot_1.3-20 jsonlite_1.5 pkgconfig_2.0.1 rstudioapi_0.7 munsell_0.5.0
[37] carData_3.0-1 httr_1.3.1 plyr_1.8.4 car_3.0-0 tools_3.5.1 nnet_7.3-12
[43] vcd_1.4-4 nlme_3.1-137 gtable_0.2.0 cli_1.0.0 readxl_1.1.0 yaml_2.2.0
[49] lazyeval_0.2.1 crayon_1.3.4 zip_1.0.0 glue_1.3.0 robustbase_0.93-1.1 openxlsx_4.1.0
[55] stringi_1.1.7 foreign_0.8-70 zoo_1.8-3

@daroczig
Member

daroczig commented Sep 9, 2018

I tested #326 in a Windows VM and it seems to do the trick, but please confirm.

@hr70

hr70 commented Sep 17, 2018

Thanks; I downloaded and installed "pander-table-expand-fallback.zip" today. Unfortunately, for me the result is the same as before.

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=German_Austria.1252 LC_CTYPE=German_Austria.1252
[3] LC_MONETARY=German_Austria.1252 LC_NUMERIC=C
[5] LC_TIME=German_Austria.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] pander_0.6.2

loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1 Rcpp_0.12.17 digest_0.6.15

@liegepr

liegepr commented Oct 26, 2018

Hello,
I have been successful with the first commit intended to solve this issue:
install_github("Rapporter/pander@06c2f65")
but not with the latest one:
install_github("Rapporter/pander@6649299")

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

@hr70

hr70 commented Nov 12, 2018

I tried on my home computer, and can also confirm success with this:
install_github("Rapporter/pander@06c2f65")
(not able to test it in the office, as I am not allowed to install packages from github there)

@dcomtois
Contributor

dcomtois commented Dec 18, 2018

I had an issue trying to print a data frame with Cyrillic column names:

Error in table.expand(t.colnames, t.width, justify, sep.col) : 
  basic_string::_S_create

Installing the patch mentioned in the above comment resolved the issue, whereas using colnames(x) <- enc2native(colnames(x)) before the call to pander() didn't help.

@GegznaV
Contributor Author

GegznaV commented Dec 20, 2018

I can also confirm that install_github("Rapporter/pander@06c2f65") solved my original issue (I use Windows 10 and R 3.5.2).

@daroczig, will this patch be merged into the main branch of pander? When can one expect it on CRAN?

And maybe #326 is not necessary?

dcomtois added a commit to dcomtois/pander that referenced this issue Dec 26, 2018
dcomtois added a commit to dcomtois/pander that referenced this issue Dec 26, 2018
dcomtois added a commit to dcomtois/pander that referenced this issue Dec 26, 2018
@mgruebsch

I had the same issue with German which is solved by devtools::install_github("Rapporter/pander@06c2f65"). Please merge the fix into the master release. Thank you!

@dcomtois
Contributor

@daroczig Do you plan on merging this issue? If not, pls let me know... I am holding off pushing an update of summarytools to CRAN (which will include translations) until the issue is resolved. Thx!

@daroczig
Member

Sorry for the delay, getting this done today.

@GegznaV
Contributor Author

GegznaV commented Jul 24, 2019

@daroczig It seems that the current CRAN version of pander lags behind the GitHub version.
When is the GitHub version of pander (with this encoding bug fixed) going to be released on CRAN?

@GegznaV
Contributor Author

GegznaV commented Jan 11, 2020

Is pander going to be updated on CRAN?

@valentinaandrade

I had an issue trying to print a data frame with cyrillic column names:

Error in table.expand(t.colnames, t.width, justify, sep.col) : 
  basic_string::_S_create

Installing the patch mentioned in the above comment resolved the issue. (Whereas using colnames(x) <- enc2native(colnames(x)) before the call to pander() didn't help).

I have a similar issue, but the patch from that comment didn't resolve it:

Error in table.expand(x, t.width, justify, sep.col) : 
  basic_string::_M_create

@daroczig
Member

pander has been updated on CRAN on 2021-06-13, so the CRAN version should include this fix. If you see any similar problems, please open a new ticket with a minimal reproducible example.
